10) Create the Skewness and Kurtosis statistics.

This Blog entry is from the Summary Statistics Plots section in Learn R.

It can be observed from previous procedures that the histogram leans severely towards the y axis, which would be described as being positively skewed.  A positive skew deviates from the shape expected of a normal distribution and is cause to mistrust the standard deviation that was created.  Two useful functions for assessing the extent to which a distribution deviates from the normal distribution are skewness(), which measures the lean towards or away from the y axis, and kurtosis(), which measures how tall or squashed the distribution is.

The functions skewness() and kurtosis() do not exist in the base R packages; rather, they are available in a package called moments.  It follows that the moments package needs to be installed and then loaded.  Search for and install the package moments via RStudio:

1.png

Click the Install button to run the installation instruction to console:

2.png

Load the library moments by typing into the script window:

library(moments)
3.png

Run the line of script to console:

4.png

Firstly, to appraise the extent to which the vector leans towards or away from the y axis, type:

skewness(AAPL$Interim_Close)
5.png

Run the line of script to console:

6.png

It can be observed that a positive value is returned, indicating that there is indeed a lean and, owing to the value being positive, that the lean is towards the y axis (which is of course what was visually observed in previous Blog entries).  Secondly, to understand whether the distribution is tall or squat, check the kurtosis by typing:

kurtosis(AAPL$Interim_Close)
7.png

Run the line of script to console:

8.png

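The kurtosis is a difficult statistic to make sense of and in many respects the skewness is a more useful statistic.  As a point of reference, kurtosis() in the moments package returns a value of around 3 for a normal distribution, so values well above 3 suggest heavier tails than a normal distribution would have.  To make an assessment of the shape of the distribution, typically all summary statistics need to be considered: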

9.png
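
To make the two statistics less opaque, the following is a minimal sketch of the moment-based formulas in base R.  It is an illustration only: the vector x and the helper central_moment() are hypothetical, and the formulas are assumed to correspond to what skewness() and kurtosis() in the moments package report.

```r
# Hypothetical right-skewed values; not the AAPL data used in this section.
x <- c(1, 2, 2, 3, 3, 3, 4, 10, 25)

# k-th central moment: the average of (value - mean)^k.
central_moment <- function(v, k) mean((v - mean(v))^k)

# Moment-based skewness: third central moment scaled by the spread.
skew <- central_moment(x, 3) / central_moment(x, 2)^1.5

# Moment-based kurtosis: a normal distribution scores about 3 on this scale.
kurt <- central_moment(x, 4) / central_moment(x, 2)^2

# A symmetric vector has no lean, so its skewness is zero.
symmetric <- c(1, 2, 3, 4, 5)
skew_symmetric <- central_moment(symmetric, 3) / central_moment(symmetric, 2)^1.5

print(c(skewness = skew, kurtosis = kurt, symmetric = skew_symmetric))
```

A positive skewness here reflects the long tail of large values to the right, which is the same lean towards the y axis described above.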

9) Create a Range Normalisation for a Value.

A useful normalisation is to appraise a value against a scale running from the smallest to the largest value.  The formula for range normalisation, taking 201 as the test value, is (201 - min) / (max - min), where the minimum and maximum values are as calculated in previous Blog entries.  To test where the value 201 sits on a scale between the minimum and maximum value, type:

1.png

Run the line of script to console:

2.png

The output shows that the test value of 201 sits at a point 23% of the way between the minimum and maximum values observed in the vector.
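
The range normalisation formula can be sketched as a small reusable function.  This is an illustration under assumptions: the vector prices and the function name range_normalise() are hypothetical, so the result differs from the 23% reported for the AAPL data.

```r
# Hypothetical prices; not the AAPL data used in this section.
prices <- c(100, 150, 201, 320, 500)

# Position of a value on a 0-to-1 scale between the smallest and largest value.
range_normalise <- function(value, x) {
  (value - min(x)) / (max(x) - min(x))
}

position <- range_normalise(201, prices)
print(position)

# With the vector from this section the call would presumably be:
# range_normalise(201, AAPL$Interim_Close)
```

A result of 0 means the value equals the minimum and 1 means it equals the maximum; values outside the observed range fall below 0 or above 1.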

8) Calculate a Z Score.

In the previous procedure a calculation was performed representing one standard deviation.  A Z Score takes a value and expresses how many standard deviations that value is from the mean.  For the purposes of this example, the value to appraise is 201.  The formula to calculate how many standard deviations the value 201 is from the mean is (201 - Mean) / Standard Deviation.

To identify the Z score of the value 201 type:

(201 - mean(AAPL$Interim_Close)) / sd(AAPL$Interim_Close)
1.png

Run the line of script to console:

2.png

In this example, it can be seen that the value 201 is quite close to the mean, being a mere 0.29 standard deviations below the average.  However, as presented in procedures preceding the calculation of the Z score, there are some issues in the way the data is distributed, casting some doubt on the relevance of the standard deviation.
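
As a cross-check, the Z score can be sketched as a function and fed the mean (251.8668) and standard deviation (177.1502) reported elsewhere in this section.  The function name z_score() and the vector values are hypothetical, introduced only for illustration.

```r
# Number of standard deviations a value sits from the centre of a distribution.
z_score <- function(value, centre, spread) (value - centre) / spread

# Hypothetical vector with mean 200 and standard deviation 100.
values <- c(100, 200, 300)
z_from_vector <- z_score(250, mean(values), sd(values))

# Using the mean and standard deviation reported in this section.
z_reported <- z_score(201, 251.8668, 177.1502)

print(c(from_vector = z_from_vector, reported = z_reported))
```

A negative Z score means the value lies below the mean; a positive one means it lies above.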

7) Create the Variance and Standard Deviation.

The Blog entries presented thus far ignore the existence of a summary() function that analyses a vector and returns the same summary statistics in one step.  To return the summary statistics in this manner type:

summary(AAPL$Interim_Close)
1.png

Run the line of script to console:

2.png

It can be seen that many of the summary statistics produced one by one are written out to a vector as the result of the summary() function.  The Variance and Standard Deviation measures are conspicuously absent from the summary() output, which calls for the use of the var() and sd() functions.  To review the variance of a vector type:

var(AAPL$Interim_Close)
3.png

Run the line of script to console:

4.png

The variance calculation takes the difference between each value and the overall mean, squares it, then takes an average of the squared differences (R's var() divides by n - 1 rather than n, giving the sample variance).  In this case the variance is 31382.2; it could be said that the larger the value, the more the data varies.  The standard deviation, a more useful statistic, is simply the square root of the variance.  It is more practical to go straight to the Standard Deviation by typing:

sd(AAPL$Interim_Close)
5.png

Run the line of script to console:

7.png
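
The variance and standard deviation calculation described above can be reproduced by hand as a check on what var() and sd() return.  This is a minimal sketch on a hypothetical vector x, not the AAPL data.

```r
# Hypothetical values with a mean of 5.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Squared difference between each value and the overall mean.
squared_diffs <- (x - mean(x))^2

# Sample variance: sum of squared differences divided by n - 1.
manual_var <- sum(squared_diffs) / (length(x) - 1)

# Standard deviation: the square root of the variance.
manual_sd <- sqrt(manual_var)

print(c(manual_var = manual_var, var = var(x)))
print(c(manual_sd = manual_sd, sd = sd(x)))
```

The manual figures agree with the built-in functions, which confirms the n - 1 denominator.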

The standard deviation in this example is 177.1502, a value which has special meaning: subtracting it from and adding it to the mean of 251.8668, as produced in procedure 58, it can be said (in a normal distribution at least) that 68.2% of all values will live in the range between 74.7166 and 429.017.  Extending the range to two standard deviations puts the lower band at -102.4336, which is not possible as the values can’t go below zero.  This leads to the inference that the distribution is not normally shaped, which is known already from procedure 55, where the vector was plotted to a histogram and box plot.

To create an upper band, this being a single Standard Deviation above the Mean, type:

mean(AAPL$Interim_Close) + sd(AAPL$Interim_Close)
8.png

Run the line of script to console:

9.png
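
Both bands can be produced together.  The sketch below is an illustration under assumptions: the function sd_band() and the vector values are hypothetical, while 251.8668 and 177.1502 are the mean and standard deviation reported in this Blog entry.

```r
# Lower and upper band k standard deviations either side of the centre.
sd_band <- function(centre, spread, k = 1) {
  c(lower = centre - k * spread, upper = centre + k * spread)
}

# Band from the mean and standard deviation reported in this Blog entry.
band_reported <- sd_band(251.8668, 177.1502)
print(band_reported)

# Hypothetical skewed values, to count how many fall inside one band.
values <- c(10, 20, 30, 40, 500)
band <- sd_band(mean(values), sd(values))
share_inside <- mean(values >= band[["lower"]] & values <= band[["upper"]])
print(share_inside)
```

In a skewed vector the share inside the band need not be close to 68.2%, and the lower band can fall below zero even though no value does.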

6) Navigate Plots and Export Visualisations.

Upon the creation of a box plot, at first glance it may appear as if the Histogram created in one of the first Blog entries in this section has been overwritten.  Upon closer inspection, it can be seen that this is not the case, as there is a back arrow function that allows for paging through the plots created:

1.png

Clicking on the back arrow will return to the Histogram created beforehand:

2.png

Conversely, the forward arrow returns to the newly created Box Plot.  RStudio provides a number of mechanisms to export the visualisation via the Export button; clicking on it presents the options:

3.png

In the drop-down there are several options to export an image from a plot, although the most versatile is to copy the visualisation to the clipboard as an image for pasting into a plethora of third party applications, such as Word, via the established Copy \ Paste mechanism familiar to Windows users.

To copy the image, click on the sub menu item Copy to Clipboard which will open a dialog box setting out the specification of the image:

4.png

Options for the creation of the image include the dimensions of the image and the precise format \ encoding; in this case the defaults are adequate, as a bitmap is a suitably versatile format.  Click the Copy Plot button to copy the image to the clipboard.  The image can now be pasted into any application that can make use of a bitmap, such as PowerPoint, Word, Excel or Paint:

5.png
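
As an alternative to the Export button, a plot can also be written to an image file from script using the base R png() device.  This is a minimal sketch rather than the procedure followed in this Blog entry: the output path, the dimensions and the plotted values are all hypothetical.

```r
# Temporary file path for the exported image (hypothetical location).
out_file <- tempfile(fileext = ".png")

# Open a PNG device, draw the plot into it, then close the device to save.
png(filename = out_file, width = 800, height = 600)
hist(c(1, 2, 2, 3, 3, 3, 4, 10, 25),
     main = "Hypothetical histogram", xlab = "Value")
invisible(dev.off())

print(file.exists(out_file))
```

The file is only complete once dev.off() has been called, so the plot should be fully drawn before the device is closed.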