1) Scanning Scatter Plots for Relationships.

This Blog entry is from the Linear Regression section in Learn R.

R has a function called pairs() that is very useful for visualizing the relationships between the variables in a data frame on a fairly exhaustive basis.  The data frame can simply be passed as the argument to pairs(), which then draws a scatter plot for every pair of columns:

pairs(FDX)

Run the line of script to console to draw the plot.
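To give a feel for what pairs() draws, here is a minimal sketch on a small made-up data frame. The name demo and its columns are hypothetical stand-ins and are not part of the FDX data; the values are simulated so that one column is clearly related to the dependent variable and one is not.

```r
# Hypothetical stand-in data; FDX itself is not used here.
set.seed(42)
n <- 100
x1 <- rnorm(n)
demo <- data.frame(
  Dependent = 2 * x1 + rnorm(n, sd = 0.5),  # built to depend on x1
  x1 = x1,
  x2 = rnorm(n)                             # unrelated noise
)

# One scatter plot is drawn for every pair of columns.
pairs(demo)
```

With three columns the matrix stays readable; the number of panels grows with the square of the number of columns, which is why a data frame with hundreds of columns becomes unmanageable.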

In this example the data frame is far too large: it has hundreds of columns, so the resulting visualization would be many times larger than the RStudio plots pane.  The columns used in the visualization therefore need to be chosen more selectively, which is a simple matter of subscripting the data frame with square brackets inside the call to pairs():

pairs(FDX[c("Dependent","Median_1","Median_1_PearsonCorrelation","Median_1_ZScore","Mode_1","Mode_1_PearsonCorrelation","Mode_1_ZScore")])

Run the line of script to console to produce a matrix of scatter plots.
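The subscripting step can be sketched on a small hypothetical data frame (demo and its values are invented for illustration). Subscripting with a character vector keeps only the named columns, and a name that does not match exactly, for instance one with a stray space, raises an "undefined columns selected" error, so it can be worth checking the names first:

```r
# Hypothetical stand-in data; the column names only mimic the style used in FDX.
demo <- data.frame(
  Dependent = c(1, 2, 3, 4),
  Median_1  = c(2, 4, 5, 8),
  Mode_1    = c(4, 3, 2, 1),
  Other     = c(9, 9, 8, 7)
)

# Names to plot; the last one deliberately carries a stray leading space.
wanted <- c("Dependent", "Median_1", " Mode_1")

# Report any requested names that are not columns of the data frame.
missing_cols <- setdiff(wanted, names(demo))
print(missing_cols)

# Trim the stray whitespace, then subscript and plot only those columns.
wanted <- trimws(wanted)
subset_demo <- demo[wanted]
pairs(subset_demo)
```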

In this example, the relationships between the dependent variable and the independent variables are the most interesting.  At a glance it can be seen that several pronounced relationships exist.

This process would be repeated, always including the dependent variable, for several other groups of independent variables until a good feel has been obtained for how the independent variables relate to the dependent variable.  It helps identify independent variables that correlate well with the dependent variable, so that these variables can be carried forward for the purposes of modeling.
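As a numeric complement to scanning the plots, the correlation of each independent variable with the dependent variable can be computed and ranked. The following is a minimal sketch on simulated data, not the author's procedure; demo and the column names Strong, Weak and Noise are hypothetical, and cor() computes the Pearson correlation by default, which only captures linear association.

```r
# Hypothetical stand-in data with one strong, one weak and one unrelated predictor.
set.seed(1)
n <- 200
signal <- rnorm(n)
demo <- data.frame(
  Dependent = signal + rnorm(n, sd = 0.5),
  Strong    = signal + rnorm(n, sd = 0.3),
  Weak      = 0.2 * signal + rnorm(n),
  Noise     = rnorm(n)
)

# Pearson correlation of every independent variable with the dependent variable.
independents <- setdiff(names(demo), "Dependent")
correlations <- sapply(demo[independents], function(col) cor(col, demo$Dependent))

# Rank by absolute correlation, strongest first.
ranked <- sort(abs(correlations), decreasing = TRUE)
print(ranked)
```

Variables near the top of the ranking are candidates to carry forward; the scatter plots remain useful for spotting relationships that are strong but not linear.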

Introduction to Linear Regression.

Linear Regression is a modelling technique that can be used for numeric prediction where the values are approximately normally distributed.
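For orientation, here is a minimal sketch of numeric prediction with linear regression using base R's lm() on simulated data. The variables x and y and the coefficients used to simulate them are assumptions made for illustration only and have nothing to do with the FDX data.

```r
# Simulated data: y depends linearly on x, plus normally distributed noise.
set.seed(7)
x <- rnorm(100)
y <- 3 + 2 * x + rnorm(100, sd = 0.5)

# Fit a straight line y = intercept + slope * x by least squares.
fit <- lm(y ~ x)

# Estimated intercept and slope; they should land near the simulated 3 and 2.
print(coef(fit))

# Predict y for a new value of x.
prediction <- predict(fit, newdata = data.frame(x = 1.5))
print(prediction)
```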

The dataset used in this section of Blog entries is available under Bundle\Data\Equity\Abstracted\FDX\PC_FDX_Close_200x1D_Close_50x1D_10.csv, which contains data that has already been abstracted for the FedEx stock on the NYSE.

To proceed with the subsequent procedures, it is necessary to import the file PC_FDX_Close_200x1D_Close_50x1D_10.csv into R.

For completeness, the library(readr) and read_csv() lines will be copied to the current script so that the script remains portable.

For ease and simplicity, the name of the data set has been changed from the default of PC_FDX_Close_200x1D_Close_50x1D_10.csv to FDX.

Once the load has been executed, the contents of the CSV file are displayed automatically when the View() function is invoked in the console.
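The import steps can be sketched as a short script. This is a hedged illustration rather than the exact code generated by RStudio: to stay self-contained it writes a tiny hypothetical CSV to a temporary file (csv_path and its contents are invented), and it falls back to base R's read.csv() if the readr package is not installed. In practice csv_path would point at the FDX file in the bundle.

```r
# Hypothetical stand-in file so the sketch runs anywhere; replace csv_path
# with the location of PC_FDX_Close_200x1D_Close_50x1D_10.csv in practice.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Dependent = c(1.2, 0.8, 1.5), Median_1 = c(0.4, 0.1, 0.9)),
  csv_path,
  row.names = FALSE
)

# Load the file into a data frame named FDX, preferring readr when available.
if (requireNamespace("readr", quietly = TRUE)) {
  FDX <- readr::read_csv(csv_path)
} else {
  FDX <- read.csv(csv_path)  # base R fallback
}

# Open the data viewer only in an interactive session such as RStudio.
if (interactive()) View(FDX)
```

Keeping the load lines in the script means it can be rerun on another machine without repeating the import dialog by hand.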