6) Creating a One Way Linear Regression Model.

This Blog entry is from the Linear Regression section in Learn R.

Previously, the lm() function was specified as the method of the stat_smooth() function of ggplot2 to create a linear regression solution, or rather a line of best fit. Naturally, the lm() function can also be used on its own to create a linear regression model, which can be deployed as a predictive model in its own right.

To create a linear regression model with one dependent variable and one independent variable:

LinearRegression <- lm(Dependent ~ Median_4, data = FDX)
1.png

Run the line of script to console:

2.png
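If the FDX data frame is not to hand, the following minimal sketch can be run on simulated data instead. The data frame below is hypothetical: only the column names Dependent and Median_4 are taken from this entry, and the values are randomly generated, so the numbers will not match the screenshots. It also checks that, with one independent variable, lm() returns the ordinary least squares estimates, where the slope is cov(x, y) / var(x) and the intercept is mean(y) minus the slope times mean(x).

```r
# Hypothetical stand-in for FDX: simulated values, column names from the text
set.seed(42)
FDX <- data.frame(Median_4 = runif(200, min = 0, max = 100))
# Assumed relationship for illustration: shallow downward slope plus noise
FDX$Dependent <- 50 - 0.05 * FDX$Median_4 + rnorm(200, sd = 10)

# One dependent variable, one independent variable
LinearRegression <- lm(Dependent ~ Median_4, data = FDX)

# Ordinary least squares by hand, for comparison
slope <- cov(FDX$Median_4, FDX$Dependent) / var(FDX$Median_4)
intercept <- mean(FDX$Dependent) - slope * mean(FDX$Median_4)

# Both pairs should agree up to floating-point tolerance
print(coef(LinearRegression))
print(c(intercept, slope))
```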

Once the model has been computed, it can be written out to the console by typing its name:

LinearRegression
3.png

Run the line of script to console:

4.png

The summary() function can be used to give more detail on the significance and fit of the model, including standard errors, p-values and R-squared:

summary(LinearRegression)

5.png

Run the line of script to console:

6.png
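The values that summary() prints can also be extracted programmatically, which is handy when comparing models. The sketch below again uses a hypothetical simulated stand-in for FDX (invented values, column names from this entry). The coefficient table holds the estimates, standard errors, t values and p-values, and with a single independent variable the multiple R-squared equals the squared Pearson correlation between the two variables.

```r
# Hypothetical simulated data with the same column names as FDX
set.seed(1)
FDX <- data.frame(Median_4 = rnorm(150, mean = 20, sd = 5))
FDX$Dependent <- 10 - 0.1 * FDX$Median_4 + rnorm(150, sd = 3)

LinearRegression <- lm(Dependent ~ Median_4, data = FDX)
model_summary <- summary(LinearRegression)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(model_summary)
print(coef_table)

# Proportion of the variance in Dependent explained by Median_4
r_squared <- model_summary$r.squared
print(r_squared)

# For one independent variable, R-squared is the squared correlation
print(cor(FDX$Median_4, FDX$Dependent)^2)
```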

A more traditional linear regression model has now been written out. The printed model rounds the coefficients for display, so it is worth checking them at full precision to ensure that they are not truncated when copied elsewhere, as truncated coefficients can lead to a profound change in the predicted values:

coefficients(LinearRegression)
7.png

Run the line of script to console:

8.png

It can be seen that the coefficients written out have rather more decimal places, or precision, which will be extremely important when seeking to make accurate predictions.
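To illustrate why this precision matters, the sketch below (hypothetical simulated data, column names from this entry) computes predictions three ways: with predict(), by hand with the full-precision coefficients, and by hand with coefficients rounded to two decimal places. The first two should agree, while the rounded version drifts. The new Median_4 values being scored are made up for illustration.

```r
# Hypothetical simulated data with the same column names as FDX
set.seed(7)
FDX <- data.frame(Median_4 = runif(100, min = 0, max = 500))
FDX$Dependent <- 3.14159 - 0.01234 * FDX$Median_4 + rnorm(100)

LinearRegression <- lm(Dependent ~ Median_4, data = FDX)
b <- coefficients(LinearRegression)

# Print the coefficients with more significant digits than the default
print(b, digits = 15)

# Made-up new observations to score
new_data <- data.frame(Median_4 = c(100, 250, 400))

# 1) R's own prediction method
pred_predict <- predict(LinearRegression, newdata = new_data)

# 2) The regression equation with full-precision coefficients
pred_full <- b[["(Intercept)"]] + b[["Median_4"]] * new_data$Median_4

# 3) The same equation with coefficients rounded to 2 decimal places
b_rounded <- round(b, 2)
pred_rounded <- b_rounded[["(Intercept)"]] + b_rounded[["Median_4"]] * new_data$Median_4

print(data.frame(new_data, pred_predict, pred_full, pred_rounded))
```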

5) Adding a Trend Line to a Scatter Plot.


In a previous procedure, a scatter plot comparing the dependent variable with the independent variable Median_4 was created. In the scatter plot there was, just about, a relationship identified. To better visualise this relationship, a trend line based on a line of best fit through the points can be added to the scatter plot.

Firstly, revisit previous Blog entries in this section to create the scatter plot using ggplot2 and the qplot() function:

1.png

Run the line of script to console:

2.png

The formula for linear regression, as created by the lm() function, is explained in more depth in subsequent procedures; for the moment, lm is simply specified as the method argument of the stat_smooth() function of ggplot2:

qplot(FDX$Median_4, FDX$Dependent) + stat_smooth(method = "lm")
3.png

Run the line of script to console:

4.png

It can be seen that a plot has been created as before, yet this time with a trend line representing a linear regression model (the shaded band around the line is, by default, a 95% confidence interval for the fitted line):

5.png
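The line that stat_smooth(method = "lm") draws is the fit of lm(y ~ x) on the plotted variables. As a swapped-in alternative that does not need ggplot2, base R graphics can draw the same straight line with plot() and abline(); this sketch omits the confidence band. The data are a hypothetical simulated stand-in for FDX with the same column names.

```r
# Hypothetical simulated data with the same column names as FDX
set.seed(3)
FDX <- data.frame(Median_4 = rnorm(120, mean = 0, sd = 1))
FDX$Dependent <- 5 - 0.5 * FDX$Median_4 + rnorm(120, sd = 0.5)

# Same model that stat_smooth(method = "lm") fits: y ~ x
trend <- lm(Dependent ~ Median_4, data = FDX)

# Scatter plot of the dependent variable against Median_4
plot(FDX$Median_4, FDX$Dependent,
     xlab = "Median_4", ylab = "Dependent",
     main = "Dependent vs Median_4 with linear trend line")

# abline() accepts an lm object and draws intercept + slope * x
abline(trend, col = "blue", lwd = 2)

# A negative slope corresponds to a downward trend line
slope <- coef(trend)[["Median_4"]]
print(slope)
```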

It can be seen that there is a very shallow downward trend, suggesting that this linear regression solution has some predictive power, albeit very weak in isolation (hence the importance of multiple linear regression, which is explained later).