4) Forward Stepwise Logistic Regression.

This Blog entry is from the Logistic Regression section in Learn R.

As previous Blog entries allude, whereas the linear regression function in R is lm(), the logistic regression function is glm(), with an additional parameter specifying the family as binomial (a distribution well suited to classification problems).  The syntax for creating a logistic regression model is very similar, albeit including the family argument to detail the type of curve to fit:

LogisticRegressionModel <- glm(Dependent ~ Count_Unsafe_Terminals_1_Day, data = FraudRisk, family = "binomial")
1.png

Run the line of script to console:

2.png

As with an lm() model, the summary() function can return the model output:

summary(LogisticRegressionModel)
3.png

Run the line of script to console:

4.png

As with models created using the lm() function, the summary is somewhat inadequate for obtaining the coefficients to full precision, notwithstanding that the predict.glm() function will be used for recall:

coefficients(LogisticRegressionModel)
5.png

Run the line of script to console to output the coefficients for a manual deployment of the logistic regression model:

6.png
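As an illustration of that manual deployment, and purely as a sketch, the coefficients can be applied through the logistic function directly; the names below follow the coefficients() output above:

Coef <- coefficients(LogisticRegressionModel)
# Linear predictor: intercept plus slope multiplied by the independent variable
LinearPredictor <- Coef["(Intercept)"] + Coef["Count_Unsafe_Terminals_1_Day"] * FraudRisk$Count_Unsafe_Terminals_1_Day
# Logistic (sigmoid) transformation gives the predicted probability
ManualScore <- 1 / (1 + exp(-LinearPredictor))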

This Blog entry naturally leads into a stepwise multiple logistic regression model.  In this example, a factor created in preceding Blog entries, TypeFactor, will be added on the assumption that it is the next strongest correlating variable:
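The line of script in the screenshot is along these lines (a sketch; the object name and column names are assumed to match the earlier model):

LogisticRegressionModel <- glm(Dependent ~ Count_Unsafe_Terminals_1_Day + TypeFactor, data = FraudRisk, family = "binomial")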

7.png

Run the line of script to console:

8.png

Write out the coefficients to observe the treatment of each different state inside the factor TypeFactor:
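As before, this is a call to coefficients() on the model object (a sketch, assuming the stepwise model was assigned to the same name):

coefficients(LogisticRegressionModel)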

9.png

10) Create a Stepwise Linear Regression Model

This Blog entry is from the Linear Regression section in Learn R.

A stepwise linear regression model is built by adding independent variables in order of their correlation strength, in an effort to improve the overall predictive power of the model.  Referring to the output of the correlation analysis:

1.png
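For reference, a ranking like the one in the screenshot can be produced along these lines (a sketch; CorrelationData is a hypothetical name standing in for the data frame used in the correlation Blog entry):

# Correlation of every variable with the dependent, ranked by absolute strength
Correlations <- cor(CorrelationData)[, "Dependent"]
sort(abs(Correlations), decreasing = TRUE)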

It can be seen that the next strongest independent variable, when taking a Pearson correlation, is Skew_3, followed by Range_2_PearsonCorrelation.  The process of forward stepwise linear regression adds these variables to the model one by one, seeking improvement in the multiple R while retaining good p values.  To create a multiple linear regression model of the strongest correlating independent variables:

MultipleLinearRegression <- lm(Dependent ~ Skew_3 + Range_2_PearsonCorrelation)
2.png

Run the line of script to console:

3.png

Write the summary out to observe the multiple R:

summary(MultipleLinearRegression)

4.png

Run the line of script to console:

5.png

Several statistics are of interest in the multiple linear regression.  The first is the p values relating to the overall model and to each independent variable; these are expressed in scientific notation, from which it can be inferred that they are extremely small numbers, far below the conventional 0.05 cut-off.  Secondly, the multiple R statistic is of interest, as it will be the target of improvement in subsequent iterations.
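Should these values be needed programmatically rather than read from the printed summary, they can be extracted from the summary object (a sketch):

ModelSummary <- summary(MultipleLinearRegression)
ModelSummary$r.squared       # multiple R squared
ModelSummary$adj.r.squared   # adjusted R squared
ModelSummary$coefficients    # estimates, standard errors and p values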

The next step is to add the next strongest correlating independent variable, which is PointStep_5_PearsonCorrelation:

MultipleLinearRegression <- lm(Dependent ~ Skew_3 + Range_2_PearsonCorrelation + PointStep_5_PearsonCorrelation)

6.png

Run the line of script to console:

7.png

In this example, it can be seen that the R squared has increased, so it can be inferred that the model has improved, while the p values remain extremely small.  A more relevant value to watch is the adjusted R squared, which takes into account the number of independent variables and adjusts the multiple R accordingly; as such, it is prudent to pay close attention to this value.

Repeat the procedure until the improvement in the multiple R plateaus or the p values deteriorate.
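One way to judge that plateau is to keep each iteration under its own name and compare the adjusted R squared directly (a sketch, assuming the variables are available in the workspace as in the script above):

ModelTwoVariables <- lm(Dependent ~ Skew_3 + Range_2_PearsonCorrelation)
ModelThreeVariables <- lm(Dependent ~ Skew_3 + Range_2_PearsonCorrelation + PointStep_5_PearsonCorrelation)
# Stop adding variables once the gain in adjusted R squared becomes negligible
summary(ModelTwoVariables)$adj.r.squared
summary(ModelThreeVariables)$adj.r.squared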