11) Recalling a Gradient Boosting Machine.

This Blog entry is from the Probability and Trees section in Learn R.

Recalling the GBM is quite intuitive and follows the standard predict() signature.  To recall the GBM:

GBMPredictions <- predict(GBM,CreditRisk,type = "response")
1.png

Run the line of script to console:

2.png

One peculiarity is worth noting.  Because the dependent variable in the CreditRisk data frame is a factor, the binary classification has been modelled on a scale between 1 and 2.  These are the integer codes of the factor levels, with 1 being Bad and 2 being Good:

3.png
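
The 1 and 2 coding comes from the way R stores a factor: each level is held as an integer code, and by default the levels are sorted alphabetically.  The following is a minimal sketch using a hypothetical toy vector named outcome, which merely stands in for CreditRisk$Dependent and is not taken from the data:

```r
# Hypothetical toy factor, standing in for CreditRisk$Dependent
outcome <- factor(c("Good", "Bad", "Good", "Bad"))

# Levels are sorted alphabetically by default, so "Bad" comes before "Good"
levels(outcome)

# The underlying integer codes: Bad is stored as 1, Good as 2
as.integer(outcome)
```

If the levels had been declared in a different order when the factor was built, the codes would swap, so it is worth checking levels() before interpreting the predictions.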

It follows that predictions closer to 2 than to 1 are considered Good, and predictions closer to 1 are considered Bad, which places the cut-off at the midpoint of 1.5.  To appraise the model performance, a confusion matrix should be created.  First, create a vector using the ifelse() function to classify each prediction as Good or Bad:

CreditRiskGBMClassifications <- ifelse(GBMPredictions >= 1.5,"Good","Bad")
4.png

Run the line of script to console:

5.png
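
To see how the 1.5 cut-off behaves, including at the boundary itself, the same ifelse() call can be tried on a handful of made-up values.  The vector toy_predictions below is hypothetical and is not output from the GBM:

```r
# Hypothetical predictions on the 1-to-2 scale (not real model output)
toy_predictions <- c(1.05, 1.49, 1.50, 1.92)

# Values at or above the 1.5 midpoint are labelled "Good", the rest "Bad"
toy_classes <- ifelse(toy_predictions >= 1.5, "Good", "Bad")
toy_classes
```

Because the comparison is >=, a prediction of exactly 1.5 falls on the Good side.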

Create a confusion matrix comparing the actual values with the classifications derived from the GBM predictions:

CrossTable(CreditRisk$Dependent, CreditRiskGBMClassifications)
6.png

Run the line of script to console:

7.png

In this example the GBM has mustered a strong performance.  Of the 220 accounts that were actually Bad, the GBM classified 182 correctly, which is about 82.7% of the Bad class (182 / 220).  Strictly, this is the detection rate for Bad accounts rather than the overall accuracy, which would also count the correctly classified Good accounts.  It is a more realistic figure when compared to C5 boosting, as over-fitting will have been contended with.
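
The difference between overall accuracy and the rate achieved on a single class can be sketched in base R with table().  The vectors actual and predicted below are hypothetical and deliberately tiny; they are not the CreditRisk results, and table() is used here only as a dependency-free stand-in for CrossTable():

```r
# Hypothetical actual and predicted labels (not the CreditRisk results)
actual    <- factor(c("Bad", "Bad", "Bad", "Good", "Good", "Good", "Good", "Good"),
                    levels = c("Bad", "Good"))
predicted <- factor(c("Bad", "Bad", "Good", "Good", "Good", "Good", "Bad", "Good"),
                    levels = c("Bad", "Good"))

# Rows are the actual classes, columns are the predicted classes
confusion <- table(Actual = actual, Predicted = predicted)

# Overall accuracy: correct predictions across both classes
accuracy <- sum(diag(confusion)) / sum(confusion)

# Rate on Bad: share of actual Bad accounts that were classified as Bad
bad_rate <- as.numeric(confusion["Bad", "Bad"]) / sum(confusion["Bad", ])
```

The same two calculations could be applied to a table built from CreditRisk$Dependent and CreditRiskGBMClassifications to report both figures side by side.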