5) Training a Classification Model.

This Blog entry is from the Neural Networks section in Learn R.

Neural Networks are universal classifiers, which is to say that they can be used for numeric prediction as readily as for classification.  It won't have escaped notice, however, that the internal weights comprising the neural network are all numeric coefficients.  It follows that all input and output variables must be numeric too: categorical data has to be pivoted out to 1 / 0 indicator columns, since neuralnet() cannot be relied upon to interpret factors, as sketched below.
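As a minimal sketch of that pivoting (the Country column here is purely hypothetical and not part of the FraudRisk data), base R's model.matrix() will expand a factor into one 1 / 0 indicator column per level:

# hypothetical factor column, for illustration only
Transactions <- data.frame(Country = factor(c("UK", "US", "UK", "FR")))

# model.matrix() pivots the factor into one 1 / 0 indicator per level;
# the - 1 drops the intercept so every level gets its own column
CountryDummies <- model.matrix(~ Country - 1, data = Transactions)
head(CountryDummies)

In this example, a dataset of transactions where half of the transactions are fraudulent and half genuine will be used, as in Logistic Regression.  Start by importing the FraudRisk dataset: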

library(readr)

FraudRisk <- read_csv("D:/Users/Trainer/Desktop/Bundle/Data/FraudRisk/FraudRisk.csv")

Run the line of script to the console.

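As an optional check (not shown in the original screenshots), the dimensions and first rows of the imported data frame can be inspected before modelling:

# confirm the import: row / column counts and a preview of the data
dim(FraudRisk)
head(FraudRisk)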

Once the FraudRisk data frame has been created, create a neural network from ten independent variables known to have a strong correlation with the dependent variable, with one hidden layer of four processing elements:

library(neuralnet)

# ten independent variables, one hidden layer of four processing elements
FraudRiskNeuralNetwork <- neuralnet(
  Dependent ~ Count_Unsafe_Terminals_1_Day + High_Risk_Country + Foreign +
    Authenticated + Has_Been_Abroad + Transaction_Amt +
    Different_Country_Transactions_1_Week + Different_Decline_Reasons_1_Day +
    Count_Transactions_Declined_1_Day + Count_In_Person_1_Day,
  data = FraudRisk,
  hidden = 4
)

Run the line of script to the console; it may take some time.


Once the console returns, the Neural Network has been trained upon the FraudRisk dataset.  For the purposes of this procedure it can be taken for granted that plot() would render the trained network and its weights.
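As a minimal sketch of exercising the trained network (assuming training converged; the 0.5 cut-off is an assumed threshold, not a tuned one), the topology can be plotted and the training data scored back through the model with compute():

# render the network topology, with the fitted weights on each connection
plot(FraudRiskNeuralNetwork)

# score the training data; net.result holds the raw numeric output
Scores <- compute(FraudRiskNeuralNetwork,
                  FraudRisk[, FraudRiskNeuralNetwork$model.list$variables])

# threshold at 0.5 (an assumed cut-off) to recover 1 / 0 classifications
Classified <- ifelse(Scores$net.result > 0.5, 1, 0)
table(FraudRisk$Dependent, Classified)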

6) Creating a Confusion Matrix for a C5 Decision Tree.

This Blog entry is from the Probability and Trees section in Learn R.

Beyond the summary statistic already created, the confusion matrix is the most convenient means of appraising the utility of a classification model.  The confusion matrix for the C5 decision tree model will be created using the CrossTable() function of the gmodels package:

library("gmodels")
CrossTable(CreditRisk$Dependent, CreditRiskPrediction)

Run the line of script to the console.

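As an optional variant (not part of the original procedure), CrossTable() accepts a prop.chisq argument, and setting it to FALSE suppresses the chi-square contribution printed in each cell, leaving a cleaner table:

# a tidier cross-tabulation, omitting each cell's chi-square contribution
CrossTable(CreditRisk$Dependent, CreditRiskPrediction, prop.chisq = FALSE)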

The overall utility of the C5 decision tree model can be inferred in the same manner as in procedure 100.

The confusion matrix shows 206 records correctly classified as Bad.  Reading the CreditRiskPrediction column-wise, it can be seen that a further 28 records were classified as Bad yet were in fact Good, an error rate of roughly 12% (28 / 234) on records classified as Bad by the model.  Taking note of this metric, boosting will be attempted in procedure 112, which should bring about an improvement in this model.
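As a minimal sketch of deriving that column-wise error rate programmatically (assuming the actual and predicted values both use the labels Bad and Good, which is an assumption about the level names):

# rebuild the confusion matrix as a base R table
ConfusionMatrix <- table(Actual = CreditRisk$Dependent,
                         Predicted = CreditRiskPrediction)

# error rate among records predicted Bad: the Good entries in the Bad column
# ("Bad" and "Good" level names are assumptions about the data's coding)
PredictedBad <- ConfusionMatrix[, "Bad"]
PredictedBad["Good"] / sum(PredictedBad)   # 28 / (206 + 28), roughly 0.12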