9) Grading the ROC Performance with AUC.

This Blog entry is from the Logistic Regression section in Learn R.

Visually, the ROC curve plot created in the previous Blog entry suggests that the model has some predictive power. A more succinct measure of model performance is the Area Under Curve (AUC) statistic, which can be calculated with ease by requesting "auc" as the measure when calling performance() on the prediction object:

AUC <- performance(ROCRPredictions,measure = "auc")

Run the line of script to console:

To write out the contents of the AUC object:

AUC

Run the line of script to console:

The value to focus on is y.values, which for a model with any predictive power falls between 0.5 (no better than chance) and 1 (perfect discrimination).
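As a minimal sketch of reading the number out of the object, assuming the ROCR package is installed: the performance object is an S4 object whose y.values slot is a list, so the AUC itself is the first element of that list. The scores and labels below are made-up toy values, not the FraudRisk data, and the Toy* names are illustrative only.

```r
library(ROCR)

# Toy scores and true classes (illustrative values only, not the FraudRisk data)
ToyScores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
ToyLabels <- c(1, 1, 0, 1, 0, 1, 0, 0)

# Build the prediction object, then request the AUC measure as in the text
ToyPredictions <- prediction(ToyScores, ToyLabels)
ToyAUC <- performance(ToyPredictions, measure = "auc")

# y.values is a list held in an S4 slot, so use @ and [[1]] to get a plain number
ToyAUCValue <- ToyAUC@y.values[[1]]
print(ToyAUCValue)
```

Holding the AUC as a plain numeric value makes it easier to compare models or to round it for reporting.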

In this example, the AUC value is 0.827767, which suggests that the model has excellent utility. By way of grading, AUC scores correspond to the following bands:

· A: Outstanding, > 0.9

· B: Excellent, > 0.8 and <= 0.9

· C: Acceptable, > 0.7 and <= 0.8

· D: Poor, > 0.6 and <= 0.7

· E: Junk, > 0.5 and <= 0.6
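The grading bands above can be sketched as a small lookup in base R. GradeAUC is a hypothetical helper name, not part of ROCR; it relies on cut() with right-closed intervals so that each band is "greater than the lower bound and less than or equal to the upper bound", as in the list.

```r
# Hypothetical helper: map an AUC value to the grade bands listed above
GradeAUC <- function(AUCValue) {
  Grades <- cut(
    AUCValue,
    breaks = c(0.5, 0.6, 0.7, 0.8, 0.9, 1),
    labels = c("E: Junk", "D: Poor", "C: Acceptable", "B: Excellent", "A: Outstanding"),
    right = TRUE  # intervals are (lower, upper], matching "> lower and <= upper"
  )
  # Values of 0.5 or below fall outside every band and come back as NA
  as.character(Grades)
}

GradeAUC(0.827767)
# "B: Excellent"
```

Because cut() is vectorised, the same helper can grade several AUC values at once, for example when comparing candidate models.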

2) Create an Abstraction Deviation Independent Vector.

This Blog entry is from the Logistic Regression section in Learn R.

In behavioural analytics especially, one of the most powerful improvements that can be made to a variable is a transformation that compares the value for a record against the values typically observed in that vector for a customer / product / portfolio. There are, of course, several normalisations appropriate for such a task, such as a Z score; however, in this instance, because the data is skewed, a range normalisation may be more appropriate.

A range normalisation establishes the largest and smallest values observed in the vector, then expresses where a test value sits on that range as a proportion between 0 and 1 (a percentage once multiplied by 100). In this example, a range normalisation will be performed on the column Count_Transactions_1_Day. Firstly, establish the minimum and maximum values:

Min_Count_Transactions_1_Day <- min(FraudRisk$Count_Transactions_1_Day)
Max_Count_Transactions_1_Day <- max(FraudRisk$Count_Transactions_1_Day)

Run the block of script to console:

At this stage, the minimum and maximum values of Count_Transactions_1_Day have been stored as single-value vectors. To create a new vector as a range normalisation:

Range_Deviation_Count_Transactions_1_Day <- (FraudRisk$Count_Transactions_1_Day - Min_Count_Transactions_1_Day) / (Max_Count_Transactions_1_Day - Min_Count_Transactions_1_Day)

Run the line of script to console:
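To make the arithmetic concrete, the same calculation can be sketched as a reusable function and run on a small toy vector. RangeNormalise and ToyCounts are hypothetical names introduced for illustration, and the guard for a constant vector is an added assumption that the steps above do not include.

```r
# Hypothetical helper wrapping the range normalisation used above
RangeNormalise <- function(x) {
  MinValue <- min(x, na.rm = TRUE)
  MaxValue <- max(x, na.rm = TRUE)
  if (MaxValue == MinValue) {
    # Assumption: a constant vector has no range, so return zeros
    # (x - MinValue) rather than dividing by zero
    return(x - MinValue)
  }
  (x - MinValue) / (MaxValue - MinValue)
}

# Toy transaction counts (illustrative values only)
ToyCounts <- c(2, 4, 6, 10)
RangeNormalise(ToyCounts)
# 0.00 0.25 0.50 1.00
```

The smallest value maps to 0 and the largest to 1, so a count of 6 on a range of 2 to 10 sits halfway along it.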

Append the newly created vector to the FraudRisk data frame using mutate() from the dplyr package, which names the new column after the vector:

FraudRisk <- mutate(FraudRisk, Range_Deviation_Count_Transactions_1_Day)

Run the line of script to console:
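For readers who do not have dplyr loaded, the same append can be sketched in base R by assigning the vector to a named column. This is an alternative to the mutate() call, not the method used in this entry, and ToyFraudRisk is a made-up stand-in for the FraudRisk data frame.

```r
# Toy stand-in for the FraudRisk data frame (illustrative values only)
ToyFraudRisk <- data.frame(Count_Transactions_1_Day = c(2, 4, 6, 10))

# Range normalisation, as in the steps above
ToyMin <- min(ToyFraudRisk$Count_Transactions_1_Day)
ToyMax <- max(ToyFraudRisk$Count_Transactions_1_Day)
ToyRange <- (ToyFraudRisk$Count_Transactions_1_Day - ToyMin) / (ToyMax - ToyMin)

# Base R alternative to mutate(): assign the vector to a named column
ToyFraudRisk$Range_Deviation_Count_Transactions_1_Day <- ToyRange

# Inspect the result to confirm the new column is present
head(ToyFraudRisk)
```

Naming the column explicitly in the assignment makes it obvious what the new variable will be called when it is later used in a model formula.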

It can be seen that a plot has been created of the variable Count_Unsafe_Terminals_1_Day against the dependent variable. Because fraud is binary (a record either is fraud or is not), nothing is plotted between the two values on the Y axis: