2) Create an Abstraction Deviation Independent Vector.

This Blog entry is from the Logistic Regression section in Learn R.

In behavioural analytics especially, one of the most powerful improvements that can be made to a variable is a transformation that compares the value for a record against the values typically observed in that vector for a customer, product or portfolio.  There are of course several normalisations appropriate for such a task, such as a Z score; however, in this instance, because the data is skewed, a range normalisation may be more appropriate.
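As a rough illustration of the difference between the two normalisations, the sketch below applies both to a small skewed vector. The `counts` values are hypothetical and are not taken from the FraudRisk data; they are only there to show how each transformation treats one very large value.

```r
# Hypothetical, skewed transaction counts (assumption: not from FraudRisk)
counts <- c(1, 1, 2, 2, 3, 40)

# Z score: distance from the mean, measured in standard deviations
z_score <- (counts - mean(counts)) / sd(counts)

# Range normalisation: position between the minimum (0) and the maximum (1)
range_norm <- (counts - min(counts)) / (max(counts) - min(counts))

print(round(z_score, 2))
print(round(range_norm, 2))
```

The range normalisation is bounded between 0 and 1 whatever the shape of the data, whereas the Z score depends on a mean and standard deviation that the single large value pulls upwards.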

A range normalisation establishes the largest and smallest values observed in the vector, then expresses where a test value sits between them as a proportion of that range (0 at the minimum, 1 at the maximum).  In this example, a range normalisation will be performed on the column Count_Transactions_1_Day.  Firstly, establish the maximum and minimum values:

Min_Count_Transactions_1_Day <- min(FraudRisk$Count_Transactions_1_Day)
Max_Count_Transactions_1_Day <- max(FraudRisk$Count_Transactions_1_Day)
1.png

Run the block of script to console:

2.png

At this stage, the minimum and maximum values of Count_Transactions_1_Day have been stored as vectors.  To create a new vector holding the range normalisation:

Range_Deviation_Count_Transactions_1_Day <- (FraudRisk$Count_Transactions_1_Day - Min_Count_Transactions_1_Day) / (Max_Count_Transactions_1_Day - Min_Count_Transactions_1_Day)
3.png

Run the line of script to console:

4.png
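The same calculation could be wrapped in a small helper so that it can be reused on other columns. The function name `range_normalise` and the `example_counts` values are hypothetical; the handling of missing values and of a zero-width range is an assumption about what is sensible, not something this entry prescribes.

```r
# Hypothetical helper: range-normalise a numeric vector to the 0..1 interval
range_normalise <- function(x) {
  lo <- min(x, na.rm = TRUE)  # smallest observed value, ignoring NA
  hi <- max(x, na.rm = TRUE)  # largest observed value, ignoring NA
  if (hi == lo) {
    # Every value is identical: return 0 rather than divide by zero,
    # keeping NA where the input was NA
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - lo) / (hi - lo)
}

# Hypothetical values, including a missing one
example_counts <- c(2, 5, NA, 11)
normalised <- range_normalise(example_counts)
print(normalised)
```

With a helper like this, the vector in this entry could be produced with range_normalise(FraudRisk$Count_Transactions_1_Day), assuming the column is numeric.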

Append the newly created vector to the FraudRisk data frame using mutate() (from the dplyr package, assumed here to be loaded already with library(dplyr)):

FraudRisk <- mutate(FraudRisk, Range_Deviation_Count_Transactions_1_Day)
5.png

Run the line of script to console:

6.png
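If dplyr is not available, the column can also be appended with base R alone, and a quick check confirms that the new values run from 0 to 1. The sketch below uses a hypothetical stand-in data frame, `FraudRiskExample`, with made-up values, so that it runs without the real FraudRisk data.

```r
# Hypothetical stand-in for the FraudRisk data frame (made-up values)
FraudRiskExample <- data.frame(Count_Transactions_1_Day = c(1, 4, 2, 9))

# Range-normalise the column and append it using base R assignment
col_values <- FraudRiskExample$Count_Transactions_1_Day
FraudRiskExample$Range_Deviation_Count_Transactions_1_Day <-
  (col_values - min(col_values)) / (max(col_values) - min(col_values))

# Sanity checks: the new column should span 0 to 1
print(summary(FraudRiskExample$Range_Deviation_Count_Transactions_1_Day))
print(head(FraudRiskExample))
```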

It can be seen that a plot has been created between the variable Count_Unsafe_Terminals_1_Day and the dependent variable and, on the basis that a record is either fraud or not, nothing has been plotted between the two points on the Y axis: