2) Create an Abstraction Deviation Independent Vector.

This Blog entry is from the Logistic Regression section in Learn R.

In behavioural analytics especially, one of the most powerful improvements that can be made to a variable is a transformation that compares the value for a record against the values typically observed in that vector for a customer, product or portfolio.  There are several normalisations appropriate for such a task, such as a Z score; in this instance, however, because the data is skewed, a range normalisation may be more appropriate.
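To make the difference between the two normalisations concrete, the following is a minimal sketch on a small made-up vector. The values and the name toy_counts are illustrative assumptions and are not taken from the FraudRisk data. It computes a Z score and a range normalisation side by side:

```r
# Hypothetical, right-skewed transaction counts (illustrative values only)
toy_counts <- c(1, 1, 2, 2, 3, 40)

# Z score: distance from the mean in units of standard deviation
z_score <- (toy_counts - mean(toy_counts)) / sd(toy_counts)

# Range normalisation: position between the smallest and largest value
range_norm <- (toy_counts - min(toy_counts)) / (max(toy_counts) - min(toy_counts))

print(round(z_score, 2))
print(round(range_norm, 2))
```

The range-normalised values always fall between 0 and 1, whereas Z scores are not bounded in this way.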

A range normalisation establishes the largest and the smallest value observed in the vector and expresses where a test value falls on that range as a fraction (0 for the minimum, 1 for the maximum), which can be read as a percentage.  In this example, a range normalisation will be performed on the column Count_Transactions_1_Day.  Firstly, establish the maximum and minimum values:

Min_Count_Transactions_1_Day <- min(FraudRisk$Count_Transactions_1_Day)
Max_Count_Transactions_1_Day <- max(FraudRisk$Count_Transactions_1_Day)
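One caveat: min() and max() return NA when the column contains missing values. Whether FraudRisk has any is not stated here, so the following is only a sketch on a hypothetical toy data frame (toy_risk is an illustrative name) showing the effect of the na.rm = TRUE argument:

```r
# Hypothetical data frame with one missing value (not the real FraudRisk data)
toy_risk <- data.frame(Count_Transactions_1_Day = c(3, NA, 7, 1))

# Without na.rm the result is NA
min_with_na <- min(toy_risk$Count_Transactions_1_Day)

# With na.rm = TRUE the missing value is ignored
min_clean <- min(toy_risk$Count_Transactions_1_Day, na.rm = TRUE)
max_clean <- max(toy_risk$Count_Transactions_1_Day, na.rm = TRUE)

print(c(min_with_na, min_clean, max_clean))
```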

Run the block of script to console:


At this stage, the minimum and maximum values of Count_Transactions_1_Day have been stored as single-value vectors.  To create a new vector holding the range normalisation:

Range_Deviation_Count_Transactions_1_Day <- (FraudRisk$Count_Transactions_1_Day - Min_Count_Transactions_1_Day) / (Max_Count_Transactions_1_Day - Min_Count_Transactions_1_Day)
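The formula divides by the difference between the maximum and the minimum, so a column in which every value is identical would produce 0 / 0, which is NaN in R. As a hedged sketch, the calculation could be wrapped in a small helper. The name range_normalise is hypothetical, and returning 0 for a constant vector is an assumption rather than something prescribed in this walkthrough:

```r
# Hypothetical helper: range normalisation with a guard for a zero range
range_normalise <- function(x) {
  lo <- min(x)
  hi <- max(x)
  if (hi == lo) {
    # Assumed convention: a constant vector has no spread, so return zeros
    return(rep(0, length(x)))
  }
  (x - lo) / (hi - lo)
}

# Worked example: 2 maps to 0, 4 maps to (4 - 2) / (10 - 2) = 0.25, 10 maps to 1
example_result <- range_normalise(c(2, 4, 10))
print(example_result)

# Constant input falls back to zeros instead of NaN
constant_result <- range_normalise(c(5, 5, 5))
print(constant_result)
```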

Run the line of script to console:


Append the newly created vector to the FraudRisk data frame using mutate() from the dplyr package, which must already be loaded:

FraudRisk <- mutate(FraudRisk, Range_Deviation_Count_Transactions_1_Day)
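As a base R alternative that does not depend on any package, the column can also be assigned directly with the $ operator. The sketch below uses a hypothetical toy data frame (toy_risk, with made-up values) rather than FraudRisk, and then checks that the new column runs from 0 to 1:

```r
# Hypothetical stand-in for FraudRisk (illustrative values only)
toy_risk <- data.frame(Count_Transactions_1_Day = c(0, 5, 10, 20))

# Compute the range normalisation and append it as a new column
counts <- toy_risk$Count_Transactions_1_Day
toy_risk$Range_Deviation_Count_Transactions_1_Day <-
  (counts - min(counts)) / (max(counts) - min(counts))

# Sanity check: the smallest and largest values of the new column
print(range(toy_risk$Range_Deviation_Count_Transactions_1_Day))
print(toy_risk)
```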

Run the line of script to console:


It can be seen that a plot has been created between the variable Count_Unsafe_Terminals_1_Day and the dependent variable. Because a record is either fraud or not fraud, nothing is plotted between those two points on the Y axis: