1) Pivot a Categorical Variable for Regression Analysis

This Blog entry is from the Logistic Regression section in Learn R.

In behavioral analytics and classification, character data and numeric label data (values that carry a numeric label but follow no standard distribution) appear quite often. Such label data must be pre-processed by pivoting each distinct value into its own column holding either a 1 or a 0. For example, a transaction was either made on a Chip card (1) or it was not (0).

To deal with categorical variables without having to pivot each distinct entry in a vector by hand, R's factor functionality can be used. Ideally, these categorical data pivots are done during data preparation, in an SQL procedure or the Jube platform.

It can be seen that the data was imported with the type field taking the form of a character field:

1.png

Start by creating a factor from the contents of the Type column:

2.png
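
The script itself only appears in the screenshot, so the following is a minimal sketch of what such a line might look like rather than the exact original. It assumes the column is FraudRisk$Type and that the factor is named TypeFactor (the name used when it is appended later); the toy data frame and its rows are made up for illustration.

```r
# Toy stand-in for the FraudRisk data frame; the rows are made up for illustration.
FraudRisk <- data.frame(
  Type = c("Chip", "Manual", "Chip", "Manual"),
  stringsAsFactors = FALSE
)

class(FraudRisk$Type)   # "character" before conversion

# factor() records the distinct values as levels and stores each entry
# as an index into those levels.
TypeFactor <- factor(FraudRisk$Type)

class(TypeFactor)       # "factor"
levels(TypeFactor)      # the distinct values, sorted alphabetically by default
```

The original Type column is left untouched; the factor is a separate object until it is appended to the data frame.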

Run the line of script to console:

3.png

It can be seen that the factor has been created and appears in the environment pane:

4.png

All that remains is to append the newly created factor to the FraudRisk data frame so that it can be used as in previous Blog entries:

library(dplyr)
FraudRisk <- mutate(FraudRisk,TypeFactor)
5.png

Run the block of script to console:

6.png
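
To illustrate why a factor saves the manual pivoting, the sketch below uses base R's model.matrix() to show the indicator columns that a regression would derive from a factor. The toy data and the "Swipe" value are assumptions made for illustration and are not taken from the FraudRisk data set.

```r
# Toy stand-in for the Type column; the values are made up for illustration.
Type <- c("Chip", "Manual", "Swipe", "Manual")
TypeFactor <- factor(Type)

# model.matrix() builds the design matrix a regression would use.
# With R's default treatment contrasts, the first level ("Chip") is the
# baseline and every other level gets its own 0/1 indicator column.
Design <- model.matrix(~ TypeFactor)

colnames(Design)
print(Design)
```

Each non-baseline level becomes a 1 or 0 column without any per-value code, which is the pivot described at the start of this entry.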

While R has a convenient data structure in the form of factors, it may well be appropriate to pivot data to a vector manually, using rudimentary if logic and/or as part of horizontal abstraction. In this example, a vectorised comparison is performed using the ifelse() function, which tests whether each value in the Type field is equal to "Manual"; where it is, the value 1 is returned to the new vector, otherwise 0:

IsHighRisk <- ifelse(FraudRisk$Type=="Manual",1,0)
7.png

Run the line of script to console:

8.png

Append the newly created vector to the FraudRisk data frame:

9.png
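
The line in the screenshot is not reproduced in the text. As a hedged sketch, the append can be done in base R by assigning the vector to a new column; the toy data frame is made up for illustration, and the commented dplyr line is an assumed equivalent that mirrors the earlier TypeFactor step.

```r
# Toy stand-in for the FraudRisk data frame; the rows are made up for illustration.
FraudRisk <- data.frame(
  Type = c("Chip", "Manual", "Chip"),
  stringsAsFactors = FALSE
)

# Vectorised comparison from the step above: 1 where Type is "Manual", else 0.
IsHighRisk <- ifelse(FraudRisk$Type == "Manual", 1, 0)

# Base R: assign the vector as a new column of the data frame.
FraudRisk$IsHighRisk <- IsHighRisk

# Assumed dplyr equivalent (requires library(dplyr)):
# FraudRisk <- mutate(FraudRisk, IsHighRisk)

str(FraudRisk)
```

The vector must have one entry per row of the data frame, which holds here because it was derived from FraudRisk$Type.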

Run the line of script to console:

10.png