1) Pivot a Categorical Variable for Regression Analysis

This Blog entry is from the Logistic Regression section in Learn R.

In behavioral analytics and classification, character data and numeric label data (data that carries a numeric label but follows no standard distribution) appear quite often. Such label data needs to be pre-processed by pivoting each distinct value into its own column, holding either a 1 or a 0. For example, a transaction was either made on a Chip card (1) or it was not (0).

To handle categorical variables without manually pivoting each and every distinct entry in a vector, R's factor functionality can be used as a labor-saving tactic. Ideally, these categorical data pivots are done during data preparation, in an SQL procedure or the Jube platform.
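
As a rough illustration of why factors save labor, the sketch below shows R expanding a factor into 0/1 indicator columns with model.matrix(). The Transactions data frame and its values are toy stand-ins invented for this example, not the real FraudRisk data.

```r
# Toy stand-in data; the values are illustrative only.
Transactions <- data.frame(
  Type = c("Chip", "Manual", "Chip", "Manual"),
  stringsAsFactors = FALSE
)

# Converting the character column to a factor records its distinct levels.
Transactions$Type <- factor(Transactions$Type)

# model.matrix() expands the factor into 0/1 indicator columns,
# treating the first level ("Chip") as the baseline and omitting it.
Indicators <- model.matrix(~ Type, data = Transactions)
print(Indicators)
```

Modelling functions such as glm() perform this expansion internally when given a factor, which is why the pivot does not need to be written by hand for each distinct value.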

It can be seen that the data was imported with the type field taking the form of a character field:
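
A minimal sketch of how the column type could be checked from the console, assuming a small toy data frame in place of the imported FraudRisk data (the values are illustrative):

```r
# Toy stand-in for the imported FraudRisk data frame (values are illustrative).
FraudRisk <- data.frame(
  Type = c("Chip", "Manual", "Chip"),
  stringsAsFactors = FALSE
)

# class() reports the storage type of a single column.
print(class(FraudRisk$Type))

# str() summarises the type of every column in the data frame.
str(FraudRisk)
```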


Start by creating a factor, which implicitly converts the contents of the Type column into factor levels:
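
A minimal sketch of this step follows. The name TypeFactor is taken from the mutate() call later in this entry. The use of factor() and the toy data frame are assumptions made for illustration.

```r
# Toy stand-in for the FraudRisk data frame (values are illustrative).
FraudRisk <- data.frame(
  Type = c("Chip", "Manual", "Chip"),
  stringsAsFactors = FALSE
)

# factor() converts the character vector into a factor,
# with one level per distinct value.
TypeFactor <- factor(FraudRisk$Type)

# Inspect the distinct levels that were detected.
print(levels(TypeFactor))
```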


Run the line of script to console:


It can be seen that the factor has been created and appears in the environment pane:


All that remains is to append the newly created factor to the FraudRisk data frame so that it can be used with the techniques from previous Blog entries:

library(dplyr)  # provides mutate()
FraudRisk <- mutate(FraudRisk, TypeFactor)

Run the block of script to console:


While R has a convenient data structure in the form of factors, it may well be appropriate to manually pivot data to a vector based on rudimentary if logic and/or as part of horizontal abstraction. In this example, a vectorised comparison is performed using the ifelse() function, which determines whether each value in the Type field is equal to "Manual". If it is, the value 1 is returned to the new vector, else 0:

IsHighRisk <- ifelse(FraudRisk$Type=="Manual",1,0)

Run the line of script to console:


Append the newly created vector to the FraudRisk data frame:
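
A minimal sketch of this step, using base R assignment on a toy data frame (the values are illustrative). The dplyr form used earlier in this entry, mutate(FraudRisk, IsHighRisk), should give an equivalent result.

```r
# Toy stand-in for the FraudRisk data frame (values are illustrative).
FraudRisk <- data.frame(
  Type = c("Chip", "Manual", "Chip"),
  stringsAsFactors = FALSE
)

# Vectorised comparison: 1 where Type is "Manual", else 0.
IsHighRisk <- ifelse(FraudRisk$Type == "Manual", 1, 0)

# Append the vector as a new column of the data frame.
FraudRisk$IsHighRisk <- IsHighRisk
print(FraudRisk)
```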


Run the line of script to console:


Introduction to Logistic Regression

This Blog entry is from the Logistic Regression section in Learn R.

Logistic Regression is a modelling technique that can be used for classification where the dependent variable is binary, taking the value 1 or 0. The dataset used in this section of Blog entries is available under \Bundle\Data\FraudRisk\FraudRisk.csv. It contains a set of debit card transactions in which half of the dataset is a sample of fraudulent transactions and half is a sample of legitimate transactions.
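
To give a feel for the technique, the sketch below fits a logistic regression with glm() on the mtcars dataset that ships with R. This is a stand-in chosen only because it is always available and has a binary 0/1 column (am); it is not the FraudRisk data, and the 0.5 cut-off is an illustrative assumption.

```r
# mtcars ships with R; 'am' is a binary 0/1 variable.
# family = binomial requests a logistic regression.
Model <- glm(am ~ wt, data = mtcars, family = binomial)

# type = "response" returns predicted probabilities between 0 and 1.
Probabilities <- predict(Model, type = "response")

# Classify as 1 when the predicted probability exceeds 0.5, else 0.
Predicted <- ifelse(Probabilities > 0.5, 1, 0)

print(summary(Model))
```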

To proceed with the subsequent procedures, it is necessary to import the file FraudRisk.csv into R.
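
A minimal sketch of the import using read.csv(). So that it runs anywhere, it writes a tiny stand-in CSV to a temporary file first. The Amount column and all values are hypothetical, and in practice CsvPath would point at the location of FraudRisk.csv.

```r
# Write a tiny stand-in CSV; the columns and values are hypothetical.
CsvPath <- tempfile(fileext = ".csv")
writeLines(c("Type,Amount", "Chip,10.5", "Manual,99.0"), CsvPath)

# In practice CsvPath would be the path to FraudRisk.csv.
# stringsAsFactors = FALSE keeps text columns as character vectors.
FraudRisk <- read.csv(CsvPath, stringsAsFactors = FALSE)

# Confirm the structure of the imported data frame.
str(FraudRisk)
```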