13) Adding Vectors / Factors to an existing Data Frame.

This Blog entry is from the Loading and Shaping section in Learn R.

Abstraction is a core part of the machine learning task, and horizontal abstraction involves creating many new columns that are derived from the foundation columns. In this example, a target of a 50% uplift on the current price will be created as a separate column called Target (i.e. Interim_Close + (Interim_Close / 2)). Firstly, create a vector which applies the formula to the Interim_Close column of the data frame AAPL by typing:

Target <- AAPL$Interim_Close + (AAPL$Interim_Close / 2)

Run the line of script to console.

To add the column to the AAPL data frame, use the mutate() function from the dplyr package. mutate() takes the target data frame as its first argument, followed by the column to be added:

library(dplyr)
AAPL <- mutate(AAPL, Target)

Run the line of script to console.

View the newly created column by typing:

View(AAPL)

Run the line of script to console to open the data viewer in the script window.

It can be observed that the vector has been added to the data frame as a new column. The mutate() function is one of the most useful functions for creating abstractions, where a vector is built up over several steps and the final vector is then added to the target data frame.
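
As a self-contained recap, the sketch below replays both steps. The real AAPL data frame is not reproduced in this entry, so the sketch builds a small hypothetical stand-in with made-up Interim_Close values and assumes the dplyr package is installed. It also shows two equivalents: naming and computing the column inside mutate() in one step, and plain base R assignment with $ (Target2 and Target3 are illustrative column names).

```r
# Assumes the dplyr package is installed; mutate() is a dplyr function
library(dplyr)

# Hypothetical stand-in for the AAPL data frame (prices are made up)
AAPL <- data.frame(Interim_Close = c(100, 110, 120))

# Step 1: build the abstraction as a free-standing vector (50% uplift)
Target <- AAPL$Interim_Close + (AAPL$Interim_Close / 2)

# Step 2: add the vector to the data frame; the column takes the name Target
AAPL <- mutate(AAPL, Target)

# Alternative: name and compute the column inside mutate() in one step
AAPL <- mutate(AAPL, Target2 = Interim_Close * 1.5)

# Base R equivalent, without dplyr
AAPL$Target3 <- AAPL$Interim_Close * 1.5

print(AAPL)
```

Computing the column inside mutate() avoids leaving a separate Target vector in the environment, which keeps the workspace tidier when many abstractions are created.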

15) Creating a Factor from a Vector with Levels and Ordering.

This Blog entry is from the Data Structures section in Learn R.

Some categorical data also has a precedence, whereby each category ranks above the previous one, even though the categories are not necessarily measured on a numeric scale. A good example is temperature described as High, Medium or Low. Start by creating a Vector called Temps:

Temps <- c("High","Medium","Low","Low","Medium")

Run the line of script to console.

Create a similar Vector, this time containing the distinct values in order of precedence, from lowest to highest:

TempsDistinctOrder <- c("Low","Medium","High")

Run the line of script to console.

Create the factor by bringing the two newly created Vectors together, passing Temps as the values and TempsDistinctOrder as the levels, and specifying that ordering is to be observed:

TempsFactor <- factor(Temps, levels = TempsDistinctOrder, ordered = TRUE)

Run the line of script to console.
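
To see what factor() has actually built, the sketch below recreates the ordered factor and inspects it with base R functions. The levels follow TempsDistinctOrder rather than alphabetical order, and each value is stored as an integer code pointing at its level. The misspelt value "Meduim" is a made-up example of a value that is not among the levels.

```r
# Recreate the objects from this entry so the sketch runs on its own
Temps <- c("High", "Medium", "Low", "Low", "Medium")
TempsDistinctOrder <- c("Low", "Medium", "High")
TempsFactor <- factor(Temps, levels = TempsDistinctOrder, ordered = TRUE)

levels(TempsFactor)      # levels in the order supplied: Low, Medium, High
is.ordered(TempsFactor)  # TRUE because ordered = TRUE was passed
as.integer(TempsFactor)  # integer code of each value's level: 3 2 1 1 2

# Without explicit levels, factor() infers them and sorts them alphabetically
levels(factor(Temps))    # High, Low, Medium

# A value missing from the levels (e.g. a misspelling) becomes NA
factor(c("Low", "Meduim"), levels = TempsDistinctOrder, ordered = TRUE)
```

Because unmatched values silently become NA, it is worth checking the new factor for NA values after creating it.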

Write the Factor to console by typing:

TempsFactor

Run the line of script to console.

It can be seen that the Factor levels are now separated by < chevrons, which denote the precedence: Low is less than Medium, and Medium is less than High. Usefully, a comparison operator can be applied to the factor to test which of its values exceed a given level, for example type:

TempsFactor > "Low"

Run the line of script to console.

It can be seen that a Vector of logical values (TRUE or FALSE) has been returned, which could further be used for selecting and subsetting.
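
The logical vector returned by the comparison can be used directly as an index. The sketch below recreates the objects from this entry so that it runs on its own, keeps only the values above Low, and shows that max() and sort() also respect the level ordering of an ordered factor. AboveLow and WarmTemps are illustrative names, not part of the original walkthrough.

```r
# Recreate the ordered factor from this entry so the sketch runs on its own
Temps <- c("High", "Medium", "Low", "Low", "Medium")
TempsFactor <- factor(Temps, levels = c("Low", "Medium", "High"), ordered = TRUE)

# Logical vector: TRUE where the value ranks above "Low"
AboveLow <- TempsFactor > "Low"
print(AboveLow)              # TRUE TRUE FALSE FALSE TRUE

# Use the logical vector as an index to keep only those values
WarmTemps <- TempsFactor[AboveLow]
print(WarmTemps)             # High Medium Medium (Low is kept as an unused level)
print(droplevels(WarmTemps)) # drops the unused Low level

# Summaries and sorting also follow the level order, not the alphabet
print(max(TempsFactor))      # High
print(sort(TempsFactor))     # Low Low Medium Medium High
```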

14) Creating a Factor from a Vector.

This Blog entry is from the Data Structures section in Learn R.

The factor() function turns a Vector containing character values into a special structure for categorical variables. Categorical variables are treated differently in data analysis because, conceptually, each category can be pivoted into a column in its own right, for example as an indicator (dummy) variable in a model.
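
The idea of categories being pivoted into columns can be made concrete with base R's model.matrix(), which builds the kind of indicator (dummy) columns that many modelling functions use for factors. This minimal sketch reuses the Gender values from this entry; Dummies is an illustrative name.

```r
# Recreate the Gender data used in this entry
Gender <- c("Male", "Female", "Female", "Male")
GenderFactor <- factor(Gender)

# Pivot the factor into one indicator (0/1) column per level;
# "- 1" removes the intercept so that every level gets its own column
Dummies <- model.matrix(~ GenderFactor - 1)
print(Dummies)

# With the intercept kept, the first level (Female) becomes the baseline
# and only a GenderFactorMale column is added alongside (Intercept)
colnames(model.matrix(~ GenderFactor))
```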

Assume that a Vector of customer genders exists:

Gender <- c("Male","Female","Female","Male")

Run the line of script to console.

A standard vector has been created.  To transform this Vector into a Factor, simply pass the Gender Vector as an argument to the factor() function by typing:

GenderFactor <- factor(Gender)

Run the line of script to console.

It can be observed that the Factor is now available in the Environment pane.

To view the factor in the console type:

GenderFactor

Run the line of script to console.

Closer inspection shows that although the strings Male and Female are each repeated in the Vector, the Factor has correctly identified two levels, Female and Male (sorted alphabetically by default). This is an example of the levels being inferred from the data. Categorical data stored as a Factor will be treated natively as categorical by the predictive analytics tools covered later.
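
The inferred levels, and the way a Factor stores its data, can be checked with a few base R functions. The sketch recreates GenderFactor so it runs on its own; GenderFactor2 is a hypothetical name used only to show levels being supplied rather than inferred.

```r
Gender <- c("Male", "Female", "Female", "Male")
GenderFactor <- factor(Gender)

class(Gender)             # "character": a plain Vector of strings
class(GenderFactor)       # "factor"
levels(GenderFactor)      # inferred levels, sorted: "Female" "Male"
nlevels(GenderFactor)     # 2
as.integer(GenderFactor)  # each value stored as a code into levels(): 2 1 1 2
table(GenderFactor)       # count of each level: Female 2, Male 2

# Levels can be supplied explicitly instead, for example to control their order
GenderFactor2 <- factor(Gender, levels = c("Male", "Female"))
levels(GenderFactor2)     # "Male" "Female"
```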