13) Adding Vectors / Factors to an existing Data Frame.

This Blog entry is from the Loading and Shaping section in Learn R.

Abstraction is a core part of the machine learning task, and horizontal abstraction involves creating many columns that are derived from the foundation columns.  In this example, a target of 50% uplift on the current price will be created as a separate column called Target, i.e. Interim_Close + (Interim_Close / 2).  First, create a vector that applies the formula to the Interim_Close column of the data frame AAPL by typing:

Target = AAPL$Interim_Close + (AAPL$Interim_Close / 2)

Run the line of script to console:

To add the column to the AAPL data frame, use the mutate() function from the dplyr package (loaded with library(dplyr)), which takes the target data frame as its first argument, followed by the column to be added:

AAPL <- mutate(AAPL,Target)

Run the line of script to console:

View the newly created column by typing:

View(AAPL)

Run the line of script to console to expand the data viewer in the script window:

It can be observed that the vector has been added to the data frame as a new column.  The mutate() function is by far the most useful function in the creation of abstractions, whereby a vector is built up over several steps and the final vector is then mutated into the target data frame.
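The steps above can be sketched end to end as follows.  This is a minimal sketch only: the small AAPL data frame and its prices are invented stand-ins for the real data, and the base R fallback is an assumption added so that the script still runs where dplyr is not installed.

```r
# Hypothetical stand-in for the AAPL data frame; the prices are invented.
AAPL <- data.frame(Interim_Close = c(100, 120, 90))

# 50% uplift on the current price, using the same formula as above.
Target <- AAPL$Interim_Close + (AAPL$Interim_Close / 2)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Naming the column explicitly makes the intent clear.
  AAPL <- dplyr::mutate(AAPL, Target = Target)
} else {
  # Base R equivalent, used only when dplyr is not available.
  AAPL$Target <- Target
}

# Print rather than View() so the sketch also runs outside RStudio.
print(AAPL)
```

Both branches produce a data frame with the original Interim_Close column and the new Target column alongside it.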

15) Creating a Factor from a Vector with Levels and Ordering.

This Blog entry is from the Data Structures section in Learn R.

Some categorical data also has a precedence, whereby each category ranks above the previous one without necessarily being distributed in a statistical fashion.  A good example would be temperature.  Start by creating a Vector called Temps:

Temps <- c("High","Medium","Low","Low","Medium")

Run the line of script to console:

Create a similar Vector, this time with the distinct values in the order of precedence:

TempsDistinctOrder <- c("Low","Medium","High")

Run the line of script to console:

Create the factor by bringing the two newly created Vectors together and specifying that ordering is to be observed:

TempsFactor <- factor(Temps,TempsDistinctOrder,ordered=TRUE)

Run the line of script to console:

Write the Factor to console by typing:

TempsFactor

Run the line of script to console:

It can be seen that the Factor levels are now separated by < chevrons, which denote the precedence:  Low is less than Medium, and Medium is less than High.  Rather usefully, a logical test can be applied to find only those values in the factor that exceed a given level.  For example, type:

TempsFactor > "Low"

Run the line of script to console:

It can be seen that a Vector of logical values has been returned, which could further be used for selecting and subsetting.
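As a minimal sketch of that further use, the logical Vector can be passed back into square brackets to keep only the matching values.  The names AboveLow and WarmTemps are hypothetical and introduced only for this illustration.

```r
# Rebuild the ordered factor from the steps above.
Temps <- c("High", "Medium", "Low", "Low", "Medium")
TempsDistinctOrder <- c("Low", "Medium", "High")
TempsFactor <- factor(Temps, levels = TempsDistinctOrder, ordered = TRUE)

# Logical test against a level; only valid because the factor is ordered.
AboveLow <- TempsFactor > "Low"

# Use the logical Vector to subset the factor.
WarmTemps <- TempsFactor[AboveLow]

print(AboveLow)
print(WarmTemps)

# Ordered factors also support max() and min().
print(max(TempsFactor))
```

The same comparison on an unordered factor returns NA with a warning, which is why ordered=TRUE matters here.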

14) Creating a Factor from a Vector.

This Blog entry is from the Data Structures section in Learn R.

The factor() function turns a Vector containing character values into a special structure for categorical variables.  Categorical variables are treated differently in data analysis because, conceptually, each category is pivoted to a column in its own right.

Assume that a Vector of customer genders exists:

Gender <- c("Male","Female","Female","Male")

Run the line of script to console:

A standard vector has been created.  To transform this Vector into a Factor, simply pass the Gender Vector as an argument to the factor() function by typing:

GenderFactor <- factor(Gender)

Run the line of script to console:

It can be observed that the Factor is now available in the environment pane:

To view the factor in the console type:

GenderFactor

Run the line of script to console:

Closer inspection shows that although the vector contains the strings Male and Female duplicated, the Factor has correctly identified that there are two levels, Male and Female.  This procedure is an example of the levels being inferred.  Categorical data will not be treated natively in the predictive analytics tools as follows.
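To make the inferred levels concrete, the minimal sketch below inspects the Factor with levels() and as.integer(), then uses model.matrix() from base R to show what pivoting categories to columns can look like.  The use of model.matrix() and the names GenderLevels, GenderCodes and GenderColumns are assumptions made for this illustration, not necessarily the approach the rest of the series takes.

```r
Gender <- c("Male", "Female", "Female", "Male")
GenderFactor <- factor(Gender)

# Inferred levels are the distinct values, sorted alphabetically.
GenderLevels <- levels(GenderFactor)

# Internally each value is stored as an integer index into the levels.
GenderCodes <- as.integer(GenderFactor)

# One indicator column per level; "- 1" drops the intercept column.
GenderColumns <- model.matrix(~ GenderFactor - 1)

print(GenderLevels)
print(GenderCodes)
print(GenderColumns)
```

Each row of GenderColumns holds a 1 in the column for its own level and a 0 in the other.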