13) Adding Vectors / Factors to an existing Data Frame.

This Blog entry is from the Loading and Shaping section in Learn R.

Abstraction is a core part of the machine learning task, and horizontal abstraction involves creating many new columns derived from the foundation columns.  In this example, a target of 50% uplift on the current price will be created as a separate column called Target (i.e. Interim_Close + (Interim_Close / 2)).  First, create a vector that applies the formula to the Interim_Close column of the AAPL data frame by typing:

Target = AAPL$Interim_Close + (AAPL$Interim_Close  / 2)
1.png

Run the line of script to console:

2.png

To add the column to the AAPL data frame, use the mutate() function from the dplyr package, which takes the target data frame as its first argument, followed by the column to be added:

AAPL <- mutate(AAPL,Target)
3.png

Run the line of script to console:

4.png

View the newly created column by typing:

View(AAPL)
5.png

Run the line of script to console to expand the data viewer in the script window:

6.png

It can be observed that the vector has been added to the data frame as a new column.  The mutate() function is among the most useful functions for creating abstractions, whereby a vector is built up over several steps and the final vector is then mutated into the target data frame.
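As a minimal sketch of the same idea, the intermediate vector can be skipped by naming the new column inside mutate() itself.  The small AAPL data frame below is made-up illustrative data standing in for the real price series, which is an assumption of this example:

```r
library(dplyr)

# Hypothetical stand-in for the AAPL data frame (illustrative values only)
AAPL <- data.frame(Interim_Close = c(100, 110, 120))

# Name the new column explicitly; inside mutate() columns are
# referred to directly, without the AAPL$ prefix
AAPL <- mutate(AAPL, Target = Interim_Close + (Interim_Close / 2))

print(AAPL)
```

Naming the column in the call keeps the formula and the column name in one place, which can be easier to read when many derived columns are created.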

11) Sorting a Data Frame with the arrange() function.

This Blog entry is from the Loading and Shaping section in Learn R.

The Blog entries that follow draw on the dplyr package, which is a collection of functions for shaping and moulding data frames.  The first step is to ensure that the dplyr package is available by installing it through the Install section of the packages pane.  Search for dplyr:

1.png

Click Install to download and install the dplyr package:

2.png

Load the dplyr library by typing:

library(dplyr)
3.png

The dplyr package exposes several functions for shaping and moulding data.  The arrange() function is used to rearrange, or rather sort, the rows of a data frame by one or more columns, in ascending order by default.

To arrange data by date for the AAPL data frame:

AAPL <- arrange(AAPL,Interim_Buffer_Date)
4.png

Run the line of script to console:

5.png

View the AAPL data frame to observe the change in row arrangement:

View(AAPL)
6.png

Run the line of script to console:

7.png

Sorting in the opposite direction can be achieved by wrapping the column to be sorted in the desc() function.  To reverse the sort order on Interim_Buffer_Date, type:

AAPL <- arrange(AAPL,desc(Interim_Buffer_Date))
8.png

Run the line of script to console:

9.png

Observe the change in sort order:

View(AAPL)
10.png

Run the line of script to console:

11.png

It can be seen that the sort order has been reversed.  To sort by one column and then by the next, list the columns in order of priority, wrapping any column that should sort in descending order in desc():

AAPL <- arrange(AAPL,desc(Interim_Buffer_Date),Interim_Close)
12.png

Run the line of script to console:

13.png
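The multi-column sort can be illustrated on a small, self-contained example.  The data frame below is a hypothetical miniature of AAPL with made-up dates and prices, and it assumes that Interim_Buffer_Date is stored as a Date:

```r
library(dplyr)

# Hypothetical miniature of AAPL (dates and prices are made up)
AAPL <- data.frame(
  Interim_Buffer_Date = as.Date(c("2016-01-05", "2016-01-04",
                                  "2016-01-05", "2016-01-06")),
  Interim_Close = c(102.7, 105.4, 101.9, 100.7)
)

# Newest date first; within the same date, lowest close first
Sorted <- arrange(AAPL, desc(Interim_Buffer_Date), Interim_Close)

print(Sorted)
```

The two rows sharing the date 2016-01-05 are ordered by Interim_Close ascending, since only the first column is wrapped in desc().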

20) Saving .Rdata to file.

This Blog entry is from the Data Structures section in Learn R.

Machine learning is predominantly a challenge of data abstraction, that is, the shaping and moulding of data, and of presenting that data to advanced machine learning algorithms on a commodity basis.  It follows that, having spent time and effort creating an elaborate Data Frame, it is likely that it will need to be saved for future use (if only to avoid the computational expense of recreating it).

The save() function exists for the purpose of saving most objects that can be created and populated with data to a file, which is written to the working directory when only a file name is given.  Saving objects is a very important part of deploying models on a real-time basis.

To save the Data Frame LabeledDataFrame and BucketList to a file named "Example.RData":

save(LabeledDataFrame,BucketList,file = "Example.RData")
saving-an-object-script-in-r.png

Run the line of script to console:

saved-object-written-to-r-console.png

A file titled Example.RData is now written out to the Working Directory.  To check the location of the working directory:

getwd()
check-working-directory-script-in-r-for-save-validation.png

Run the line of script to console:

working-directory-written-out-to-r-console-for-save-validation.png

Having identified the working directory, navigate to it in Windows Explorer:

object-saved-in-file-system-by-r.png

The saved file is clearly visible in this directory, ready for real-time deployment or for reloading into an R session.
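A minimal sketch of the full round trip, saving and then reloading with the base R load() function, might look as follows.  The two objects are hypothetical stand-ins with made-up contents, and a temporary directory is used instead of the working directory so that the example leaves no files behind:

```r
# Hypothetical objects standing in for LabeledDataFrame and BucketList
LabeledDataFrame <- data.frame(Name = c("A", "B"), Age = c(30, 40))
BucketList <- list("Learn R", "Deploy a model")

# Write both objects to one file (a temporary path is assumed here)
ExampleFile <- file.path(tempdir(), "Example.RData")
save(LabeledDataFrame, BucketList, file = ExampleFile)

# Confirm the file exists, remove the objects, then restore them
file.exists(ExampleFile)
rm(LabeledDataFrame, BucketList)
Restored <- load(ExampleFile)  # load() returns the names of the restored objects

print(Restored)
```

The objects come back under the names they were saved with, so a script that reloads the file can carry on using LabeledDataFrame and BucketList directly.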

19) Create a Data Frame from Names and stringsAsFactors.

This Blog entry is from the Data Structures section in Learn R.

As introduced previously, the data.frame() function, not unlike the list() function, has more flexibility in creating objects than the c() function.  As seems intuitive, it is possible to specify column names explicitly rather than take the names of the Vectors by default.  The data.frame() function also has a stringsAsFactors argument which, when TRUE, eases the burden of creating factors by converting character vectors to factors automatically (although it is not always sensible to use it when the focus is numeric prediction).

To create a Data Frame with specific names and with stringsAsFactors disabled:

LabeledDataFrame <- data.frame(ExampleFullNames = FullNames, ExampleFullAges = FullAges, ExampleFullGender = FullGender, stringsAsFactors = FALSE)
a-script-in-r-to-add-a-names-to-a-data-frame.png

Run the line of script to console:

new-data-frame-with-names-written-to-r-console.png

Return the Data Frame by typing:

LabeledDataFrame
a-script-to-write-out-a-labeled-data-frame-in-r.png

Run the line of script to console:

a-named-data-frame-written-out-to-the-r-console.png

It can be observed that the column names have been correctly specified.  With stringsAsFactors disabled, only columns explicitly created as factors remain factors; character Vectors, such as FullNames in this example, will not be converted to factors automatically.
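The effect of the stringsAsFactors switch can be checked directly with class().  The sketch below uses short hypothetical vectors rather than the ones from this entry; note also that from R 4.0.0 onwards the default for stringsAsFactors is FALSE, so setting it explicitly mainly matters for older versions or for readability:

```r
# Hypothetical vectors (illustrative values only)
Names <- c("Ann", "Bob", "Cat")
Gender <- factor(c("Female", "Male", "Female"))

# Character columns stay character; the explicit factor stays a factor
KeepStrings <- data.frame(ExampleNames = Names, ExampleGender = Gender,
                          stringsAsFactors = FALSE)

# Character columns are converted to factors
MakeFactors <- data.frame(ExampleNames = Names, ExampleGender = Gender,
                          stringsAsFactors = TRUE)

class(KeepStrings$ExampleNames)   # "character"
class(MakeFactors$ExampleNames)   # "factor"
class(KeepStrings$ExampleGender)  # "factor"
```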

18) Create a Data Frame from Vectors.

This Blog entry is from the Data Structures section in Learn R.

For the great majority of Blog entries that follow in this document, the Data Frame is clearly demonstrated to be the most important and ubiquitous data structure.  At its core a Data Frame is a list, albeit with certain constraints.  A data frame is typically built from Vectors and Factors, and these objects need to be of EXACTLY the same length.

It can be helpful to think of a Data Frame as being a hybrid of a Matrix and a List, with a great deal more usability than a Matrix.  It is worth remembering that, owing to the presence of both Factors and Vectors (that is to say, different object types), a matrix, which holds only a single data type, would not be practical.
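The difference can be seen in a short sketch, using two hypothetical vectors of different types.  Binding them into a matrix forces everything to a single type, whereas a data frame keeps each column as it was:

```r
# Hypothetical columns of different types
Labels <- c("a", "b", "c")
Values <- c(1.5, 2.5, 3.5)

# cbind() builds a matrix, which holds one type, so the numbers become text
AsMatrix <- cbind(Labels, Values)

# data.frame() keeps each column's own type
AsDataFrame <- data.frame(Labels, Values)

typeof(AsMatrix)            # "character"
class(AsDataFrame$Values)   # "numeric"
is.list(AsDataFrame)        # TRUE
```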

To create a data frame of customers,  start by creating a vector of full names:

FullNames <- c("Donald Trump","Hilary Clinton","Gary Johnson")
an-example-vector-to-be-used-in-a-data-frame-in-r.png

Run the line of script to console:

a-new-string-vector-created-and-written-to-r-console-for-data-frame.png

Repeat for a Vector of FullAges:

FullAges <- c(70,69,50)
a-numeric-vector-to-be-brought-together-in-a-data-frame-in-r.png

Run the line of script to console:

a-numeric-vector-written-out-to-r-console.png

Repeat for a Factor of FullGender, noting that the result of the c() function is being passed as the argument to the factor() function:

FullGender <- factor(c("Male","Female","Male"))
a-factor-to-be-used-in-a-data-frame-in-r.png

Run the line of script to console:

a-factor-created-in-r-and-written-to-r-console.png

In a similar manner to both the c() function and the list() function, the data.frame() function takes Vectors or Factors of the same length and combines them into a Data Frame.  As with the list() function, it accepts a number of arguments in its advanced use; however, its most basic structure is the same as c().  To create a data frame with default arguments, type:

FullDataFrame <- data.frame(FullNames,FullAges,FullGender)
using-the-data-frame-function-to-create-a-data-frame-from-vectors.png

Run the line of script to console:

a-data-frame-created-bringing-together-factors-and-vectors-in-r-console.png

It can be observed that the data frame is now displayed in the environment pane under the data section and as such can be viewed in a similar manner to that set forth in procedure 27.

a-data-frame-written-to-the-environment-variables-in-rstudio.png

In this example a view is performed by a single click of the entry under the data section of the environment pane:

clickinng-on-the-data-frame-explodes-the-data-frame-in-the-same-way-as-a-matrix.png

In a similar manner to a Matrix,  the Data Frame is expanded into the grid viewer section of RStudio as a table.
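Where the grid viewer is not available, for instance when running R outside RStudio, a data frame can also be inspected from the console.  The sketch below rebuilds FullDataFrame so that it is self-contained and uses only base R functions; the choice of which functions to show is an assumption of this example rather than part of the original procedure:

```r
# Rebuild the data frame from this entry
FullNames <- c("Donald Trump", "Hilary Clinton", "Gary Johnson")
FullAges <- c(70, 69, 50)
FullGender <- factor(c("Male", "Female", "Male"))
FullDataFrame <- data.frame(FullNames, FullAges, FullGender)

dim(FullDataFrame)      # number of rows, then number of columns
names(FullDataFrame)    # column names taken from the vector names
str(FullDataFrame)      # type of each column plus the first few values
head(FullDataFrame, 2)  # first two rows
```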