15) Delete a Vector from a Data Frame.

This Blog entry is from the Loading and Shaping section in Learn R.

In these Blog entries, the mutate() function of dplyr has been used to add a vector to a data frame.  To remove a vector from a data frame, it is simply a matter of assigning NULL to the vector in question:

AAPL$Target <- NULL

These Blog entries do not otherwise rely on deleting vectors from a data frame; it is mentioned here only for completeness.
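
As a minimal sketch of the assignment above, the following uses a small made-up data frame (AAPL_demo and its values are assumptions for illustration, not the real AAPL data) to show that the Target column disappears while the other columns and rows are untouched:

```r
# Toy stand-in for the AAPL data frame; the values are invented for illustration.
AAPL_demo <- data.frame(
  Symbol = c("AAPL", "AAPL", "AAPL"),
  Interim_Close = c(100, 110, 120),
  Target = c(150, 165, 180)
)

# Assigning NULL to a column removes that column from the data frame.
AAPL_demo$Target <- NULL

# Only Symbol and Interim_Close remain; the row count is unchanged.
print(names(AAPL_demo))
print(nrow(AAPL_demo))
```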

14) Merging a Data Frame

This Blog entry is from the Loading and Shaping section in Learn R.

Repeat the steps set out in the earlier Blog entries to create a data frame, this time creating a data frame called Descriptions from the table EOD_Descriptions by typing:

Descriptions <- sqlQuery(Connection,"select * from EOD_Descriptions")
1.png

Run the line of script to console:

2.png

View the Descriptions data frame by typing:

View(Descriptions)
3.png

Run the line of script to console:

4.png

It can be seen that the Symbol column is common to both the AAPL table and the Descriptions table.

The task in this Blog entry is to merge the data frames together on the Symbol identifier, which will then provide a description next to each and every record in the AAPL dataset.  The inner_join() function brings together only those records where the key in one data frame is also present in the other.

To join two data frames in this manner type:

AAPL <- inner_join(AAPL,Descriptions,by = "Symbol")
5.png

Run the line of script to console:

6.png

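Notice that a message relating to levels has been produced.  This is a warning rather than an error: the Symbol column is held as a factor in both data frames and the two factors do not share the same set of levels, so dplyr coerces the column to a character vector before joining.  The join itself still completes.  Inspect the new dataset by typing: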

View(AAPL)
7.png

It can be seen that the description field from the Descriptions data frame has been repeated against each record in the AAPL data frame, as would be expected of an inner join in a database:

8.png
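
The behaviour of an inner join can be sketched without a database connection.  The example below is illustrative only: AAPL_demo, Descriptions_demo, the Description column name and all values are assumptions standing in for the real tables.  It names the key column through the by argument of inner_join() and stores Symbol as character rather than factor, which avoids any message about mismatched levels:

```r
library(dplyr)

# Toy stand-ins for the two data frames; names and values are invented.
AAPL_demo <- data.frame(
  Symbol = c("AAPL", "AAPL", "AAPL"),
  Interim_Close = c(100, 110, 120),
  stringsAsFactors = FALSE
)
Descriptions_demo <- data.frame(
  Symbol = c("AAPL", "MSFT"),
  Description = c("Apple Inc.", "Microsoft Corporation"),
  stringsAsFactors = FALSE
)

# Keep only rows whose Symbol appears in both data frames.
Joined_demo <- inner_join(AAPL_demo, Descriptions_demo, by = "Symbol")

# Each AAPL row now carries its description; the MSFT row has no match and is dropped.
print(Joined_demo)
```

Because the key is matched row by row, the single AAPL description is repeated against all three price records, while the unmatched MSFT description does not appear in the result.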

13) Adding Vectors \ Factors to an existing Data Frame.

This Blog entry is from the Loading and Shaping section in Learn R.

Abstraction is a core part of the machine learning task and horizontal abstraction would see the creation of many columns which rely on the foundation columns.  In this example, a target of 50% uplift on the current price will be created as a separate column called Target, i.e. Interim_Close + (Interim_Close / 2).  Firstly, create a vector which applies the formula to the Interim_Close value of the data frame AAPL by typing:

Target <- AAPL$Interim_Close + (AAPL$Interim_Close / 2)
1.png

Run the line of script to console:

2.png

To add the column to the AAPL data frame use the mutate() function, which takes the target data frame as its first argument, followed by the column to be added:

AAPL <- mutate(AAPL,Target)
3.png

Run the line of script to console:

4.png

View the newly created column by typing:

View(AAPL)
5.png

Run the line of script to console to expand the data viewer in the script window:

6.png

It can be observed that the vector has been added to the data frame.  The mutate() function is by far the most useful function in the creation of abstractions, whereby a vector is created via several steps, with the final vector being mutated into the target data frame.
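
As an alternative sketch, mutate() also accepts a named expression, so the column can be computed and added in a single step without a separate vector.  The data frame AAPL_demo and its values below are assumptions for illustration:

```r
library(dplyr)

# Toy stand-in for the AAPL data frame; the prices are invented.
AAPL_demo <- data.frame(Interim_Close = c(100, 110, 120))

# Compute the 50% uplift and add it as a column named Target in one call.
AAPL_demo <- mutate(AAPL_demo, Target = Interim_Close + (Interim_Close / 2))

print(AAPL_demo)
```

Inside mutate() the column name Interim_Close is looked up in the data frame itself, so the AAPL_demo$ prefix is not needed.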

12) Specifying columns of a Data Frame to return.

This Blog entry is from the Loading and Shaping section in Learn R.

The select() function returns just the columns specified after the data frame.  In this example, the AAPL data frame will be reduced to only the columns Symbol, Interim_Buffer_Date and Interim_Close:

AAPL <- select(AAPL,Symbol,Interim_Buffer_Date,Interim_Close)
1.png

Run the line of script to console:

2.png

View the data frame:

View(AAPL)
3.png

Run the line of script to console:

4.png

It can be observed that the data frame has discarded columns that were not specified explicitly.
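
The effect of select() can be sketched on a small made-up data frame.  AAPL_demo, the extra Interim_Open column and all values are assumptions for illustration only:

```r
library(dplyr)

# Toy stand-in for the AAPL data frame; Interim_Open is a hypothetical extra column.
AAPL_demo <- data.frame(
  Symbol = c("AAPL", "AAPL"),
  Interim_Buffer_Date = as.Date(c("2020-01-02", "2020-01-03")),
  Interim_Open = c(74, 75),
  Interim_Close = c(75, 74),
  stringsAsFactors = FALSE
)

# Keep only the named columns, in the order given.
AAPL_demo <- select(AAPL_demo, Symbol, Interim_Buffer_Date, Interim_Close)

# Interim_Open was not listed, so it is no longer present.
print(names(AAPL_demo))
```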

11) Sorting a Data Frame with the arrange() function.

This Blog entry is from the Loading and Shaping section in Learn R.

The Blog entries that follow are built on the dplyr package, which is a collection of functions that exist for the purpose of shaping and moulding data frames.  The first step is to ensure that the dplyr package is available by installing it through the Install section of the packages pane.  Search for dplyr:

1.png

Click Install to download and install the dplyr package:

2.png

Load the dplyr library by typing:

library(dplyr)
3.png

The package dplyr exposes several functions for shaping and moulding data.  The arrange() function is used to rearrange, or rather sort, the rows of a data frame by one or more columns, in ascending order by default.

To sort the AAPL data frame by date, type:

AAPL <- arrange(AAPL,Interim_Buffer_Date)
4.png

Run the line of script to console:

5.png

View the AAPL data frame to observe the change in row arrangement:

View(AAPL)
6.png

Run the line of script to console:

7.png

Sorting in the opposite direction can be achieved by wrapping the desc() function around the column to be sorted.  To reverse the sort order on Interim_Buffer_Date, type:

AAPL <- arrange(AAPL,desc(Interim_Buffer_Date))
8.png

Run the line of script to console:

9.png

Observe the change in sort order:

View(AAPL)
10.png

Run the line of script to console:

11.png

It can be seen that the sort order has changed direction completely.  To sort by one column and then the next, simply list the columns in order of priority, wrapping in desc() any column that should be sorted in descending order:

AAPL <- arrange(AAPL,desc(Interim_Buffer_Date),Interim_Close)
12.png

Run the line of script to console:

13.png
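
The multi-column sort can be sketched on a small made-up data frame, where ties on the date make the effect of the second column visible.  AAPL_demo, Sorted_demo and the values are assumptions for illustration:

```r
library(dplyr)

# Toy stand-in for the AAPL data frame with two rows per date; values are invented.
AAPL_demo <- data.frame(
  Interim_Buffer_Date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-02", "2020-01-03")),
  Interim_Close = c(75, 74, 73, 76)
)

# Newest date first; within each date, lowest close first.
Sorted_demo <- arrange(AAPL_demo, desc(Interim_Buffer_Date), Interim_Close)

print(Sorted_demo)
```

The first column listed takes priority, and the second column only decides the order among rows that share the same date.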