8) Importing a pipe separated file.

This Blog entry is from the Loading and Shaping section in Learn R.

While a csv file is the most common means of exchanging datasets, it is by no means the only structure of text file.  Other delimiters, that is to say characters other than a comma used to separate the fields of a dataset, include a pipe (|), a tab, a semicolon (;) or just a space.

The readr package provides for the importing of data which has a slightly different structure to a csv file.  This procedure will not use the RStudio import window, instead focusing on creating a script for the purposes of reproducibility.
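As a rough illustration of what changing the delimiter means, the sketch below writes the same rows to temporary files using a pipe, a tab and a semicolon, then reads each back with read_delim().  The sample rows, the column names and the round_trip() helper are assumptions made for the example and are not taken from the bundle.

```r
library(readr)

# Hypothetical sample rows; the column names are assumptions for illustration.
rows <- list(
  c("Date", "Close"),
  c("2020-01-02", "300.35"),
  c("2020-01-03", "297.43")
)

# Write the rows to a temporary file using the given delimiter, then read
# them back with read_delim() and an explicit column specification.
round_trip <- function(delim) {
  path <- tempfile(fileext = ".txt")
  writeLines(vapply(rows, paste, character(1), collapse = delim), path)
  read_delim(
    path,
    delim = delim,
    col_types = cols(Date = col_date(), Close = col_double())
  )
}

pipe_df <- round_trip("|")
tab_df <- round_trip("\t")
semicolon_df <- round_trip(";")
```

Only the delim argument changes between the three calls; the resulting columns hold the same values in each case.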

Create a new script window in RStudio by clicking on the new script icon, then clicking R Script:

1.png

A blank script will be created:

2.png

Start by loading the readr library by typing:

library(readr)
3.png

Run the line of script to console:

4.png

In this example, a file containing the same data as imported in procedure 46 will be used, although the delimiter is a pipe and not a comma.  The file is available in Bundle\Data\Equity\Pipe\AAPL.txt:

5.png

To import the pipe delimited file, use the read_delim() function of the readr package.  The function takes the name and location of the file (in this case Bundle\Data\Equity\Pipe\AAPL.txt) followed by the delimiter (in this case |).  To lay out the read_delim() function type:

AAPL <- read_delim("D:/Users/Trainer/Desktop/Bundle/Data/Equity/Pipe/AAPL.txt", delim = "|")
6.png

Note that the default backslash path separator used in Windows (i.e. \) has been changed to a forward slash (i.e. /), since a backslash is treated as an escape character inside an R string.  Further, in this example it is important to change the leading part of the file location to the correct location of the bundle on the computer (i.e. D:/Users/Trainer/Desktop/).  Run the line of script to console:

7.png

It can be seen that the column specification for the data frame has been written out and that there are no errors.  View, and validate, the import by typing:

View(AAPL)
8.png

Run the line of script to console to expand the data frame to the script window:

9.png
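The procedure above can also be checked from the script itself rather than by eye.  The following is a minimal sketch, assuming a temporary folder stands in for the bundle location and that the file holds two hypothetical columns; the real AAPL.txt may differ.  It builds the path with file.path(), which joins the parts with forward slashes, and then uses problems() and str() alongside View() to validate the import.

```r
library(readr)

# Stand-in for the bundle location; on a real machine this would be something
# like "D:/Users/Trainer/Desktop/Bundle" (an assumption, adjust as needed).
bundle_dir <- file.path(tempdir(), "Bundle")
pipe_dir <- file.path(bundle_dir, "Data", "Equity", "Pipe")
dir.create(pipe_dir, recursive = TRUE, showWarnings = FALSE)

# file.path() joins the parts with forward slashes, so no backslashes are typed.
aapl_path <- file.path(pipe_dir, "AAPL.txt")

# Hypothetical contents: the real AAPL.txt columns may differ.
writeLines(c("Date|Close", "2020-01-02|300.35", "2020-01-03|297.43"), aapl_path)

# col_types = cols() lets readr guess the column types without printing them.
AAPL <- read_delim(aapl_path, delim = "|", col_types = cols())

# Programmatic checks to complement View().
import_problems <- problems(AAPL)  # one row per parsing failure
str(AAPL)                          # column names and types
```

An empty import_problems table suggests every field was parsed with the guessed column type.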

7) Importing a CSV file with RStudio.

RStudio offers a simple graphical interface to load files into data frames.  The functionality is of course specific to RStudio, but in practice it is a code creator that generates a call to an import function (read_csv() from the readr package in this procedure) to load a variety of common file formats to a data frame.

This procedure will use the datasets contained in the bundle.  In this procedure, the csv datasets contained in \Bundle\Data\Equity\Equity will be targeted:

windows-explorer-showing-all-of-the-files-that-can-be-loaded-to-r.png

Specifically, the AAPL.csv file which contains a series of prices relating to the Apple share price:

an-open-excel-spreadsheet-showing-stock-prices-to-be-loaded-to-r.png

In RStudio, navigate to the Import Dataset button in the top right-hand corner of the screen, above the environment pane:

the-location-of-the-import-dataset-button-in-rstudio.png

Click the button Import Dataset:

clicking-on-import-dataset-and-from-csv-in-rstudio.png

Click the From CSV sub menu:

the-window-to-load-csv-files-in-rstudio.png

The Import Text file window will expand.  Click the browse button in the top right-hand corner of the window to open the file system navigator:

a-csv-file-with-apple-stock-prices.png

Navigate to Bundle\Data\Equity\Equity\AAPL.csv and click the Open button:

a-preview-of-the-apple-stock-prices-to-be-loaded-to-r.png

A preview of the file is shown in the window for the purposes of validation:

some-script-created-to-perform-the-csv-file-load.png

As is the case with many RStudio functions, it is in essence a macro or code creation widget.  It can be seen in the bottom right-hand corner that RStudio has created the corresponding R script block that will be responsible for importing the file in the console:

r-script-to-load-a-csv-file-to-r.png

In this example, it can be observed that the readr package is loaded and that the csv file is read into a data frame called AAPL using the read_csv() function.  The readr package, created by the RStudio team, is a more efficient package for the importing and exporting of data; while there are several functions native to R for the import and export of data, these are not especially performant.  It is worth noting that readr will not convert strings to factors by default, making it a more labour-intensive choice for text-rich datasets that are intended to be the source of predictive analytics methods.
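Where factors are needed, one option is to request them explicitly through the col_types argument of read_csv().  The sketch below assumes a small hypothetical file with Ticker, Sector and Close columns (none of which come from the bundle) and asks readr to read Sector as a factor with col_factor().

```r
library(readr)

# Hypothetical text-rich data; the column names and values are assumptions.
path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Ticker,Sector,Close",
    "AAPL,Technology,300.35",
    "XOM,Energy,70.90",
    "MSFT,Technology,160.62"
  ),
  path
)

equities <- read_csv(
  path,
  col_types = cols(
    Ticker = col_character(),
    Sector = col_factor(),  # levels are taken from the data, in order of appearance
    Close = col_double()
  )
)

levels(equities$Sector)
```

Columns left as col_character() stay as plain strings, so only the fields intended as categorical predictors need to be listed as factors.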

Towards the bottom left-hand corner of the window are additional parameters available for the import of the csv file.

some-of-the-options-available-to-rstudio-when-loading-a-csv-file.png

Click Import to load the data into the R session:

where-to-click-to-load-a-csv-file-to-r.png

It can be seen that the block of script has been run to console, that the AAPL data frame is now available in the environment pane and, courtesy of the View() function, that the data frame has been displayed in a tab of the script pane:

several-locations-in-rstudio-showing-that-a-csv-file-has-been-loaded.png

It is important to note that all RStudio has done is create a block of R script and execute it in the console.  In the interests of reproducibility, and in keeping with a script active, console passive methodology, this block of script should be reproduced directly in a script.  By way of standard, the readr package will be used in most, but not all, importing methods.
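A script version of the import might look something like the sketch below.  It is not a copy of the block RStudio generates; it assumes a temporary file with two hypothetical columns stands in for AAPL.csv so that the lines can be run anywhere, and in practice csv_path would point at Bundle/Data/Equity/Equity/AAPL.csv under the local bundle location.

```r
library(readr)

# Stand-in file so the sketch runs anywhere; the columns are assumptions and
# the real AAPL.csv may differ.
csv_path <- file.path(tempdir(), "AAPL.csv")
writeLines(c("Date,Close", "2020-01-02,300.35", "2020-01-03,297.43"), csv_path)

# Read the csv file into a data frame called AAPL.
AAPL <- read_csv(csv_path, col_types = cols())

# View() opens a viewer tab, so only call it in an interactive session.
if (interactive()) View(AAPL)
```

Keeping these lines in a saved script means the import can be repeated without returning to the Import Dataset window.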

Expanding the data frame in the environment pane, it can be observed that the readr package has facilitated the creation of the correct object types:

the-environment-window-in-rstudio-which-shows-the-csv-file-is-now-a-dataframe.png

In this case, it can be seen that the handling of dates has taken place via POSIXct, which is an alternative date handling object as detailed in procedure 43.
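To give a feel for what a POSIXct object is, the short base R sketch below creates one by hand and converts it to a plain Date.  The timestamp is an arbitrary value chosen for illustration and is not taken from the AAPL data.

```r
# A date-time stored as POSIXct, which holds seconds since 1970-01-01 UTC.
stamp <- as.POSIXct("2020-01-02 00:00:00", tz = "UTC")

class(stamp)       # "POSIXct" "POSIXt"
as.numeric(stamp)  # the underlying count of seconds

# Drop the time component to obtain a Date, then format it for display.
trade_date <- as.Date(stamp)
formatted <- format(trade_date, "%d/%m/%Y")
```

Converting to Date in this way can be convenient when only the calendar day of each price matters.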