8) Importing a pipe-separated file.

This Blog entry is from the Loading and Shaping section in Learn R.

While a csv file is the most common means of exchanging datasets, it is by no means the only structure of text file.  Other delimiters, which is to say something other than a comma separating the fields of a dataset, include a pipe (|), a tab, a semicolon (;) or just a space.

The readr package provides for the importing of data which has a slightly different structure to a csv file.  This procedure will not use the RStudio import dialog, focusing instead on creating a script for the purposes of reproducibility.

Create a new script window in RStudio by clicking on the new script icon, then clicking R Script:


A blank script will be created:


Start by loading the readr library by typing:
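A minimal sketch of that line, assuming readr is already installed (the commented install.packages() call is an optional one-off step and not part of the original procedure):

```r
# One-off installation, only needed if readr is not yet installed:
# install.packages("readr")

# Attach readr so that read_delim() and read_csv() are available
library(readr)
```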


Run the line of script to console:


In this example, a file containing the same data as imported in procedure 46 will be used, although the delimiter is a pipe and not a comma.  The file is available in Bundle\Data\Equity\Pipe\AAPL.txt:


To import the pipe-delimited file use the read_delim() function of the readr package.  The function takes as arguments the name and location of the file (in this case Bundle\Data\Equity\Pipe\AAPL.txt), then the delimiter (in this case |).  To lay out the read_delim() function type:

AAPL <- read_delim("D:/Users/Trainer/Desktop/Bundle/Data/Equity/Pipe/AAPL.txt","|")
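For readers without the bundle to hand, the same call shape can be tried on a throwaway file.  This is only a sketch: the temporary file, its column names and its values are invented for illustration and are not taken from AAPL.txt.

```r
library(readr)

# Hypothetical stand-in for a pipe-delimited file; the columns and
# values below are made up purely for illustration.
pipe_file <- tempfile(fileext = ".txt")
writeLines(
  c("Date|Close|Volume",
    "2017-01-03|100.5|1000",
    "2017-01-04|101.25|1500"),
  pipe_file
)

# File location first, then the delimiter (named here for readability)
demo <- read_delim(pipe_file, delim = "|")

# Inspect the shape and column names of the imported data frame
print(dim(demo))
print(names(demo))
```

Naming the delim argument is optional, since the delimiter is the second positional argument, but it makes the intent easier to read in a script.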

Note that in the file path passed to read_delim() the backslash used by default in Windows (i.e. \) has been changed to a forward slash (i.e. /), since R treats a backslash inside a string as an escape character.  It is also important to change the leading part of the file location to wherever the bundle is stored on the computer (in this example D:/Users/Trainer/Desktop/).  Run the line of script to console:


It can be seen that the specification for the data frame has been written out and that there are no errors.  View, and validate, the import by typing:
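A minimal sketch of that check, assuming the line in question is a View() call on the AAPL data frame.  So that the block runs on its own, AAPL is rebuilt here from a small invented file (hypothetical columns and values); a few console-based checks are added as an alternative for non-interactive sessions.

```r
library(readr)

# Hypothetical stand-in for the imported data; values are invented.
pipe_file <- tempfile(fileext = ".txt")
writeLines(
  c("Date|Close",
    "2017-01-03|100.5",
    "2017-01-04|101.25"),
  pipe_file
)
AAPL <- read_delim(pipe_file, delim = "|")

# View() opens the data frame in a viewer tab, so only call it interactively
if (interactive()) View(AAPL)

# Console-based validation
print(head(AAPL))                # first rows of the import
print(spec(AAPL))                # column specification readr applied
import_issues <- problems(AAPL)  # one row per parsing problem, if any
print(nrow(import_issues))
```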


Run the line of script to console to expand the data frame to the script window:


7) Importing a CSV file with RStudio.

This Blog entry is from the Loading and Shaping section in Learn R.

RStudio offers a simple graphical user interface to load files into data frames.  The functionality is of course specific to RStudio, but in practice it is a code creator that writes a call to an import function, such as read.table() or, as in this procedure, readr's read_csv(), to load a variety of common file formats to a data frame.

This procedure will use the datasets contained in the bundle, targeting the csv datasets contained in \Bundle\Data\Equity\Equity:


Specifically, the AAPL.csv file which contains a series of prices relating to the Apple share price:


In RStudio, navigate to the Import Dataset button in the top right-hand corner of the screen, above the environment pane:


Click the button Import Dataset:


Click the From CSV sub menu:


The Import Text Data window will open.  Click the Browse button in the top right-hand corner of the window to open the file system navigator:


Navigate to Bundle\Data\Equity\Equity\AAPL.csv and click the Open button:


A preview of the file is shown in the window for the purposes of validation:


As is the case with many RStudio functions, the import window is in essence a macro or code creation widget.  It can be seen in the bottom right-hand corner that RStudio has created the corresponding block of R script that will be responsible for importing the file in the console:


In this example, it can be observed that the readr package is being loaded and that the csv file is being loaded to a data frame called AAPL using the read_csv() function.  The readr package, created by the RStudio team, is a more efficient package for the importing and exporting of data; while there are several functions for the import and export of data native to R, these are not especially performant.  It is worth noting that this package will not convert strings to factors, making it a more labour-intensive choice for text-rich datasets that are intended to be the source of predictive analytics methods.

Towards the bottom left-hand corner of the window are additional parameters available for the import of the csv file.
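The point about factors can be illustrated with a small sketch.  The file, column names and values below are hypothetical; the example only shows that read_csv() leaves text as character unless a factor is requested, and two common ways of obtaining one.

```r
library(readr)

# Hypothetical text-rich csv; names and values are invented for illustration.
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("Ticker,Rating",
    "AAA,Buy",
    "BBB,Hold",
    "CCC,Buy"),
  csv_file
)

# Default behaviour: text columns are imported as character vectors
as_read <- read_csv(csv_file)
print(class(as_read$Rating))

# Option 1: request a factor at import time through the column specification
typed <- read_csv(csv_file, col_types = cols(Rating = col_factor()))
print(levels(typed$Rating))

# Option 2: convert after the import using base R
as_read$Rating <- as.factor(as_read$Rating)
print(class(as_read$Rating))
```

Declaring the factor in col_types keeps the conversion inside the import line, which suits a script-first workflow; converting afterwards is simpler when only one or two columns are involved.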


Simply click Import to load the data into the R session:


It can be seen that the block of script has been run to console, that the AAPL data frame is now available in the environment pane and that, courtesy of the View() function, the data frame has been displayed in a tab of the script pane:


It is important to note that all RStudio has done is create a block of R script and execute it to console.  In the interests of reproducibility, and in keeping with a script-active, console-passive methodology, this block of script should be reproduced directly in a script.  By way of standard, the readr package will be used in most, but not all, importing methods.

Expanding the data frame, it can be observed that the readr package has facilitated the creation of the correct object types:
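Reproduced in a script, the generated block would look broadly like the sketch below.  The path is an assumption, combining the D:/Users/Trainer/Desktop/ location used for the pipe-delimited file with the bundle folder named in this procedure, and must be changed to the correct location on the computer; the file.exists() guard is an addition so that the script does not stop when the path is wrong.

```r
library(readr)

# Assumed location of the bundle; adjust to the correct location on the computer
csv_path <- "D:/Users/Trainer/Desktop/Bundle/Data/Equity/Equity/AAPL.csv"

if (file.exists(csv_path)) {
  # Same steps the import window performs: read the csv, then view it
  AAPL <- read_csv(csv_path)
  if (interactive()) View(AAPL)
} else {
  message("File not found, check the bundle location: ", csv_path)
}
```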


In this case, it can be seen that the handling of dates has taken place via POSIXct, which is an alternative date handling object as detailed in procedure 43.
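The object types can also be checked from a script rather than by expanding the data frame in the environment pane.  The sketch below uses a small invented file (hypothetical column names and values) whose first column holds date-time text, which readr parses to POSIXct.  As a general note, readr tends to guess date-only text as Date rather than POSIXct, so the class actually seen depends on how the file stores its dates.

```r
library(readr)

# Hypothetical csv with a date-time column; values are invented.
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("Date,Close",
    "2017-01-03 09:30:00,100.5",
    "2017-01-04 09:30:00,101.25"),
  csv_file
)

prices <- read_csv(csv_file)

# First class of each column, as guessed by readr
column_classes <- sapply(prices, function(column) class(column)[1])
print(column_classes)
```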