14) Creating a Factor from a Vector.

This Blog entry is from the Data Structures section in Learn R.

The factor() function turns a Vector of character values into a Factor, a special structure for categorical variables.  Categorical variables are treated differently in data analysis because, conceptually, each category is pivoted into a column in its own right.
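
As a rough illustration of that pivoting, the sketch below uses model.matrix() from the base stats package to expand a small factor into one indicator column per level.  The name GenderExample and the formula used are choices made for this example only; functions such as lm() perform a similar expansion internally.

```r
# Illustrative only: a small factor of customer genders (hypothetical name)
GenderExample <- factor(c("Male", "Female", "Female", "Male"))

# Expand the factor into one 0/1 indicator column per level;
# "- 1" drops the intercept so that every level gets its own column
Indicators <- model.matrix(~ GenderExample - 1)

colnames(Indicators)  # "GenderExampleFemale" "GenderExampleMale"
Indicators            # each row has a 1 in exactly one column
```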

Assume that a Vector of customer genders exists:

Gender <- c("Male","Female","Female","Male")

a-script-for-creating-genders-in-r.png

Run the line of script to console:

a-vector-of-genders-written-to-r-console.png

A standard vector has been created.  To transform this Vector into a Factor, simply pass the Gender Vector as an argument to the factor() function by typing:

GenderFactor <- factor(Gender)
a-script-in-r-to-turn-a-vector-into-a-factor.png

Run the line of script to console:

a-vector-being-turned-into-a-factor-written-to-r-console.png

It can be observed that the Factor is now available in the environment pane:

a-factor-being-displayed-in-the-rstudio-environment-window.png
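
The same check can be made from the console rather than the environment pane.  A minimal sketch, using the Gender and GenderFactor names from this entry (both are recreated here so that the snippet runs on its own):

```r
Gender <- c("Male", "Female", "Female", "Male")
GenderFactor <- factor(Gender)

class(Gender)        # "character"
class(GenderFactor)  # "factor"

# str() prints a compact summary: the levels and the integer code of each entry
str(GenderFactor)
```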

To view the factor in the console type:

GenderFactor
a-script-to-write-out-the-gender-factor-in-r.png

Run the line of script to console:

the-gender-factor-being-written-to-r-console.png

Closer inspection shows that although the Vector contains the strings Male and Female repeated, the Factor has correctly identified just two levels, Male and Female.  This is an example of the levels being inferred.  Categorical data held in a Factor will now be treated natively by the predictive analytics tools that follow.
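
The inferred levels can be inspected directly.  The sketch below recreates the Factor and, as an assumption beyond this entry, also shows the levels argument of factor(), which states the levels explicitly instead of inferring them (GenderExplicit is a name chosen for this example only):

```r
Gender <- c("Male", "Female", "Female", "Male")
GenderFactor <- factor(Gender)

levels(GenderFactor)      # "Female" "Male" (inferred and sorted)
nlevels(GenderFactor)     # 2
as.integer(GenderFactor)  # 2 1 1 2, the level index stored for each entry
table(GenderFactor)       # counts per level: Female 2, Male 2

# Levels can also be stated explicitly to control their order
GenderExplicit <- factor(Gender, levels = c("Male", "Female"))
levels(GenderExplicit)    # "Male" "Female"
```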

1) Create a Vector with the c Function.

The c function is used to combine values into a vector.  To create a numeric Vector, start by typing:

NumericVector <- c(1,2,3,4,5)
creating-a-numeric-vector-using-c-function-in-r.png

Run the line of script to console:

c-function-sent-to-r-console-to-create-a-numeric-vector.png

The vector appears in the environment pane, showing the index range [1:5], which indicates a one-dimensional vector of five elements rather than a structure with rows and columns:

numeric-vector-written-to-environment-window-in-rstudio.png

The vector can be referenced in the console, as with all other variables, by typing:

NumericVector
script-entry-to-return-a-numeric-vector-in-r.png

Run the line of script to the console:

numeric-vector-written-out-to-r-console.png
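
A few base R functions describe the vector that has just been created.  This is a minimal sketch; the name Longer is introduced here for illustration only:

```r
NumericVector <- c(1, 2, 3, 4, 5)

class(NumericVector)   # "numeric"
length(NumericVector)  # 5
dim(NumericVector)     # NULL: a plain vector has no row or column dimensions
NumericVector[2]       # 2, elements are indexed from 1

# c() also combines an existing vector with further values
Longer <- c(NumericVector, 6, 7)
length(Longer)         # 7
```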

To observe how R handles a vector made up of different types (in short, it cannot: the values are coerced to a single type), start by typing:

Mixed <- c(1,2,3,4,"string")
script-line-to-create-a-vector-of-mixed-types.png

Run the script to console:

mixed-types-vector-in-rstudio-environment-window.png

It can be seen that the vector has been created and is displayed in the environment pane.  However, it has been created as a character vector: the character argument cannot be coerced to a numeric value, so every element is coerced to character instead.  To validate this in the console, type:

Mixed
script-to-write-out-mixed-vector-in-r.png

Run the line of script to console:

mixed-vector-has-been-converted-to-string-in-r-console.png

It can be validated that the vector has been created as a character vector, as shown by the double quotation marks around every entry.
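
The coercion can also be confirmed programmatically.  In the sketch below, Recovered is a name introduced for illustration; converting back with as.numeric() shows that the non-numeric entry cannot be recovered as a number:

```r
Mixed <- c(1, 2, 3, 4, "string")

class(Mixed)  # "character"
Mixed[1]      # "1", the number has become text

# Converting back yields NA for the entry that is not a number;
# suppressWarnings() hides the coercion warning R would otherwise print
Recovered <- suppressWarnings(as.numeric(Mixed))
Recovered     # 1 2 3 4 NA
```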

Introduction to Data Structures

Although R seems intimidating at first, requiring what appear to be programming skills, this belies the fact that most use cases of complex predictive analytics can be distilled into simple procedures, indeed into Blog entries.  R certainly need not be viewed as a programming language.

There are, however, certain basic principles that need to be understood and, as covered in the first section, this section sets out to reinforce these principles.

In this section, the Data Structures available to R will be explored.  The exercise requires a new script to have been opened in RStudio.
