2) Extracting a substring from a string, testing logically and presenting for machine learning.

This Blog entry is from the Loading and Shaping section in Learn R.

In Horizontal Abstraction, it is quite common to have the requirement to inspect a string of data looking for an occurrence (or pattern) and return a logical value that can be used in machine learning.

In this example, a string will be inspected and return a 1 in the event that the string "Richard" is present.

Firstly, create a vector of name strings by typing:

Names <- c("Richard","Robert","Reinhard","Raymond","Richardino","Richardo")

Run the line of script to console:


Use the substr() function to create a vector of the first 7 characters of the value contained in the Names vector,  by typing:

NamesSubstr <= substr(Names,1,7)

Write the NamesSubstr vector:


Run the line of script to console:


The question being posed is whether the first characters of the name is equal to "Richard".  To perform this evaluation, create a logical vector from the NamesSubstr vector by typing:

NamesSubstrLogical <- NamesSubstr == "Richard"
a script using the results of the substr function to match a name in r

Notice how a double equals sign is used to eliminate confusion between evaluation and assignment. 

Run the line of script to console:


Write the logical vector out to console by typing:


Run the line of script to console:


The character notion of TRUE or FALSE cannot be used in machine learning readily (you can’t multiply by text) and it follows that these values should be converted to a numeric value using the as.numeric() function,  typing:

NamesSubstrLogicalNumeric <- as.numeric(NamesSubstrLogical)

Run the line of script to console:


Write the newly created vector to console by typing:


Run the line of script to console. A more concise line of script nesting the functions might be:

NamesSubstrLogicalNumericNested <- as.numeric(substr(Names,1,7) == "Richard")

An alternative approach might be converting the logical vector to a factor.