1) Train a Neural Network.

This Blog entry is from the Neural Networks section in Learn R.

In this procedure, an improvement will be sought over the Linear Regression numeric prediction, using the FDX dataset.  Start by importing the dataset using the readr package and the read_csv() function (as there are no strings to be converted to factors):

library(readr)
FDX <- read_csv("D:/Users/Trainer/Desktop/Bundle/Data/Equity/Abstracted/FDX/PC_FDX_Close_200x1D_Close_50x1D_10.csv")
View(FDX)
1.png

Run the line of script to console:

2.png

It can be seen via the RStudio viewer that the FDX dataset has been loaded into R:

3.png

To train a neural network, firstly download and install the neuralnet package using the RStudio interface:

4.png

Click Install to execute the installation:

5.png

Load the library:

library(neuralnet)
6.png

Run the line of script to console:

7.png

In this example, a warning has been displayed saying that the package was built under a later version of R; however, backward compatibility can be reasonably assured and as such the warning can be ignored.  Once R version 3.3.3 has become stable, it might be worth upgrading.
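
Should there be any doubt, the running R version and the installed neuralnet version can be printed to the console; a minimal sketch:

# Report the running R version and the version of neuralnet that was installed
R.version.string
packageVersion("neuralnet")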

Building, or training, a Neural Network is very similar to building a regression model, save for a few parameters nuanced to this function (not least that the overall package is VERY unforgiving, with almost no intuitive error messages).  In this example, a neural network will be created with an arbitrary four processing elements in one hidden layer.  The dot notation, typically used to include all variables, does not currently work with this function (it is a bug), and so a formula must be constructed manually. 
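
As a workaround for the dot notation bug, the formula can be assembled from a character vector of column names and converted with as.formula(); a minimal sketch, where FeatureNames is a hypothetical shortlist (the full list used in this example appears further below):

# Hypothetical shortlist of independent variables; the full list is used later on
FeatureNames <- c("Skew_3", "Max_4", "PointStep_16")
# Paste the names into a formula string and convert it for use by neuralnet()
NeuralNetworkFormula <- as.formula(paste("Dependent ~", paste(FeatureNames, collapse = " + ")))
NeuralNetworkFormula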

Furthermore, for the purposes of these Blog entries, it is beneficial to have a slightly more limited feature set, owing to the time it would take to train and because, despite popular belief, less is quite often more when training Neural Networks.  It is also worth noting that the neuralnet() function is single threaded and can take a VERY long time to train upon data frames which contain many records and many independent variables.
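
One simple way to arrive at such a shortlist is to rank the candidate columns by the absolute value of their correlation with the dependent variable, which is the approach alluded to in the next paragraph; a minimal sketch, assuming every column of FDX is numeric:

# Correlate every candidate column with the dependent variable (assumes numeric columns)
Correlations <- cor(FDX[, setdiff(names(FDX), "Dependent")], FDX$Dependent)
# Rank by absolute correlation and keep the strongest ten
head(sort(abs(Correlations[, 1]), decreasing = TRUE), 10)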

In this example, a neural network is going to be built upon 10 independent variables known to correlate well to the dependent variable (it is a source of contentious debate as to whether correlation is the most useful means to select variables in non-linear modelling techniques).  While neural networks are tremendous at processing a very large number of features, this is often at the expense of generalisation, and as such the bug encourages more care and thought in creating a more appropriate neural network:

NeuralNetworkFourByOne <- neuralnet(Dependent ~ Skew_3 + Max_4 + PointStep_16 + Close_3 + Close_4 + PointStep_17_ZScore + PointStep_15 + TypicalValue_4 + Range_4 + Range_2, data = FDX, hidden = 4)
8.png

Run the line of script to console,  being prepared to wait a little while:

9.png

Upon the console returning, the neural network has been trained.  Understanding the structure and performance of the neural network is a rather more complex affair than in other Blog entries (which fits with the overall experience of using the package).
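
As a starting point, the fitted object can be plotted, its result matrix inspected, and the network recalled against the training data using compute(); a minimal sketch (model.list$variables holds the covariate names in the order the network expects):

# Plot the topology, weights and error of the trained network
plot(NeuralNetworkFourByOne)
# The result matrix summarises the error, the threshold reached and the step count
NeuralNetworkFourByOne$result.matrix
# Recall against the training data; net.result holds the numeric predictions
TrainingRecall <- compute(NeuralNetworkFourByOne, FDX[, NeuralNetworkFourByOne$model.list$variables])
head(TrainingRecall$net.result)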

5) Create a Naive Bayesian Network with a Laplace Estimator.

This Blog entry is from the Naive Bayesian section in Learn R.

To create a Bayesian model with a nominal Laplace estimator of 1, which will mean that in the event a value has no occurrences for a class it is treated as having at least one occurrence in the observations, simply change the parameter value in the training:

SafeBayesianModel <- naiveBayes(CreditRisk,CreditRisk$Dependent,laplace=1)
1.png

Run the line of script to console:

2.png

A Bayesian model has been created as SafeBayesianModel.  Recall the model:

ClassPredictions <- predict(SafeBayesianModel,CreditRisk,type = "class")
3.png

Run the line of script to console:

4.png

The de facto method to appraise the performance of the model would be to create a confusion matrix:

library(gmodels)
CrossTable(CreditRisk$Dependent, ClassPredictions)
5.png

Run the block of script to console:

6.png

It can be seen that this naive Bayesian model appears to be startlingly accurate, which stands to reason as the same data is being used to test as was used to train.  It follows that this would benefit from an element of cross validation, which was introduced in the Gradient Boosting Machines entries.
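
As a first step in that direction, a minimal sketch of a simple hold-out split is shown below; the 70/30 proportion and the seed are arbitrary choices for illustration:

# Split CreditRisk into training and testing portions (70/30 is an arbitrary choice)
set.seed(123)
TrainIndex <- sample(seq_len(nrow(CreditRisk)), size = floor(0.7 * nrow(CreditRisk)))
CreditRiskTrain <- CreditRisk[TrainIndex, ]
CreditRiskTest <- CreditRisk[-TrainIndex, ]
# Train on the training portion only, mirroring the call used above
HoldOutModel <- naiveBayes(CreditRiskTrain, CreditRiskTrain$Dependent, laplace = 1)
# Recall against the unseen testing portion and appraise with a confusion matrix
HoldOutPredictions <- predict(HoldOutModel, CreditRiskTest, type = "class")
CrossTable(CreditRiskTest$Dependent, HoldOutPredictions)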

4) Recalling a Naive Bayesian Classifier for Classification.

This Blog entry is from the Naive Bayesian section in Learn R.

To recall the pivotal classification directly, rather than recalling P for each class and deriving the classification from the larger of the values, the type parameter can be set to "class":

ClassPredictions <- predict(BayesianModel,CreditRisk,type = "class")
1.png

Run the line of script to console:

2.png

Merge the classification predictions into the CreditRisk data frame, loading the dplyr library also:

library(dplyr)
CreditRisk <- mutate(CreditRisk, ClassPredictions)
3.png

Run the line of script to console:

4.png

Viewing the CreditRisk data frame:

View(CreditRisk)
5.png

Run the line of script to console:

6.png

Scroll to the last column in the RStudio viewer to reveal the classification for each record:

7.png

3) Recalling a Naive Bayesian Classifier for P.

This Blog entry is from the Naive Bayesian section in Learn R.

One of the benefits of using a Bayesian classifier is that it can return intuitive probabilities which, ideally, should be fairly well calibrated to the actual environment.  For example, suppose that a 30% P of rain is produced by a weather station for 100 days; if it were to rain on 30 of those days, that would be considered to be a well calibrated model.  It follows that quite often it is not just the classification that is of interest, but the probability of a classification being accurate.

The familiar predict() function is available for use with the BayesianModel object, passing the data frame to use in the recall and specifying the type as "raw", instructing the function to return P and not the most likely classification:

PPredictions <- predict(BayesianModel,CreditRisk,type = "raw")
1.png

Run the line of script to console:

2.png

A peek at the data in the PPredictions output can be obtained via the head() function:

head(PPredictions)
3.png

Run the line of script to console:

4.png

Horizontally the P will sum to one, and clearly evidences the most dominant class. Anecdotally, the calibration of P in naive Bayesian models can be somewhat disappointing, while the overarching classification can be surprisingly accurate.
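
A rough check of calibration can be made by binning the predicted P for a class of interest and comparing each bin's average prediction with the observed rate of that class; a minimal sketch, where the class label "Good" is a hypothetical level of CreditRisk$Dependent:

# The class label "Good" is hypothetical; substitute an actual level of CreditRisk$Dependent
PGood <- PPredictions[, "Good"]
# Bin the predicted P into five equal-width bins
Bins <- cut(PGood, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
# Compare the mean predicted P with the observed rate of the class in each bin
data.frame(MeanPredicted = tapply(PGood, Bins, mean),
           ObservedRate = tapply(CreditRisk$Dependent == "Good", Bins, mean))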

2) Training a Naive Bayesian Classifier.

This Blog entry is from the Naive Bayesian section in Learn R.

As a Naive Bayesian classifier is rather simple in its concept, with all independent variables being treated as independent of one another and arcs flowing away from the dependent variable, it is to be expected that the process of training such a classifier is indeed trivial.  To train a Bayesian model, simply pass the data frame, specify the factor that is to be treated as the dependent variable and the Laplace estimator (zero in this example).  The naiveBayes() function exists as part of the e1071 package, as such begin by installing the package via RStudio:

1.png

Click Install to download and install this package:

2.png

Reference the library:

library(e1071)

3.png

Run the line of script to console. To train a Naïve Bayesian model:

BayesianModel <- naiveBayes(CreditRisk,CreditRisk$Dependent,laplace=0)
4.png

Run the line of script to console. The BayesianModel object now contains a model that can be used to make P predictions as well as classifications.
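
A brief look inside the object reveals the prior class distribution and the per-variable conditional tables that will be used at recall time; a minimal sketch:

# The apriori component holds the class counts used as priors
BayesianModel$apriori
# The tables component is a list of conditional tables, one per independent variable
head(BayesianModel$tables)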