5) Create a Naive Bayesian Network with a Laplace Estimator.

This Blog entry is from the Naive Bayesian section in Learn R.

To create a Bayesian model with a nominal Laplace estimator of 1, which means that a feature value never observed for a class is treated as having occurred at least once rather than being assigned a probability of zero, simply change the laplace parameter value in the training:

SafeBayesianModel <- naiveBayes(CreditRisk,CreditRisk$Dependent,laplace=1)
1.png

Run the line of script to console:

2.png

A Bayesian model has been created as SafeBayesianModel.  Recall the model:

ClassPredictions <- predict(SafeBayesianModel,CreditRisk,type = "class")
3.png

Run the line of script to console:

4.png

The de facto method to appraise the performance of the model is to create a confusion matrix:

library(gmodels)
CrossTable(CreditRisk$Dependent, ClassPredictions)
5.png

Run the block of script to console:

6.png

It can be seen that this naive Bayesian model appears to be startlingly accurate, which stands to reason, as the same data is being used for testing as was used for training.  It follows that this would benefit from an element of cross validation, which was introduced in Gradient Boosting Machines.
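As a sketch of such validation (not part of the original walkthrough), a simple hold-out split can be illustrated with R's built-in iris data, assuming the e1071 package is installed; in practice CreditRisk and CreditRisk$Dependent would be substituted:

```r
# Hold-out validation sketch: train on 70% of the rows, test on the
# remaining 30%, so accuracy is measured on records the model has not
# seen during training.
library(e1071)

set.seed(42)                                   # reproducible split
TrainIndex <- sample(nrow(iris), 0.7 * nrow(iris))
Train <- iris[TrainIndex, ]
Test  <- iris[-TrainIndex, ]

HoldOutModel <- naiveBayes(Train[, -5], Train$Species, laplace = 1)
HoldOutPredictions <- predict(HoldOutModel, Test[, -5], type = "class")
Accuracy <- mean(HoldOutPredictions == Test$Species)
Accuracy                                       # hold-out accuracy
```

The accuracy reported here is a fairer estimate of real-world performance than one computed on the training data itself.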

4) Recalling a Naive Bayesian Classifier for Classification.

This Blog entry is from the Naive Bayesian section in Learn R.

To recall the classification directly, rather than recalling P for each class and deriving the classification from the larger of the values, the type parameter can be set to "class":

ClassPredictions <- predict(BayesianModel,CreditRisk,type = "class")
1.png

Run the line of script to console:

2.png

Merge the classification predictions into the CreditRisk data frame, referencing the dplyr library first:

library(dplyr)
CreditRisk <- mutate(CreditRisk, ClassPredictions)
3.png

Run the line of script to console:

4.png

Viewing the CreditRisk data frame:

View(CreditRisk)
5.png

Run the line of script to console:

6.png

Scroll to the last column in the RStudio viewer to reveal the classification for each record:

7.png

3) Recalling a Naive Bayesian Classifier for P.

This Blog entry is from the Naive Bayesian section in Learn R.

One of the benefits of using a Bayesian classifier is that it can return probabilities which, ideally, should be fairly well calibrated to the actual environment.  For example, suppose a weather station produces a 30% P of rain on each of 100 days; if it were to rain on 30 of those days, the model would be considered well calibrated.  It follows that quite often it is not just the classification that is of interest, but the probability of that classification being correct.
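The rain example can be simulated with a short sketch using hypothetical data and base R only:

```r
# Hypothetical calibration check for the rain example: 100 days are all
# forecast at P = 0.3; a well calibrated forecaster sees rain on roughly
# 30 of them.
set.seed(1)                        # reproducible simulation
ForecastP <- rep(0.3, 100)         # the stated 30% P of rain each day
Rained <- rbinom(100, 1, 0.3)      # simulated rain outcomes
mean(Rained)                       # observed frequency, close to 0.3
```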

The familiar predict() function is available for use with the BayesianModel object, passing the data frame to use in the recall and specifying type equal to "raw", instructing the function to return P rather than the most likely classification:

PPredictions <- predict(BayesianModel,CreditRisk,type = "raw")
1.png

Run the line of script to console:

2.png

A peek at the data in the PPredictions output can be obtained via the head() function:

head(PPredictions)
3.png

Run the line of script to console:

4.png

Horizontally the P values will sum to one, clearly evidencing the most dominant class. Anecdotally, the calibration of P in naive Bayesian models can be somewhat disappointing, while the overarching classification can be surprisingly accurate.
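These two properties can be verified with a small sketch using a hypothetical matrix of raw predictions (the class names Good and Bad are illustrative only):

```r
# Each row of a raw prediction matrix is a probability distribution over
# the classes, so it sums to one, and the predicted class is the column
# holding the largest P.
PPredictionsExample <- matrix(c(0.9, 0.1,
                                0.2, 0.8,
                                0.6, 0.4),
                              ncol = 2, byrow = TRUE,
                              dimnames = list(NULL, c("Good", "Bad")))
rowSums(PPredictionsExample)                     # each row sums to one
colnames(PPredictionsExample)[max.col(PPredictionsExample)]
# "Good" "Bad" "Good" - the dominant class per row
```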

2) Training a Naive Bayesian Classifier.

This Blog entry is from the Naive Bayesian section in Learn R.

As a Naive Bayesian classifier is rather simple in its concept, with all independent variables treated as conditionally independent and arcs flowing away from the dependent variable, it is to be expected that the process of training such a classifier is indeed trivial.  To train a Bayesian model, simply pass the data frame, specify the factor that is to be treated as the dependent variable and the Laplace estimator (zero in this example).  The naiveBayes() function exists as part of the e1071 package, as such begin by installing the package via RStudio:

1.png

Click install to download and install this package:

2.png

Reference the library:

library(e1071)

3.png

Run the line of script to console. To train a Naive Bayesian model:

BayesianModel <- naiveBayes(CreditRisk,CreditRisk$Dependent,laplace=0)
4.png

Run the line of script to console. The BayesianModel object now contains a model that can be used to make P predictions as well as classifications. 

1) Converting Continuous Data to Categorical Data.

This Blog entry is from the Naive Bayesian section in Learn R.

Start by loading the CreditRisk dataset using the base read.csv() function, ensuring that strings are converted to factors (from R 4.0 onwards this requires stringsAsFactors = TRUE, as it is no longer the default).

CreditRisk <- read.csv("D:/Users/Trainer/Desktop/Bundle/Data/CreditRisk/German/CreditRisk.csv", stringsAsFactors = TRUE)
View(CreditRisk)
1.png

Run the block of script to console:

2.png

The View() function will load the dataset in the RStudio Viewer:

3.png

There are several vectors that are not appropriate for Bayesian analysis as they are continuous:

·         Requested_Amount.

·         Installment_Percentage_Of_Disposable_Income.

·         Present_Residence_Since.

·         Age.

·         Number_Of_Existing_Credits_At_This_Bank.

·         Dependent_Persons.

·         Duration_In_Month.

There are a variety of ways to convert the continuous values to categorical data, yet in this example we will focus on binning a single vector, Age.  In this example, Age will be broken into commonly used age brackets:

·         18-24 Years old.

·         25-34 Years old.

·         35-44 Years old.

·         45-54 Years old.

·         55-64 Years old.

·         65-74 Years old.

·         75 Years or older.

It would be possible to use a series of logical statements to make the slice, or cut, between the values in this continuous series of data, but it would be quite cumbersome.  Fortunately there is a function that can simplify this for us, the cut() function.  The cut() function takes a vector of data and a vector of points at which to make the cuts, returning a factor denoting the range for each value.  To make the cut based on the ranges described:

Age <- cut(CreditRisk$Age,c(18,24,34,44,54,64,74,999))
4.png

Run the line of script to console:

5.png
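Equivalently, cut() can be exercised on a small hypothetical vector of ages to see exactly what it returns:

```r
# Standalone illustration of cut() with three hypothetical ages; the
# breaks produce the intervals (18,24], (24,34], ..., (74,999].
SampleAges <- c(19, 40, 70)
AgeBands <- cut(SampleAges, c(18, 24, 34, 44, 54, 64, 74, 999))
AgeBands        # (18,24] (34,44] (64,74]
```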

The head() command can be used on Age to confirm that it is indeed a factor and that the levels have been apportioned:

head(Age)
6.png

Run the line of script to console:

7.png

Having created a factor for Age, it is necessary to overwrite the vector in the CreditRisk Data Frame.  This is a simple procedure of targeting the Age vector in the data frame as the target of assignment for the Age factor:

CreditRisk$Age <- Age
8.png

Run the line of script to console:

9.png

Check that the assignment has indeed transformed CreditRisk$Age to a factor by peeking with the head() function:

head(CreditRisk$Age)
10.png

Run the line of script to console:

11.png

It can be seen that the continuous variable has been transformed. 

Repeat for the remaining continuous variables, perhaps using the hist() function to identify appropriate thresholds, as in the following example:

#Bin
Requested_Amount <- cut(CreditRisk$Requested_Amount,c(0,5000,10000,15000,20000))
Installment_Percentage_Of_Disposable_Income <- cut(CreditRisk$Installment_Percentage_Of_Disposable_Income,c(0,1,2,3,4,5,6,7,8,9,10,999))
Present_Residence_Since <- cut(CreditRisk$Present_Residence_Since,c(0,1,2,3,4,5,6,7,8,9,10,999))
Number_Of_Existing_Credits_At_This_Bank <- cut(CreditRisk$Number_Of_Existing_Credits_At_This_Bank,c(0,1,2,3,4,5,999))
Dependent_Persons <- cut(CreditRisk$Dependent_Persons,c(0,2,3,4,5,999))
Duration_In_Month <- cut(CreditRisk$Duration_In_Month,c(0,20,40,60,999))
#Allocate
CreditRisk$Requested_Amount <- Requested_Amount
CreditRisk$Installment_Percentage_Of_Disposable_Income <- Installment_Percentage_Of_Disposable_Income
CreditRisk$Present_Residence_Since <- Present_Residence_Since
CreditRisk$Number_Of_Existing_Credits_At_This_Bank <- Number_Of_Existing_Credits_At_This_Bank
CreditRisk$Dependent_Persons <- Dependent_Persons
CreditRisk$Duration_In_Month <- Duration_In_Month
12.png

Run the block of script to console:

13.png

It can be seen from the data view pane in RStudio that, for this data frame, all components are now factors and are therefore appropriate for Bayesian analysis:

14.png
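The hist() suggestion above can be sketched as follows, using hypothetical amounts standing in for CreditRisk$Requested_Amount:

```r
# Sketch of using hist() to appraise candidate break points before
# committing to a cut(): the counts show how the observations distribute
# across the proposed bins.
Amounts <- c(250, 1200, 4800, 5100, 9900, 12000, 15500, 19000)
AmountHist <- hist(Amounts, breaks = c(0, 5000, 10000, 15000, 20000))
AmountHist$counts               # observations per bin: 3 2 1 2
```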

This procedure requires a lot of typing and to facilitate smooth learning, a copy of the procedure has been saved in Bundle\R\Cut.r.