# 11) Learn TAN Structure to Link Nodes Automatically.

This Blog entry is from the Netica Basics section in Learn Netica.

When dealing with an overwhelming number of nodes, it is possible to link them automatically: first with a naive structure similar to the one created manually, then by augmenting this structure to look for relationships between the nodes that may be of interest. This approach is called Tree Augmented Naïve Bayesian Networks, or TAN Bayesian Networks.

For Netica to learn the structure, start by selecting the dependent variable node in the canvas, in this case Default:

To learn the structure using Tree Augmented Naive approaches, click on the Cases menu item, click or hover on Learn, then click Learn TAN Structure:

Select the file to be used to train the structure, in this example CreditRisk.csv:

Clicking OK will begin the learning process:

First, every node will be linked to / from the dependent variable; thereafter, relationships between the independent variables will be established.

The direction of a link is not that important, as Bayesian inference propagates evidence in either direction; however, if links do not follow the direction of causation, maintaining the node \ conditional probability tables can become bewildering. It follows that a learnt TAN structure would likely be used only where the probabilities are also going to be learnt. Learning should therefore occur to update the node probability tables, followed by a test of the classification accuracy of the network, so as to determine whether this extremely complex network provides any uplift over a simpler network.
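The two phases described above correspond to the textbook TAN procedure: every feature receives the class node as a parent, then a maximum spanning tree over the features, weighted by conditional mutual information given the class, supplies at most one extra parent per feature. The Python sketch below illustrates that procedure under stated assumptions; it is not Netica's implementation, rows are assumed to be dictionaries (for example from `csv.DictReader`), and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(rows, a, b, cls):
    """Plug-in estimate of I(A; B | C), in nats, from a list of dict rows."""
    n = len(rows)
    abc = Counter((r[a], r[b], r[cls]) for r in rows)
    ac = Counter((r[a], r[cls]) for r in rows)
    bc = Counter((r[b], r[cls]) for r in rows)
    c = Counter(r[cls] for r in rows)
    total = 0.0
    for (va, vb, vc), n_abc in abc.items():
        # p(a,b,c) * log[p(a,b|c) / (p(a|c) p(b|c))], written in raw counts
        total += (n_abc / n) * math.log(n_abc * c[vc] / (ac[(va, vc)] * bc[(vb, vc)]))
    return total


def learn_tan(rows, cls, features):
    """Sketch only: return {node: [parents]} for a TAN structure rooted at features[0]."""
    weights = {frozenset((a, b)): conditional_mutual_information(rows, a, b, cls)
               for a, b in combinations(features, 2)}
    # Phase 1: the class (dependent variable) becomes a parent of every feature.
    parents = {cls: []}
    parents.update({f: [cls] for f in features})
    # Phase 2 (Prim's algorithm): grow a maximum spanning tree over the features.
    tree = [features[0]]
    while len(tree) < len(features):
        u, v = max(((u, v) for u in tree for v in features if v not in tree),
                   key=lambda edge: weights[frozenset(edge)])
        parents[v].append(u)  # tree edge directed away from the root feature
        tree.append(v)
    return parents
```

Ties between equally weighted edges are broken by feature order here; a production implementation may also prune weak edges or smooth the counts.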

# 10) Add Nodes Automatically to a Canvas.

Manually adding nodes to a canvas is laborious and, given that a key benefit of Bayesian networks is the ability to handle extremely large networks with hundreds of nodes, impractical. Furthermore, Bayesian techniques are inherently state based, so continuous variables must be divided into appropriate state bins.

Netica can infer nodes from the columns of a file, allowing the creation of nodes on the canvas to be automated.

Start by creating a blank canvas:

To infer and then add the case file nodes, click Cases in the menu, then click or hover on the Learn sub menu item, then click Add Case File Nodes:

When the dialog box opens, select the file CreditRisk.csv:

Clicking Open will begin creating nodes based upon each variable's name, coupled with an analysis of the number of states within that variable. If a variable is determined to be continuous, a prompt will be displayed asking for the number of states to set for that variable:

Specify the number of states deemed appropriate for the variable, then click OK. Repeat for each variable until all of the nodes have been added to the canvas:
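To make the inference step concrete, the following Python sketch reads a CSV, collects the distinct values of each column and flags a numeric column as continuous when it has more distinct values than a threshold. The threshold `max_discrete_states`, the helper names and the `Income` column used as an example are assumptions for illustration; Netica's own rule for detecting continuous variables is not described in this entry.

```python
import csv
import io


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_nodes(csv_text, max_discrete_states=10):
    """Map each column to its distinct states, flagging likely-continuous columns.

    max_discrete_states is a hypothetical threshold, not a Netica setting.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            values[name].add(value)
    nodes = {}
    for name, states in values.items():
        numeric = all(_is_number(v) for v in states)
        continuous = numeric and len(states) > max_discrete_states
        # Continuous columns get no states yet: they still need to be binned.
        nodes[name] = {"continuous": continuous,
                       "states": None if continuous else sorted(states)}
    return nodes
```

A column flagged as continuous is the case where Netica prompts for a number of states; the sketch simply leaves its states empty.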

# 8) Test Classification Accuracy of a Bayesian Network.

Bayesian Networks are considered extremely useful for classification problems, with performance measured as classification accuracy, commonly presented as a confusion matrix (in the same manner as Logistic Regression).

Bayesian networks, once constructed and trained, can facilitate a testing process that produces analysis similar to that seen in the logistic regression Blog entries.

First, select all required nodes by holding down the Ctrl key and clicking each node name:

To test the network, navigate to the Cases menu, then click on the Test with Cases sub menu:

Select the CreditRisk.csv file when prompted to open a file:

Clicking the Open button begins the testing process. For the dependent variable, Default in this example, a Confusion Matrix and Error Rate are presented. These are the main focus of optimisation, whether in a stepwise approach or when using more automated means to add nodes to the canvas and establish relationships between the independent variables:

# 7) Learn node probabilities.

Up to this point the Blog entries have created a naive Bayesian network based on belief, belief being an encapsulation of subjective probability in Node \ Conditional probability tables.

Subjective probability works extremely well when elicited from a group and allows predictive analytics models to be created where no data is available (another tool for such scenarios is conjoined Regression \ ANN). Where data is available, it is far better to train the network with probabilities learned from the contents of a data file.

The process to train a Bayesian network is quite simple. Start by resetting all findings, then click on the canvas to ensure that no node is selected:

It is very important that the names of the nodes match the names of the columns in the file intended to train the Bayesian Network, and that every state that exists in the data is reflected in the respective node.
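Because mismatched names or missing states are easy to overlook, a quick check before training can help. The Python sketch below uses a hypothetical `nodes` dictionary mapping node names to state lists, and reports node names without a matching column as well as data values that are not states of their node. It assumes a simple CSV with a single header line; the function name is hypothetical.

```python
import csv
import io


def check_nodes_against_file(nodes, csv_text):
    """Return a list of problems; an empty list means the file matches the nodes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = [f"node {name!r} has no matching column"
                for name in nodes if name not in reader.fieldnames]
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for name, states in nodes.items():
            value = row.get(name)
            if value is not None and value not in states:
                problems.append(f"line {line_no}: {name}={value!r} is not a state of the node")
    return problems
```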

To train the Bayesian Network, click on the menu item Cases, then click or hover on the Learn sub menu item, then click Incorporate Case File (Learn using EM achieves the same result but is better suited where data is thought to be missing):

Locate the file to be used for training, in this case CreditRisk.csv:

Click Open once the CreditRisk.csv file has been identified to begin the training process. Remove pre-existing Node \ Conditional probability tables if prompted to do so:

Maintain the default degree of 1 when prompted:

The network has now been trained using actual probabilities identified in the data rather than those added subjectively:

An interesting exercise is to observe the difference between subjective and frequentist (i.e. learned) probabilities.
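At their simplest, frequentist probabilities learned from data are conditional relative frequencies. The Python sketch below estimates P(Default | Count_Bank_Credit_Products_3) by counting cases, which makes a comparison with the subjective 9% and 3% from the subjective probabilities entry straightforward. It is a maximum-likelihood illustration only: it does not model Netica's experience counts or the degree prompt, and the function name is hypothetical.

```python
from collections import Counter


def learn_cpt(rows, child, parent, child_states):
    """Estimate P(child | parent) as relative frequencies of case counts."""
    joint = Counter((r[parent], r[child]) for r in rows)
    parent_values = sorted({r[parent] for r in rows})
    cpt = {}
    for pv in parent_values:
        # One table row per parent state; each row sums to 1.
        counts = {cs: joint[(pv, cs)] for cs in child_states}
        total = sum(counts.values())
        cpt[pv] = {cs: n / total for cs, n in counts.items()}
    return cpt
```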

# 6) Netica Discretisation of a Continuous Variable.

Bayesian methods should be considered incompatible with continuous variables, as the premise of the technique is that it apportions probability to states (akin to the sides of a die). Embracing the state-only maxim of Bayesian Networks, the task when presented with a continuous variable is to convert it into states.

In the Blog entries thus far there have been several methods presented to bin variables for the purposes of model improvement.  Netica provides a quick and convenient means to turn continuous variables into states, a process it refers to as discretisation.

There are three useful automated forms of discretisation offered by Netica:

·         Fixed Bin

·         Exponential Bin

·         Natural Logarithm

The boundaries can be set to -infinity or infinity if it is felt that the lower or upper bounds may change over time.

To enter the discretisation for a Node, right click on the node, then click properties:

It can be noted that the current node is set as Discrete, which means that States and their values are entered manually:

Click on the button Discrete which will present the opportunity to change the node to be Continuous:

Upon changing the node type to Continuous, click on the Description button which will expose a sub menu, then select Discretisation:

On clicking the Discretisation button, the large textbox will now accept (or rather, process) the shorthand notation that divides a continuous variable into states:

After clearing out any existing values, shorthand is used to specify the lower boundary, the upper boundary and the number of bins between these boundaries. In this example, 0 is the lower boundary, 100 is the upper boundary and there are to be 5 bins:

``[0,100] / 5``

Upon clicking OK the node will be updated with these states.  If prompted to remove existing states, click OK:
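The Fixed Bin shorthand `[0,100] / 5` divides the range into five equal-width bins. This Python sketch computes the resulting boundaries, 0, 20, 40, 60, 80 and 100; the function names and the state-label format are assumptions for illustration, not the naming Netica uses.

```python
def fixed_bin_edges(begin, end, bins):
    """Equal-width bin edges for the shorthand [begin,end] / bins."""
    width = (end - begin) / bins
    return [begin + i * width for i in range(bins)] + [end]


def state_labels(edges):
    """Hypothetical label for each bin, e.g. '0_to_20'."""
    return [f"{lo:g}_to_{hi:g}" for lo, hi in zip(edges, edges[1:])]


print(fixed_bin_edges(0, 100, 5))  # [0.0, 20.0, 40.0, 60.0, 80.0, 100]
```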

This example uses the Fixed Bin shorthand. There are three types of shorthand available, where Begin, End, Bin and Bigger are the parameters:

·         Fixed Bin (as in the example): [Begin,End] / Bin

·         Exponential Bin: [Begin, End] +%Bigger

·         Natural Logarithm: [Begin, End] / L Bin

If the production values of the lower and upper bounds are not known at design time, then -infinity or infinity can be used as the lower and upper bound respectively. The use of infinity will bring about runtime resizing of the bounds.
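To see why infinite bounds help, consider assigning a value to a state. In this Python sketch, a value outside finite bounds has no state, while -infinity and infinity as outer bounds guarantee that every value falls into the first or last bin. The half-open [lo, hi) convention with a closed last bin is an assumption, not necessarily Netica's rule, and the function name is hypothetical.

```python
import bisect
import math


def find_state(value, edges):
    """Index of the bin containing value; bins are [lo, hi) except the last, which is closed."""
    if not edges[0] <= value <= edges[-1]:
        raise ValueError(f"{value} is outside [{edges[0]}, {edges[-1]}]")
    index = bisect.bisect_right(edges, value) - 1
    # A value equal to the upper bound belongs to the last bin.
    return min(index, len(edges) - 2)


open_edges = [-math.inf, 20, 40, 60, 80, math.inf]
print(find_state(1_000_000, open_edges))  # 4, the last bin
```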

# 5) Manually setting node states to predict and explain.

To make a prediction, which in reality is simply a matter of recalling the probabilities from the Node \ Conditional Probability tables that were entered manually, hover over the node and state to set, then click to set that node:

In this example, the prediction of whether an account will default is based on the customer having more than three credit products, that is, Count_Bank_Credit_Products_3 set to Yes:

A lookup from the Node \ Conditional probability table takes place, in effect predicting the probability of default to be 9% based on this finding. In the forward direction this is an unremarkable prediction based entirely on belief; however, a Bayesian network can also perform inference to provide explanatory value about the most probable environment surrounding a customer defaulting.

Reset all case findings by clicking the Icon of the same name in the menu:

In this example, click on the Yes state of the Default node to update the causal nodes using Bayesian inference, providing some explanatory value as to the environment that causes a customer to default:

In this example it can be observed that a customer who defaults is, in all probability, going to have more than three credit products.

# 4) Enter subjective probabilities for each consequence.

Having created a consequence with many potential causes, each of which is state based (Yes \ No in this example), Netica can now infer the finite set of scenarios that cause the consequence.

To view the finite scenarios that can cause a consequence, right click on the consequence node, in this case Default, then click Table (short for Node \ Conditional Probability Table):

The node probability table enumerates every possible scenario in the Bayesian Network, calling for subjective probabilities to be entered:

In this simple example there are two scenarios which require subjective probabilities; however, with more nodes this number GREATLY expands. Subjective probability, that is, belief (hence Bayesian Belief Networks), needs to be apportioned to each scenario.
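The number of scenarios is the product of the number of states of each parent. This Python sketch enumerates the rows of a node probability table with `itertools.product`; with ten Yes \ No parents it already yields 1,024 rows. The function name and the `Parent_i` names in the usage line are hypothetical.

```python
from itertools import product


def cpt_rows(parent_states):
    """List every combination of parent states, i.e. one table row per scenario."""
    names = list(parent_states)
    return [dict(zip(names, combo))
            for combo in product(*(parent_states[n] for n in names))]


print(len(cpt_rows({"Count_Bank_Credit_Products_3": ["Yes", "No"]})))    # 2
print(len(cpt_rows({f"Parent_{i}": ["Yes", "No"] for i in range(10)})))  # 1024
```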

In this example, apportion the following subjective probabilities:

·         If Count Bank Credit Products > 3 then P(Default) = 9%

·         If NOT Count Bank Credit Products > 3 then P(Default) = 3%

These probabilities would be entered in the corresponding table:

Clicking on the Fill Missing Probabilities Icon will complete the missing probabilities where possible, summing to 100%:

Click Apply, then OK to close the window. The node probabilities have been set; however, the network has not been compiled, and so the states retain the default probabilities:

To compile the network and set the probabilities, click on the lightning bolt icon in the menu:

The Bayesian network has now been compiled and is ready to both predict Default and explain Default via Bayesian Inference.

# 3) Link Variables as causes consequence.

One method of creating Bayesian Networks is to judge that an independent variable causes a consequence for another, most likely dependent, variable.

To reflect that one variable can cause a consequence for another variable, in this example Count_Bank_Credit_Products_3 having consequence for Default, a link is drawn between the variables.  Links always flow in the direction of causation.

After the link icon is toggled, click in the centre of the node that is causing a consequence, in this case Count_Bank_Credit_Products_3 then drag:

Drag the link to the centre of the node which suffers the consequence, in this case Default, then drop to complete the link:

Following the causes consequence paradigm makes the construction of node probability tables more intuitive (as the tables will be built at the consequence, inferring all possible scenarios). Repeat the linking for every node on the canvas that causes a default consequence.

This approach constructs what is known as a naive Bayesian Network, in that, structurally, every node links to a single consequence and to nothing else.

# 2) Set States attributed to the Dependent and Independent Variables.

Both of the nodes stamped to the canvas, one representing the dependent variable and one representing the independent variable, have the same states of Yes \ No (i.e. each node has only two possible, string-based outcomes). It follows that each of the nodes needs to have the Yes \ No states set.

To set the states of a node, right click on the node and select properties, in this case right click on the Default node (the dependent variable):

The properties window will open, which is the same window used to name the node. Focusing attention towards the centre of the window, there is an entry box titled State:

Type the name of the first state, which would be Yes:

Then click New to commit the Yes state, proceeding to create the No state:

Click OK to commit both states to the node, after which the Node will be updated to reflect both states with an even probability:

Repeat the process for each node on the canvas, for each possible state for that node:

# 1) Create a New Canvas, add a Dependent Variable and an Independent Variable.

Like Decision Trees, Netica is quite visual.  Independent and Dependent variables are stamped to a canvas and joined together in the direction of causation, creating a network.  The starting point for creating a Bayesian Network is to create a new canvas.

Creating a new canvas is achieved from the File menu, by clicking File > New > Network:

Creating a new canvas can also be achieved by clicking the icon as follows:

A new canvas will appear:

Variables, henceforth called nodes, are stamped to the canvas with one node for each variable to be included in the model. In this example there will be a single node representing the dependent variable and a single node representing the independent variable.

Right click on the canvas to expand the Modify menu, then click New Node, then click Nature Node Discrete:

A node will be stamped to the canvas at the location of the right click, with the node's properties box shown by default:

Give the node EXACTLY the same name as the dependent variable in the dataset, in this example Default, then click OK:

Repeat the process adding a second node to the canvas, this time naming the node as an independent variable with yes \ no states, in this example Count_Bank_Credit_Products_Greater_3:

Notice that there are now two nodes stamped to the canvas, one for the dependent variable and one for the independent variable.  Notice also that while each of these nodes has two possible values, referred to as states, the nodes only reflect one default state.

# Introduction to Netica Basics

Norsys Netica is a modelling tool that allows for the creative development of Bayesian networks based on either belief (subjective probability) or data (taking a frequentist approach to the probabilities available in data).

The software does not install natively to the operating system; the executables are in the directory:

\Training\Software\Netica 521

Execute the program Netica.exe, which will open the Netica user interface:

The data file that will be used in these procedures is available in Training\Data\CreditRisk and is titled CreditRisk.csv:

The CreditRisk.csv file is extremely large and contains an uneven number of default vs. good cases. It could be said that this is a representative sample, unlike the samples used in the logistic regression techniques, which rely on an even number of cases in both dispositions.