3) Recall an Exhaustive Model

This Blog entry is from the Exhaustive Basics section in Learn Exhaustive.

It has been observed that a GUID is specified at the point the model is trained.  This GUID is used to produce reports on the training process as well as to facilitate model recall via batch file or API.

The GUID that will be used for this example is as follows, being the FraudRisk.csv model training outcome:

22949565-0adf-42e6-af7c-e6a787d1a062
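
Because the GUID is typed or pasted into several textboxes in this entry, it can be worth checking that it is well formed before use. The following is a minimal sketch using Python's standard uuid module; the helper name is_valid_guid is hypothetical and is not part of Exhaustive:

```python
import uuid

def is_valid_guid(text: str) -> bool:
    """Return True if text is a GUID in the canonical 8-4-4-4-12 layout."""
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex, so compare against
    # the canonical string form to insist on the layout shown in this entry.
    return str(parsed) == text.strip().lower()

print(is_valid_guid("22949565-0adf-42e6-af7c-e6a787d1a062"))
print(is_valid_guid("22949565-0adf-42e6"))
```

The first call uses the GUID from this example; the second is a deliberately truncated value.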

To view the winning model for this GUID, start by clicking on the Optimisation tab in Exhaustive:

1.png

Place the GUID in the GUID textbox:

2.png

Navigate to the base of the tab and click the Fetch button, which will now be available:

3.png

Upon clicking the Fetch button, the model evolution will be returned in the upper grid, with the selected variables being returned in the lower grid:

4.png

The lower grid, detailing the variable selection, will include statistics and rankings:

· The statistics for each variable calculated before training.

· Summary statistics derived from Monte Carlo simulation, for each variable, covering only the simulations where the score exceeds a threshold specified in the Settings tabs.

· Sensitivity metrics, including a ranking and score ordering the variables from most sensitive to least sensitive.
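
The idea of summarising only the simulations whose score exceeds a threshold can be sketched as follows. This is an illustration only and not the calculation Exhaustive performs internally; the function name, the choice of statistics and the sample data are all assumptions:

```python
import statistics

def summarise_above_threshold(simulations, threshold):
    """Summarise a variable over the simulations whose score exceeds threshold.

    simulations is a list of (variable_value, score) pairs (assumed layout).
    Returns None when no simulation clears the threshold.
    """
    kept = [value for value, score in simulations if score > threshold]
    if not kept:
        return None
    return {
        "count": len(kept),
        "mean": statistics.mean(kept),
        "min": min(kept),
        "max": max(kept),
    }

# Hypothetical simulated values of one variable, paired with model scores.
sims = [(1.0, 0.2), (2.0, 0.8), (4.0, 0.9)]
print(summarise_above_threshold(sims, threshold=0.5))
```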

The statistics will be produced for the best performing model only.  A key requirement is to recall the model against an Excel spreadsheet or csv file, so that the model can be used in day-to-day operations.  Recall can take place by uploading a file, but also via an API (please see the Formats document).  This example will explore the invocation of the model via file.

To process a file of data through a model, navigate to the Production tab in the Exhaustive Application:

5.png

The Production tab takes two parameters.  The first parameter is the file that contains the records to be processed through the model, in the same format as the training dataset albeit (usually) without a dependent variable.  The second parameter is the GUID of the model to be recalled for each record in the dataset.

Start by clicking the Search button to facilitate the population of the Data File text box with the target file:

6.png

Select the file in the Directory File Explorer Dialog Box, which in this case will be the same file as used for training:

Bundle\Data\FraudRisk\FraudRisk.csv

7.png

Once the file is selected, pair the GUID by entering it in the GUID textbox as follows:

22949565-0adf-42e6-af7c-e6a787d1a062

8.png

If prescriptive variables are declared for this model, each will be fluttered randomly using a triangular distribution identified from the training dataset during the training process.  The Simulations textbox sets the number of random simulations to perform.  The simulation with the largest score will be retained as the optimal and returned to the record as a prescription.  In this case, no prescription is required, hence the value is set to zero:

9.png
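
The fluttering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than Exhaustive's own code: the function prescribe, the toy scoring function and the low, mode and high values are all hypothetical, and only the general approach (draw from a triangular distribution, keep the largest score) comes from this entry:

```python
import random

def prescribe(score_fn, low, mode, high, simulations, seed=None):
    """Flutter one variable and keep the value that gives the largest score.

    score_fn stands in for the recalled model (hypothetical).
    Returns (None, None) when simulations is zero, i.e. no prescription.
    """
    if simulations <= 0:
        return None, None
    rng = random.Random(seed)
    best_value, best_score = None, float("-inf")
    for _ in range(simulations):
        # random.triangular takes its arguments in the order low, high, mode.
        candidate = rng.triangular(low, high, mode)
        score = score_fn(candidate)
        if score > best_score:
            best_value, best_score = candidate, score
    return best_value, best_score

# Toy score that peaks when the fluttered variable equals 2.5 (an assumption).
value, score = prescribe(lambda bid: -(bid - 2.5) ** 2, 1.0, 2.0, 4.0, 200, seed=1)
print(value, score)
```

Setting simulations to zero, as in this example, means no fluttering takes place and no prescription is returned.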

Two columns will be appended to the dataset provided, or rather to a copy of that dataset.  The first column is the score returned by the model; the second is a flag indicating whether the record is classified in one direction or the other (i.e. 1 or 0).  Classification models return a probability between 0 and 1, hence values greater than 0.5 suggest that the record is more likely classified than not:

10.png
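
To make the two appended columns concrete, the sketch below adds a score and a flag to each record of a small csv held in memory. The column names Score and Flag, the sample data and the stand-in scoring function are assumptions for illustration, as is the use of 0.5 as the flag threshold; the actual names written by Exhaustive may differ:

```python
import csv
import io

def append_score_columns(csv_text, score_fn, threshold=0.5):
    """Return a copy of csv_text with a score column and a 1/0 flag column added."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + ["Score", "Flag"])  # hypothetical column names
    for row in reader:
        if not row:
            continue  # skip blank lines
        score = score_fn(dict(zip(header, row)))
        flag = 1 if score > threshold else 0
        writer.writerow(row + [f"{score:.4f}", flag])
    return out.getvalue()

# Hypothetical records and a stand-in for the recalled model.
sample = "A,B\n1,2\n3,4\n"
print(append_score_columns(sample, lambda r: 0.9 if r["A"] == "1" else 0.1))
```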

Upon selecting the values for model recall, click the Start button at the base of the Production tab:

11.png

Upon clicking the Start button, the file will be loaded and each record processed through the model, returning a score.  Progress will be written to a status bar during processing.  Upon completion of processing, a histogram of the scores achieved will be created:

12.png
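
A histogram of this kind is simply a count of scores per bucket. The sketch below assumes scores lie between 0 and 1 and uses ten equal-width buckets; the bucket count and function name are assumptions, not settings taken from Exhaustive:

```python
def score_histogram(scores, bins=10):
    """Count scores in equal-width buckets over the range 0 to 1."""
    counts = [0] * bins
    for score in scores:
        # Clamp so that a score of exactly 1.0 lands in the last bucket
        # and any negative score lands in the first.
        index = max(0, min(int(score * bins), bins - 1))
        counts[index] += 1
    return counts

print(score_histogram([0.05, 0.15, 0.55, 0.95, 1.0]))
```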

A copy of the original dataset, with the score and an activation flag appended, will be created in the same directory as the original:

13.png
14.png

In the event that a prescription variable has been specified, this value will be updated for each record.

2) Configure and Train a Prescriptive Exhaustive Model

One of the interesting and unique features of Exhaustive is the ability to recall models with certain variables randomised, in order to observe the effect this has on the score at recall.  Fluttering certain variables in this way can facilitate experimentation in real-time to prescribe an optimal solution to a problem.

Creating a prescription model is exactly the same as creating other models in Exhaustive, with the additional step being the specification of variables that are to be used as prescription variables.

In this Blog entry, repeat the steps detailed in the previous Blog entry using the following file, but stop short of clicking the Start button:

\Bundle\Data\AdTech\AdTech.csv

1.png

This file is structured in the same manner as the FraudRisk.csv file, although there is a field called Response Elevation (i.e. bid) for which optimisation is sought.  Specifying the variable as Prescriptive instructs Exhaustive to simulate the variable on model recall, rather than rely on the value that has been passed (if indeed such a value exists at the time of recall):

2.png

In this example, as it is thought that geography plays an important part in AdTech, fix the Latitude and Longitude fields so that, as a minimum, these variables are included in an Exhaustive trial:

3.png

Click on the Start button to begin the training:

4.png

1) Configure and Train a Classification Exhaustive Model

Once the Exhaustive application is loaded, the first step is to specify a csv file that is to be used for training.  This file is typically structured such that the dependent variable is the very first column in the file, with the independent variables trailing that column.  In this example, the FraudRisk.csv file will be used which is available as:

Bundle\Data\FraudRisk\FraudRisk.csv

1.png
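
The expected layout, with the dependent variable first and the independent variables trailing, can be checked by reading the header row. The sketch below uses Python's csv module on a small in-memory sample; apart from the column titled Dependent, the column names shown are hypothetical and are not taken from FraudRisk.csv:

```python
import csv
import io

def split_header(csv_text):
    """Return (dependent, independents) assuming the dependent variable is column one."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header[0], header[1:]

# Hypothetical sample in the layout described above.
sample = "Dependent,Amount,Country\n1,250.00,GB\n0,12.50,FR\n"
dependent, independents = split_header(sample)
print(dependent)
print(independents)
```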

On inspection of this file in Excel, it can be seen that the file is structured as described above and as shown below:

2.png

In the Exhaustive application, on the first tab titled Model and in the Inputs section, draw attention to the textbox titled Data File.  This textbox is intended to accept the location of the csv file to be used in model training.  The simplest means to complete the Data File textbox is to click on the Search button to expand the directory search tool:

3.png

On clicking the Search button, the Directory and File browser will appear.  Use this dialog box to navigate to the file FraudRisk.csv:

Bundle\Data\FraudRisk\FraudRisk.csv

4.png

Upon navigating to the FraudRisk.csv file, click Open to place the file location in the Data File textbox:

5.png

It can be seen that the file headers have been used to populate several control boxes in the software.  In the Predict drop down, set the value to the dependent variable, which in this case is titled Dependent:

6.png

Fraud Risk is a classification problem, and as such, set the Classification radio button:

7.png

Exhaustive stores its training process in an SQL Server database under a training instance.  The training instance is allocated a GUID (a globally unique identifier).  To create a GUID, click the New GUID button, which will populate a fresh GUID in the GUID textbox:

8.png
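
For readers who need a GUID outside the application, for example when scripting, an equivalent value can be generated with Python's standard uuid module. This is a sketch only; it is assumed, not confirmed, that the New GUID button produces a random GUID of the same kind:

```python
import uuid

# uuid4() produces a random GUID in the 8-4-4-4-12 hexadecimal layout.
new_guid = str(uuid.uuid4())
print(new_guid)
```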

For this classification problem, there are no prescription variables and no variables to fix.  The model is now ready to start training:

9.png

To start model training, click the Start button towards the base of the tab.  The status bar towards the base of the tab will feed back the training progress, alongside a line chart report detailing the best model score and the number of models attempted:

10.png

The model will keep running ad infinitum, or until the maximum number of trials is exceeded as specified in the Settings tabs.  In this example, the best score achieved is 78, which would indicate that the average between Correlation and Percentage Correct is 78.
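
One plausible reading of this score is sketched below: the Pearson correlation between actual and predicted values, scaled to a percentage, averaged with the percentage of records classified correctly. The scaling of the correlation to 0 to 100, the 0.5 classification threshold and the function names are assumptions, and the exact formula used by Exhaustive may differ:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def blended_score(actual, predicted, threshold=0.5):
    """Average of correlation (as a percentage) and percentage correct (assumed formula)."""
    correlation_pct = 100.0 * pearson(actual, predicted)
    correct = sum(
        1 for a, p in zip(actual, predicted) if (p > threshold) == (a == 1)
    )
    percentage_correct = 100.0 * correct / len(actual)
    return (correlation_pct + percentage_correct) / 2.0

# Hypothetical actual classes and model probabilities.
print(blended_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]))
```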

Introduction to Exhaustive Basics

Exhaustive is software that automates the search for Regression (Linear or Logistic) and Neural Network topology (Levenberg-Marquardt learning).  The software gains its name from the manner in which it randomly trials topologies to arrive at an optimal, and tidy, model.

This module will focus on using Exhaustive for classification and will use the FraudRisk.csv and AdTech.csv datasets.

These procedures assume that Exhaustive is already installed.  If this is not the case, the installation guide is available at the following location:

https://ui.jube.io/Help/Index.htm

Firstly, execute the Exhaustive program – which is a thick client application – by navigating to the directory:

Bundle\Exhaustive\

1.png

Execute the application titled JubeCapitalHorizontalAbstraction.exe:

2.png

The Exhaustive thick client application will be loaded and available for use.  The default parameters will be used throughout this training guide.