3) Recall an Exhaustive Model

This Blog entry is from the Exhaustive Basics section in Learn Exhaustive.

It has been observed that a GUID is specified at the point the model is trained.  This GUID is used to produce reports on the training process, as well as to facilitate model recall via batch file or API.

The GUID used for this example, being the outcome of training a model on FraudRisk.csv, is as follows:

22949565-0adf-42e6-af7c-e6a787d1a062
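The GUID is the only link between a trained model and its recall, so a partially copied GUID is an easy mistake to make. The following Python sketch is not part of Exhaustive. It uses the standard `uuid` module to check that a string parses as a GUID before it is pasted into a textbox. The function name `is_valid_guid` is hypothetical.

```python
import uuid


def is_valid_guid(text: str) -> bool:
    """Return True if text parses as a GUID (hyphenated, braced or bare hex)."""
    try:
        uuid.UUID(text.strip())
    except ValueError:
        # Wrong length or non-hex characters, e.g. a truncated copy/paste.
        return False
    return True


guid = "22949565-0adf-42e6-af7c-e6a787d1a062"
print(is_valid_guid(guid))      # True
print(uuid.UUID(guid).version)  # 4, i.e. a randomly generated GUID
```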

To view the winning model for this GUID, start by clicking on the Optimisation tab in Exhaustive:

1.png

Place the GUID in the GUID textbox:

2.png

Navigate to the base of the tab and click the Fetch button, which will now be available:

3.png

Upon clicking the Fetch button, the model evolution will be returned in the upper grid, with the selected variables returned in the lower grid:

4.png

The lower grid, detailing the variable selection, will include statistics and rankings:

·         The statistics for each variable calculated before training.

·         Summary statistics for each variable derived from Monte Carlo simulation, calculated only over the simulations where the score exceeds a threshold specified in the settings tabs.

·         Sensitivity metrics, including a score and a ranking that orders the variables from most sensitive to least sensitive.
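The Monte Carlo statistics are restricted to the simulations whose score exceeds a threshold, and that filtering step can be illustrated in a few lines. The Python sketch below assumes each simulation is held as a dict of variable values plus a `score` key. This structure, the function name `threshold_summary` and the toy data are all hypothetical and do not reflect how Exhaustive stores its simulations internally.

```python
import statistics


def threshold_summary(simulations, threshold):
    """Summarise each variable over the simulations whose score exceeds threshold.

    simulations: list of dicts holding variable values plus a "score" key
    (a hypothetical structure, not Exhaustive's internal format).
    """
    kept = [s for s in simulations if s["score"] > threshold]
    if not kept:
        return {}
    variables = [k for k in kept[0] if k != "score"]
    summary = {}
    for var in variables:
        values = [s[var] for s in kept]
        summary[var] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Made-up simulations; only the first and third exceed a 0.5 threshold.
sims = [
    {"x1": 120.0, "x2": 30, "score": 0.82},
    {"x1": 15.0, "x2": 52, "score": 0.10},
    {"x1": 300.0, "x2": 24, "score": 0.91},
]
print(threshold_summary(sims, 0.5))
```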

The statistics will be produced for the best-performing model only.  A key requirement is to recall the model against an Excel spreadsheet or CSV file, so that the model can be used in day-to-day operations.  Recall can take place by uploading a file, but also via an API (please see the Formats document).  This example will explore invoking the model via file.

To process a file of data through a model, navigate to the Production tab in the Exhaustive application:

5.png

The Production tab takes two parameters.  The first parameter is the file containing the records to be processed through the model, which should be in the same format as the training dataset, albeit (usually) without a dependent variable.  The second parameter is the GUID of the model to be recalled for each record in the dataset.
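The data file needs to match the training format, less the dependent variable, so a pre-flight check of the header row can catch missing columns before recall. The Python sketch below is a hypothetical helper and not an Exhaustive feature. The dependent column name `Dependent` and the variable names are placeholders.

```python
import csv
import io


def check_production_header(training_header, production_file, dependent="Dependent"):
    """Return training columns (minus the dependent variable) absent from the
    production file's header row. The dependent name is a placeholder."""
    production_header = next(csv.reader(production_file))
    expected = [c for c in training_header if c != dependent]
    return [c for c in expected if c not in production_header]


training_header = ["Dependent", "x1", "x2", "x3"]
production = io.StringIO("x1,x2,x3\n1.0,2.0,3.0\n")
print(check_production_header(training_header, production))  # []
```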

Start by clicking the Search button to populate the Data File textbox with the target file:

6.png

Select the file in the file explorer dialog box, which in this case will be the same file as used for training:

Bundle\Data\FraudRisk\FraudRisk.csv

7.png

Once the file is selected, pair the GUID by entering it in the GUID textbox as follows:

22949565-0adf-42e6-af7c-e6a787d1a062

8.png

If prescriptive variables have been declared for this model, their values will be fluttered randomly according to a triangular distribution, as identified from the training dataset during the training process.  The Simulations textbox sets the number of random simulations to perform.  The largest score value will be retained as optimal and returned to the record as a prescription.  In this case no prescription is required, hence the value is set to zero:

9.png
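The prescription step above can be sketched in Python with the standard `random.triangular`. The function below is a hypothetical illustration, not Exhaustive's implementation. It assumes the triangular parameters `(low, mode, high)` for each prescriptive variable have already been estimated from training data, and it treats the model as any callable returning a score. It also assumes the unmodified record's own score is the baseline to beat, which the text does not state. With the Simulations value at zero, the record is returned unchanged.

```python
import random


def prescribe(record, prescriptive, model, simulations, rng=None):
    """Randomly flutter prescriptive variables and keep the best-scoring draw.

    record:       dict of variable values for one row (left unmodified)
    prescriptive: {name: (low, mode, high)} triangular parameters, assumed to
                  have been estimated from the training dataset
    model:        hypothetical callable mapping a record dict to a score
    simulations:  number of random draws; 0 means no prescription
    """
    rng = rng or random.Random()
    # Assumption: the unmodified record's own score is the baseline to beat.
    best_record, best_score = dict(record), model(record)
    for _ in range(simulations):
        candidate = dict(record)
        for name, (low, mode, high) in prescriptive.items():
            # random.triangular takes (low, high, mode) in that order.
            candidate[name] = rng.triangular(low, high, mode)
        score = model(candidate)
        if score > best_score:
            best_record, best_score = candidate, score
    return best_record, best_score


def toy_model(r):
    # Illustrative only: the score rises with x1 and is clipped to [0, 1].
    return min(1.0, max(0.0, 0.1 * r["x1"]))


row = {"x1": 2.0, "x2": 5.0}
best, score = prescribe(row, {"x1": (0.0, 4.0, 8.0)}, toy_model, 1000, random.Random(42))
print(best, score)
```

Keeping the best draw rather than averaging mirrors the text's statement that the largest score value is retained as the optimal prescription.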

Two columns will be appended to a copy of the dataset provided.  The first column is the score returned by the model and the second is a flag intended to indicate which way the record is classified (i.e. 1 or 0).  Classification models return a probability between 0 and 1, hence a score greater than 0.5 suggests that the record is more likely to be classified as 1 than not:

10.png
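Turning a probability into the activation flag is a simple threshold comparison. The Python sketch below follows the text in treating scores strictly greater than 0.5 as 1. Whether Exhaustive flags a score of exactly 0.5 as 1 or 0 is not stated, so that boundary behaviour is an assumption, as is the helper name `activation_flag`.

```python
def activation_flag(score: float, threshold: float = 0.5) -> int:
    """Return 1 if the score exceeds the threshold, else 0 (strictly greater)."""
    return 1 if score > threshold else 0


scores = [0.12, 0.5, 0.73, 0.99]
print([activation_flag(s) for s in scores])  # [0, 0, 1, 1]
```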

Once the values for model recall have been set, click the Start button at the base of the Production tab:

11.png

Upon clicking the Start button, the file will be loaded and each record processed through the model to return a score.  The status of processing will be written to a status bar as it progresses.  Upon completion, a histogram of the scores achieved will be created:

12.png
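The histogram gives a quick view of how scores are distributed across the file. The Python sketch below shows one way such a histogram could be produced from a list of scores. The choice of ten equal-width bins over [0, 1] is an assumption, since the number of bins Exhaustive uses is not stated, and the function names are hypothetical.

```python
def score_histogram(scores, bins=10):
    """Count scores in equal-width bins over [0, 1]; 1.0 falls in the last bin."""
    counts = [0] * bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


def print_histogram(counts):
    """Print one text bar per bin, labelled with the bin's lower and upper edge."""
    bins = len(counts)
    for i, c in enumerate(counts):
        print(f"{i / bins:.1f}-{(i + 1) / bins:.1f} | {'#' * c}")


counts = score_histogram([0.05, 0.12, 0.15, 0.55, 0.97, 1.0])
print_histogram(counts)
```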

A copy of the original dataset, appended with the score and an activation flag, will be created in the same directory as the original:

13.png
14.png
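The whole file-based recall can be summarised as: read each record, score it, append the score and flag, and write a copy beside the original. The Python sketch below does this with the standard `csv` module. It is not Exhaustive's code. The model is any callable taking a row dict of string values. The output suffix `_scored`, the column names `Score` and `Activation`, and the 0.5 threshold default are assumptions for illustration.

```python
import csv
import os
import tempfile


def recall_file(path, model, threshold=0.5, suffix="_scored"):
    """Write a copy of the CSV at path with Score and Activation columns appended.

    model is a hypothetical callable mapping a row dict (string values) to a
    score. The suffix and column names are illustrative, not Exhaustive's own.
    """
    root, ext = os.path.splitext(path)
    out_path = f"{root}{suffix}{ext}"  # same directory as the original file
    with open(path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["Score", "Activation"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            score = model(row)
            row["Score"] = score
            row["Activation"] = 1 if score > threshold else 0
            writer.writerow(row)
    return out_path


with tempfile.TemporaryDirectory() as folder:
    data_path = os.path.join(folder, "data.csv")
    with open(data_path, "w", newline="") as f:
        f.write("x1,x2\n9,1\n2,3\n")
    scored_path = recall_file(data_path, lambda r: float(r["x1"]) / 10)
    with open(scored_path, newline="") as f:
        print(f.read())
```

Writing to a new file rather than overwriting the input keeps the original dataset intact, matching the behaviour described above.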

In the event that a prescriptive variable has been specified, its value will also be updated for each record with the prescription.