This Blog entry is from the Exhaustive Basics section in Learn Exhaustive.
It has been observed that a GUID is specified at the point the model is trained. This GUID is used to produce reports on the training process and to facilitate model recall via batch file or API.
The GUID that will be used for this example is as follows, being the FraudRisk.csv model training outcome:
To view the winning model for this GUID, start by clicking on the Optimisation tab in Exhaustive:
Place the GUID in the GUID textbox:
Navigate to the base of the tab and click the Fetch button, which will now be available:
Upon clicking the Fetch button, the model evolution will be returned in the upper grid, and the selected variables will be returned in the lower grid:
The lower grid, detailing the variable selection, will include statistics and rankings:
· The statistics for each variable calculated before training.
· Summary statistics for each variable derived from Monte Carlo simulation, restricted to the simulations where the score exceeds a given threshold specified in the settings tabs.
· Sensitivity metrics, including a ranking and a score, ordering the variables from most sensitive to least sensitive.
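The threshold-filtered statistics in the second bullet can be illustrated with a minimal sketch. This is not Exhaustive's own code: the function name `summarise_above_threshold`, the variable names `amount` and `age`, and the choice of summary statistics are all assumptions made for illustration.

```python
import statistics

def summarise_above_threshold(simulations, scores, threshold):
    """Summarise each variable over the simulations whose score exceeds threshold.

    simulations: list of dicts mapping variable name -> simulated value
    scores: one score per simulation, in the same order
    """
    # Keep only the simulations that scored above the threshold.
    kept = [sim for sim, score in zip(simulations, scores) if score > threshold]
    summary = {}
    if not kept:
        return summary
    for name in kept[0]:
        values = [sim[name] for sim in kept]
        summary[name] = {
            "count": len(values),
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary

# Hypothetical simulated values and scores, for illustration only.
sims = [
    {"amount": 10.0, "age": 30.0},
    {"amount": 50.0, "age": 40.0},
    {"amount": 90.0, "age": 50.0},
]
scores = [0.2, 0.7, 0.9]
result = summarise_above_threshold(sims, scores, threshold=0.5)
```

Only the second and third simulations pass the threshold here, so the statistics for each variable are computed from those two rows alone.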
The statistics will be produced for the best performing model only. A key requirement is to recall the model against an Excel spreadsheet or CSV file, so that the model can be used in day-to-day operations. Recall can take place by uploading a file, but also via an API (please see the Formats document). This example will explore the invocation of the model via file.
To process a file of data through a model, navigate to the Production tab in the Exhaustive Application:
The Production tab takes two parameters. The first parameter is the file that contains the records to be processed through the model, in the same format as the training dataset, albeit (usually) without a dependent variable. The second parameter is the GUID of the model to be recalled for each record in the dataset.
Start by clicking the Search button to facilitate the population of the Data File text box with the target file:
Select the file in the Directory File Explorer Dialog Box, which in this case will be the same file as used for training:
Once the file is selected, pair the GUID by entering it in the GUID textbox as follows:
If prescriptive variables are declared for this model, each will be fluttered randomly using a triangular distribution identified from the training dataset during the training process. The Simulations textbox sets the number of random simulations to perform. The simulation with the largest score will be retained as the optimal and returned to the record as a prescription. In this case, no prescription is required, hence the value is set to zero:
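The simulation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the application's implementation: the function `prescribe`, the variable `limit`, the `(low, mode, high)` bounds and the toy scoring function are all hypothetical, and starting from the record's original score as the baseline is a choice made here for simplicity.

```python
import random

def prescribe(record, bounds, score_fn, simulations, seed=None):
    """Return (best_record, best_score) after `simulations` random draws.

    bounds: dict mapping prescriptive variable name -> (low, mode, high),
            assumed to have been identified from the training dataset.
    """
    rng = random.Random(seed)
    best_record = dict(record)
    best_score = score_fn(best_record)
    for _ in range(simulations):
        candidate = dict(record)
        for name, (low, mode, high) in bounds.items():
            # random.triangular takes its arguments as (low, high, mode).
            candidate[name] = rng.triangular(low, high, mode)
        score = score_fn(candidate)
        if score > best_score:
            # Retain the highest-scoring simulation as the prescription.
            best_record, best_score = candidate, score
    return best_record, best_score

def toy_score(rec):
    """Hypothetical stand-in for the model: peaks when limit is 500."""
    return 1.0 - abs(rec["limit"] - 500.0) / 1000.0

base = {"limit": 100.0}
bounds = {"limit": (0.0, 500.0, 1000.0)}

# Zero simulations: the record is scored as-is and no prescription is made.
unchanged, s0 = prescribe(base, bounds, toy_score, simulations=0)

# With simulations, the best draw replaces the prescriptive variable.
best, s1 = prescribe(base, bounds, toy_score, simulations=200, seed=1)
```

Setting `simulations` to zero mirrors the case in this example, where the record is simply scored without any prescription.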
Two columns will be appended to the dataset provided, or at least to a copy of that dataset. The first column will be the score returned by the model, and the second a flag intended to indicate whether the record is classified in one direction or the other (i.e. 1 or 0). Classification models return a probability between 0 and 1, hence values greater than 0.5 suggest that the record is more likely classified than not:
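As a rough sketch of the two appended columns, the flag can be derived from the score with a simple comparison against 0.5. The column names `Score` and `Flag`, the function name and the toy scoring function are assumptions for illustration; the names Exhaustive actually writes may differ.

```python
def append_score_and_flag(rows, score_fn, threshold=0.5):
    """Return copies of the rows with a score and a 0/1 flag appended."""
    scored_rows = []
    for row in rows:
        score = score_fn(row)
        scored = dict(row)  # work on a copy, leaving the input untouched
        scored["Score"] = score
        # Flag is 1 only when the score is strictly greater than the threshold.
        scored["Flag"] = 1 if score > threshold else 0
        scored_rows.append(scored)
    return scored_rows

# Hypothetical records and a stand-in for the recalled model.
records = [{"amount": 100}, {"amount": 500}, {"amount": 900}]
flagged = append_score_and_flag(records, lambda row: row["amount"] / 1000)
```

A score of exactly 0.5 is flagged as 0 in this sketch, following the "greater than 0.5" wording above.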
Upon selecting the values for model recall, click the Start button at the base of the Production tab:
Upon clicking the Start button, the file will be loaded and each record processed through the model, returning a score. The status of processing will be written out to a status bar as it proceeds. Upon completion of processing, a histogram of the scores achieved will be created:
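To give a feel for what the histogram summarises, the sketch below counts scores into equal-width bins. It assumes the scores are probabilities between 0 and 1; the function name and the choice of ten bins are illustrative, and the binning used by the application itself is not documented here.

```python
def score_histogram(scores, bins=10):
    """Count scores in [0, 1] into `bins` equal-width buckets."""
    counts = [0] * bins
    for score in scores:
        # A score of exactly 1.0 is placed in the last bucket.
        index = min(int(score * bins), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical scores, for illustration only.
histogram = score_histogram([0.05, 0.15, 0.95, 1.0])
```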
A file will be created in the same directory as the original dataset, copied and appended with the score and an activation flag:
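The file-level behaviour can be approximated with the standard `csv` and `pathlib` modules. This is a sketch under assumptions: the `_scored` suffix, the `Score` and `Flag` column names and the toy scoring function are hypothetical, since the naming convention Exhaustive uses for the output file is not stated here.

```python
import csv
import tempfile
from pathlib import Path

def score_file(data_file, score_fn, threshold=0.5):
    """Write a scored copy of a CSV file next to the original and return its path."""
    source = Path(data_file)
    # Hypothetical naming convention: same directory, "_scored" added to the name.
    target = source.with_name(f"{source.stem}_scored{source.suffix}")
    with source.open(newline="") as src, target.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames) + ["Score", "Flag"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            score = score_fn(row)
            row["Score"] = score
            row["Flag"] = 1 if score > threshold else 0
            writer.writerow(row)
    return target

# Demonstration on a small temporary file with a hypothetical column.
workdir = Path(tempfile.mkdtemp())
sample = workdir / "sample.csv"
sample.write_text("amount\n100\n900\n")
output = score_file(sample, lambda row: float(row["amount"]) / 1000)
```

The original file is left untouched; only the copy carries the two extra columns.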
In the event that a prescription variable has been specified, this value will be updated for each record.