1) Configure and Train a Classification Exhaustive Model

This blog entry is from the Exhaustive Basics section in Learn Exhaustive.

Once the Exhaustive application has loaded, the first step is to specify the CSV file to be used for training. This file is typically structured so that the dependent variable is the first column, followed by the independent variables. This example uses the FraudRisk.csv file, which is available as:



Opening the file in Excel confirms that it follows the structure described above, as below:
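The same check can be made without Excel. The following is a minimal sketch in Python, not part of Exhaustive itself: the helper name split_header and the sample column names Amount and Age are hypothetical, and only the rule that the dependent variable comes first is taken from the text above.

```python
import csv
import io


def split_header(csv_file):
    """Return (dependent, independents) from the header row of an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader)  # the first row holds the column names
    # By the convention described above, the first column is the dependent variable.
    return header[0], header[1:]


# Hypothetical stand-in for FraudRisk.csv; the real column names may differ.
sample = io.StringIO("Dependent,Amount,Age\n1,250.0,34\n0,12.5,51\n")
dependent, independents = split_header(sample)
print(dependent)
print(independents)
```

To run the check against the real file, open it with open("FraudRisk.csv", newline="") and pass the file object to split_header.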


In the Exhaustive application, on the first tab titled Model, locate the textbox titled Data File in the Inputs section. This textbox accepts the location of the CSV file to be used in model training. The simplest way to fill in the Data File textbox is to click the Search button, which opens the directory search tool:


Clicking the Search button opens the directory and file browser. Use this dialog box to navigate to the file FraudRisk.csv:



Upon navigating to the FraudRisk.csv file, click Open to place the file location in the Data File textbox:


The file headers have now been used to populate several controls in the software. In the Predict drop-down, select the dependent variable, which in this case is titled Dependent:


Fraud Risk is a classification problem, so select the Classification radio button:


Exhaustive stores its training process in an SQL Server database under a training instance. Each training instance is identified by a GUID (a globally unique identifier, which is unique for all practical purposes rather than strictly guaranteed). To create a GUID, click the New GUID button, which populates the GUID textbox with a fresh value:
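For readers unfamiliar with GUIDs, the sketch below shows what such a value looks like, using Python's standard uuid module. How Exhaustive generates its GUIDs is not documented here, so the random (version 4) scheme and the helper name new_training_guid are assumptions made purely for illustration.

```python
import uuid


def new_training_guid():
    """Return a random (version 4) GUID as a 36-character string.

    Hypothetical helper: Exhaustive's own New GUID button may use a different scheme.
    """
    return str(uuid.uuid4())


guid = new_training_guid()
print(guid)  # 32 hexadecimal digits in five hyphen-separated groups
```

A version 4 GUID carries 122 random bits, so collisions are extremely unlikely but not mathematically impossible.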


For this classification problem, there are no prescription variables and no variables to fix.  The model is now ready to start training:


To start model training, click the Start button towards the base of the tab. The status bar towards the base of the tab reports the training progress, alongside a line chart showing the best model score and the number of models attempted:


The model keeps running indefinitely, or until the maximum number of trials specified in the Settings tabs is exceeded. In this example, the best score achieved is 78, which indicates that the average of Correlation and Percentage Correct is 78.
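To make the score concrete, the sketch below computes an average of correlation and percentage correct for a small set of predictions. It is an illustration only: the text above states that the score averages the two measures, but the scaling of correlation to a 0 to 100 range, the use of Pearson correlation, the 0.5 classification threshold, and the function names are all assumptions rather than Exhaustive's documented behaviour.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def combined_score(actual, predicted, threshold=0.5):
    """Average of correlation (scaled to 0-100) and percentage correct.

    Assumed formula; the threshold used to turn outputs into classes is hypothetical.
    """
    correlation_pct = 100 * pearson(actual, predicted)
    labels = [1 if p >= threshold else 0 for p in predicted]
    hits = sum(1 for label, truth in zip(labels, actual) if label == truth)
    percent_correct = 100 * hits / len(actual)
    return (correlation_pct + percent_correct) / 2


# Made-up outcomes and model outputs, not taken from FraudRisk.csv.
actual = [0, 1, 0, 1, 1, 0]
predicted = [0.2, 0.9, 0.6, 0.7, 0.4, 0.1]
print(round(combined_score(actual, predicted), 1))
```

Under these assumptions, a score of 78 could arise, for example, from a correlation of 0.70 combined with 86 percent of cases classified correctly, since (70 + 86) / 2 = 78.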