4) Fitting a distribution

This Blog entry is from the Monte Carlo Model section in Learn Palisade.

A triangular distribution is extremely useful for modelling a variety of business scenarios, especially when its parameters are elicited from experts in the absence of data.

If data does exist though, it is more useful to set a distribution reflective of that data.

@Risk can handle a very wide range of distributions and has a means to perform statistical tests to see which distribution appears to reflect the data most closely.

Start by opening a dataset on which you wish to perform fitting.  In this example the underlying dataset used to create the Regression Model will be used, the GBPUSD with 5M bars available at \Data\FX\GBPUSD with file name GBPUSD_Abstraction_Example_Clean.csv:

1.png

Next, highlight the column to fit:

J:J

Click the Distribution Fitting button in the Palisade @Risk Ribbon.

Then on the sub menu click Fit to open the fitting options:

4.png

Retaining the default options, click the Fit button:

5.png

The tool will attempt to fit the selected data against a series of distributions, before selecting the distribution with the best fit:

6.png

In this example the data appears to be distributed relatively normally about the average, yet the tool has determined that a Weibull distribution provides the best fit.  Scores are presented for several other distributions towards the left hand side of the window, and it may be appropriate to select one of these on a more subjective or intuitive basis:

7.png

In this example, the best fitting distribution will be written out as an @Risk function which can be used natively in future simulations.

Click Write to Cell to review the distribution formula:

8.png

It is quite common that the dataset being used for fitting is not the same dataset that the @Risk model has been deployed to.  It follows that it is often easier to copy or make a note of the formula for continued and future use.  As such, highlight the formula in the text box:

9.png

=RiskWeibull(6.8331,0.049509,RiskShift(-0.045066),RiskName("Mean"))
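For orientation, the written function follows the general pattern =RiskWeibull(alpha, beta), where alpha and beta are the fitted shape and scale parameters; the RiskShift() property offsets the distribution along the axis and RiskName() simply labels the input for reporting.  The parameter values will naturally differ for each dataset fitted.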

An alternative approach is to click Next, where a target cell can be specified:

10.png

Selecting a cell to write the formula to is a less flexible approach in practical use, and thus the former method is recommended: keeping a journal of fitted distributions for future reference.

3) Performing Optimisation and Mitigation on Monte Carlo Simulation Data.

This Blog entry is from the Monte Carlo Model section in Learn Palisade.

To Optimise and Mitigate we will need to review the Simulation Data.  Click on the Excel Reports button, then select Simulation Data (Combined):

1.png

On clicking OK a new spreadsheet will be created which will show each simulation.

Owing to this spreadsheet showing the inputs and the output for each simulation, we are able to filter to identify optimal and/or troublesome scenarios.  In this example the filter will be for all records where the forecast is less than -0.7, representing a large move:

3.png
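Before filtering, a quick worksheet formula can confirm how many of the simulated records fall below the threshold.  As a minimal sketch, assuming the simulated forecast output sits in column B of the combined report (the column letter is illustrative only):

=COUNTIF(B:B,"<-0.7")

Dividing the result by the total number of iterations gives the approximate probability of such a move.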

Using Summary Statistics we are now able to identify the scenarios that would cause a large move, which can then be compared with other scenarios. 

In the below example we can infer that the average skew for an optimal trade would be -23, versus the more typical value of -8 which would include losing trades.

4.png
5.png
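The same comparison can also be made directly with a worksheet formula.  As a minimal sketch, assuming the simulated forecast sits in column B and the simulated _4_Skew input in column D of the combined report (column letters are illustrative only):

=AVERAGEIF(B:B,"<-0.7",D:D)

This returns the average skew across only those iterations where the forecast fell below -0.7, which can be compared against =AVERAGE(D:D) taken across all iterations.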

It follows that, in terms of optimisation, we may decide to focus on trades where the skew is more pronounced.

2) Running Monte Carlo Simulations.

This Blog entry is from the Monte Carlo Model section in Learn Palisade.

Set the simulation iterations to a suitably large number, for example 100,000:

1.png

Click Start Simulation to begin the simulation.  The probability density function will start to take shape:

2.png

In the above example we can determine that in all probability the market is going to move downward; in fact it is 90% likely to move downward by between 0 and -0.7. 

The market is extremely unlikely to go up based on the last 8000 5M observations (although this could just as well indicate a correction). 
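Should these figures be needed back in the worksheet rather than read from the results window, @Risk provides simulation statistics functions that can be placed in any cell once the simulation has run.  As a minimal sketch, assuming the output cell defined earlier is B10 (an illustrative reference only):

=RiskMean(B10)
=RiskPercentile(B10,0.9)

The first returns the mean of the simulated output and the second its 90th percentile.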

The shape of the distribution and the lack of normality would be cause for concern, and would in all likelihood lead us to reconsider promoting this model and to revisit its validity.  It is fine for the purposes of our example, albeit with the caveat that the value of the optimisation and mitigation process is closely linked to the predictive performance of the model in the first instance.

1) Configure a Linear Regression Model and Define Triangular Distributions.

This Blog entry is from the Monte Carlo Model section in Learn Palisade.

In this guide we have implemented a Linear Regression Model in an Excel Spreadsheet across four strongly correlating variables from the training dataset.  We are using intraday GBPUSD with 5M candlesticks, with a chart configured to show the 700 most recent candlesticks. 

The model comprises the following intercept and independent variable coefficients:

Name          Value
Intercept     -0.000810676
_4_Mode       -0.000200686
_4_Median     0.03422778
_4_Skew       -0.000148064
_4_HighPrice  -0.070443585

1.png
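To make the arithmetic concrete, the forecast cell simply applies these coefficients to the current variable values.  As a minimal sketch, assuming the values of _4_Mode, _4_Median, _4_Skew and _4_HighPrice are held in cells B2 to B5 respectively (illustrative cell references only):

= -0.000810676 + (B2 * -0.000200686) + (B3 * 0.03422778) + (B4 * -0.000148064) + (B5 * -0.070443585)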

The triangular distributions are determined as follows, based on the data:

Field         Max       Min       Most Likely
_4_Mode       0.00191   -0.03038  -0.010
_4_Median     0.023948  -0.01913  -0.00078
_4_Skew       1.542245  -26.2921  -0.10329
_4_HighPrice  0.00191   -0.03038  -0.01086

Although we are using an explainable model, the function could just as easily be a call to a Neural Network DLL (although in the example of the Jube Capital optimisation software this is performed automatically upon Neural Network retraining, as shown in the conclusion):

2.png

Launch @Risk and click on the value cells:

3.png

Click on Define Assumption, then click Triang when the dialog opens:

4.png

Click Select Distribution to be presented with the configuration.  Using the values as described in the above table, enter the Min, Max and Most Likely for each variable:

5.png
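Alternatively, the same assumption can be typed directly into the value cell as an @Risk function rather than set through the dialog.  As a minimal sketch for the _4_Mode variable, using the values from the table above (the parameter order is min, most likely, max):

=RiskTriang(-0.03038,-0.010,0.00191)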

Click OK to place the distribution assumption and then repeat for each variable:

6.png

Click on the formula output cell in yellow and click on the Add Output button:

7.png

Specify an appropriate name or retain the default, then click OK.
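Behind the scenes, designating an output wraps the existing formula in the RiskOutput property function.  As a minimal sketch, if the output cell contained the linear model described earlier and were named Forecast (an illustrative name only), it would read along the lines of:

=RiskOutput("Forecast") + -0.000810676 + (B2 * -0.000200686) + (B3 * 0.03422778) + (B4 * -0.000148064) + (B5 * -0.070443585)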

2) Output Logistic Regression Model as Probability and set Threshold Function.

This Blog entry is from the Linear Regression section in Learn Palisade.

Logistic regression output is constructed in exactly the same manner as in Linear Regression, insofar as there is a constant representing the starting point, to which the independent variable values multiplied by their coefficients are added; in this example:

= -0.899434346 + (High_Risk_Country * 2.926138785)

1.png

Fill down and name the column Model_1:

2.png

The output ranges from approximately -5 to +5; however, it is not a linear scale, rather it is the logarithm of the odds:

3.png

The output is substantially more intuitive if converted to a probability, which ranges from 0 to 100 (or 0 to 1 if being represented as a pure probability).  The formula to convert the current output to a probability is the logistic function:

P = exp(Output) / (1 + exp(Output))

The formula above uses the exp function in Excel.  As when creating a model output, select the last cell in the spreadsheet, in this example AW2:

4.png

The function will reference the output of the first model, which in our example is cell AV2.  Start entering the formula:

=exp(

Select cell AV2 as the model output in its raw state:

5.png

Then complete the formula, referencing the output in the same manner:

=exp(AV2) / (1+exp(AV2))

6.png

Fill down and label the column Model_1_P:

7.png

Unlike the Linear Regression models, which simply give an output of a numeric value for use, classification models, creating a score, probability or otherwise, rely on the setting of a threshold as an activation function to declare, in this example, fraud.  In our example, the threshold is an 80% probability of fraud for the prediction to be considered as such.  Once again, the IF function will be brought to bear for the purposes of creating an activation function.

Select the last cell in the spreadsheet, in this example AX2, and begin an IF function referencing the Model_1_P value in cell AW2:

=IF(AW2>

8.png

A probability is expressed between 0 and 1, therefore .8 would represent an 80% likelihood.  It follows that the threshold value would be .8, which extends the IF function:

=IF(AW2>0.8

9.png

Enter the remaining parameters: the value to return if true as 1, then the false return value as 0:

=IF(AW2>0.8,1,0

10.png

Complete the formula by closing the parenthesis, fill down and name the column Model_1_Is_Fraud.  Accordingly, any example with a value of 1 would be considered activated:

11.png
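For reference, the completed activation formula as entered in cell AX2 reads:

=IF(AW2>0.8,1,0)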

1) Forward Stepwise Logistic Regression

This Blog entry is from the Linear Regression section in Learn Palisade.

The process of creating a Logistic Regression model is almost identical to that of creating a Linear Regression model, in that the default options suffice, while the concepts of Dependent and Independent variables are used for the purposes of creating the model in place of the X and Y specifications that had previously been used in other analysis.

Logistic Regression is available by clicking the Regression and Classification menu on the StatTools ribbon, then clicking Logistic Regression on the sub menu:

1.png

The logistic regression window will open:

2.png

The concept of stepwise Logistic Regression exists in the same manner as it does in Linear Regression.  Although not explicitly mentioned, this Blog entry assumes that correlation analysis has been performed on all variables and that the variable with the strongest correlation is carried forward as the starting independent variable, in this case High_Risk_Country (a pivoted categorical variable):

3.png

The dependent variable in this dataset is titled Dependent and represents the transaction being fraudulent or not:

4.png
5.png

While it is the default option, it is important to ensure the ‘Include Classification Summary’ option is selected, as this provides an important performance measure for stepwise Logistic Regression.

Clicking OK will produce the Logistic Regression output:

6.png

Stepwise Linear Regression has now become familiar, and the same concepts exist for Logistic Regression.  The performance measures in Logistic Regression differ from those of Linear Regression; P-Values need to be optimised in the same way and should ideally never exceed 5%, while the further optimisation values relate to the classification accuracy of the logistic regression model, for which improvement should always be sought:

7.png

It follows that the Logistic Regression model should be improved by adding the next strongest correlating variable, seeking improvement in classification accuracy while maintaining good P-Values.

Introduction to Monte Carlo Model

This Blog entry is from the Monte Carlo Model section in Learn Palisade.

Monte Carlo Simulation is a technique to create many random simulations based upon a random case (i.e. a transaction).  The random values can be forced to obey certain statistical assumptions, which in this example will be a triangular distribution.  Monte Carlo simulation is an enormous topic in its own right, yet these procedures are intended to give just a basic overview of the tool and allow for the simulation of models created in these procedures.

Simulation for Communication refers to being able to run models based on explainable statistical assumptions so as to facilitate expectation setting for the model's impact.  Furthermore, since millions of random simulations will be exposed to the model, with records of both the randomly generated record and the output being retained, Monte Carlo simulation can help identify scenarios where there is potential for optimisation or risk mitigation.

Introduction to Logistic Regression

This Blog entry is from the Linear Regression section in Learn Palisade.

Logistic regression is extremely similar to Linear Regression in the manner in which the output is presented as constants and Independent Variable coefficients, except its role is to classify rather than forecast numeric values; that is to say, it looks to predict the likelihood of a binary dependent variable outcome rather than the value of a continuous dependent variable.

The file to be used in this example is contained in \Training\Data\FraudRisk and is titled FraudRisk.xlsx.  Logistic Regression is a feature of StatTools and thus the starting point is to open the file, although do not at this stage load the file into StatTools as a dataset.