2) Output Logistic Regression Model as Probability and set Threshold Function.

This Blog entry is from the Linear Regression section in Learn Palisade.

Logistic regression output is constructed in exactly the same manner as in Linear Regression: there is a constant representing the starting point, to which each independent variable value multiplied by its coefficient is added. In this example:

= -0.899434346 + (High_Risk_Country * 2.926138785)

1.png

Fill down and name the column Model_1:

2.png
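For readers who want to check the spreadsheet column outside Excel, the same calculation can be sketched in Python. The constant and coefficient are taken from the formula above; the function name and the sample rows are assumptions made purely for illustration.

```python
# Constant and coefficient as given in the worked example above.
INTERCEPT = -0.899434346
COEF_HIGH_RISK_COUNTRY = 2.926138785


def model_1(high_risk_country: int) -> float:
    """Raw logistic regression output for one row (constant + value * coefficient)."""
    return INTERCEPT + high_risk_country * COEF_HIGH_RISK_COUNTRY


# Hypothetical rows: 1 = transaction from a high-risk country, 0 = otherwise.
rows = [0, 1, 1, 0]

# Equivalent of filling the formula down the Model_1 column.
model_1_column = [model_1(value) for value in rows]
print(model_1_column)
```

Because High_Risk_Country is a 0/1 flag, this single-variable model can only take two distinct values: the constant alone, or the constant plus the coefficient.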

The output ranges from -5 to +5; however, it is not on a linear scale. Rather, it is a logarithm (the natural logarithm of the odds of the outcome):

3.png

The output is substantially more intuitive if converted to a probability, which ranges from 0 to 100 (or 0 to 1 if represented as a pure probability). The formula to convert the current output to a probability is:

P = exp(Output) / (1 + exp(Output))

The formula above uses the EXP function in Excel. As when creating a model output, select the last cell in the spreadsheet, in this example AW2:

4.png

The function will reference the output of the first model, which in our example is cell AV2. Start entering the formula:

=exp(

Select cell AV2 as the model output in its raw state:

5.png

Then complete the formula referencing the output in the same manner:

=EXP(AV2) / (1+EXP(AV2))

6.png

Fill down and label the column Model_1_P:

7.png
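The conversion from raw output to probability can also be sketched in Python, as a cross-check on the Excel column. The function name is an assumption; the two input values are simply the constant, and the constant plus the High_Risk_Country coefficient, from the worked example.

```python
import math


def to_probability(output: float) -> float:
    """Convert a raw logistic regression output to a probability between 0 and 1."""
    return math.exp(output) / (1 + math.exp(output))


# Raw outputs for the single-variable model in this example.
output_not_high_risk = -0.899434346                # High_Risk_Country = 0
output_high_risk = -0.899434346 + 2.926138785      # High_Risk_Country = 1

p_not_high_risk = to_probability(output_not_high_risk)
p_high_risk = to_probability(output_high_risk)

print(round(p_not_high_risk, 3), round(p_high_risk, 3))
```

A raw output of 0 maps to a probability of exactly 0.5, negative outputs map below 0.5 and positive outputs above it. For very large positive outputs `math.exp` can overflow, which is not a concern within the range discussed here.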

Unlike Linear Regression models, which simply output a numeric value for use, classification models that produce a score or probability rely on a threshold, acting as an activation function, to declare the outcome, in this example fraud. Here, the threshold is an 80% probability of fraud for the prediction to be considered as such. Once again, the IF function is used to create the activation function.

Select the last cell in the spreadsheet, in this example AX2, and begin an IF function referencing the Model_1_P value in cell AW2:

=IF(AW2>

8.png

A probability is expressed between 0 and 1, therefore 0.8 represents an 80% likelihood. It follows that the threshold value is 0.8, which continues the IF function:

=IF(AW2>0.8

9.png

Enter the remaining parameters: the value to be returned if true as 1, then the value to be returned if false as 0:

=IF(AW2>0.8,1,0

10.png

Complete the formula by closing the parentheses, fill down and name the column Model_1_Is_Fraud. Accordingly, any example with a value of 1 would be considered activated:

11.png
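The threshold step mirrors the IF formula directly. A minimal Python sketch follows, assuming the same 0.8 threshold; the function name and the sample probabilities are hypothetical.

```python
THRESHOLD = 0.8  # 80% probability of fraud, as used in the IF formula


def is_fraud(probability: float, threshold: float = THRESHOLD) -> int:
    """Return 1 when the probability is strictly above the threshold, otherwise 0."""
    return 1 if probability > threshold else 0


# Hypothetical Model_1_P values, for illustration only.
probabilities = [0.29, 0.88, 0.80]

# Equivalent of filling =IF(AW2>0.8,1,0) down the Model_1_Is_Fraud column.
flags = [is_fraud(p) for p in probabilities]
print(flags)
```

Note that the comparison is strict, as in the Excel formula, so a probability of exactly 0.8 is not flagged.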

1) Forward Stepwise Logistic Regression

The procedure to create a Logistic Regression model is almost identical to that of creating a Linear Regression model, in that the default options suffice, while the concepts of Dependent and Independent variables are used for the purposes of creating the model rather than the X and Y specifications used previously in other analyses.

Logistic Regression is available by clicking the Regression and Classification menu on the StatTools ribbon, then clicking Logistic Regression on the sub menu:

1.png

The logistic regression window will open:

2.png

The concept of stepwise Logistic Regression exists in the same manner as it does in Linear Regression. Although not explicitly shown, this Blog entry assumes that correlation analysis has been performed on all variables and that the variable with the strongest correlation is carried forward as the starting independent variable, in this case High_Risk_Country (a pivoted categorical variable):

3.png
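The assumed correlation step, choosing the starting variable by the strength of its correlation with the dependent variable, might be sketched as follows. This is not the StatTools procedure itself: the data values are made up, and `Hypothetical_Other` is an invented column name used only to give the ranking something to compare against.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up illustration data; not taken from the FraudRisk file.
dependent = [1, 0, 1, 0, 1, 0]
candidates = {
    "High_Risk_Country": [1, 0, 1, 0, 1, 1],
    "Hypothetical_Other": [1, 1, 0, 0, 1, 0],
}

# Rank candidates by the absolute strength of their correlation with the dependent variable.
ranked = sorted(
    candidates,
    key=lambda name: abs(pearson(candidates[name], dependent)),
    reverse=True,
)
strongest = ranked[0]
print(strongest)
```

The absolute value is used so that a strong negative correlation ranks as highly as a strong positive one.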

The dependent variable in this dataset is titled Dependent and represents the transaction being fraudulent or not:

4.png
5.png

While it is the default option, it is important to ensure that the ‘Include Classification Summary’ option is selected, as this is an important performance measure for stepwise Logistic Regression.

Clicking OK will produce the Logistic Regression output:

6.png

Stepwise Linear Regression has now become familiar, and the same concepts apply to Logistic Regression. The performance measures differ, however. P-Values need to be optimised in the same way and should ideally never exceed 5%, while the further measures relate to the classification accuracy of the logistic regression model, for which improvement should always be sought:

7.png
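To make the idea of classification accuracy concrete, the sketch below counts correct and incorrect classifications from a list of actual and predicted labels. It is an illustration only: the labels are made up, the function and key names are assumptions, and the layout of the StatTools classification summary may differ.

```python
def classification_summary(actual, predicted):
    """Count the four classification outcomes and the overall accuracy."""
    pairs = list(zip(actual, predicted))
    true_positive = sum(1 for a, p in pairs if a == 1 and p == 1)
    true_negative = sum(1 for a, p in pairs if a == 0 and p == 0)
    false_positive = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_negative = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "true_positive": true_positive,
        "true_negative": true_negative,
        "false_positive": false_positive,
        "false_negative": false_negative,
        # Share of all rows that were classified correctly.
        "accuracy": (true_positive + true_negative) / len(pairs),
    }


# Made-up labels for illustration: 1 = fraud, 0 = not fraud.
actual = [1, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 1]

summary = classification_summary(actual, predicted)
print(summary)
```

When comparing one stepwise iteration with the next, it is this kind of accuracy figure, alongside the P-Values, that would be expected to improve.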

It follows that the Logistic Regression model should be improved by adding the next most strongly correlated variable, seeking improvement in the classification accuracy while maintaining good P-Values.

Introduction to Logistic Regression

Logistic regression is extremely similar to Linear Regression in the manner in which the output is presented as a constant and independent variable coefficients, except that its role is to classify rather than to forecast numeric values. That is to say, it seeks to predict the likelihood of a binary dependent variable outcome rather than the value of a continuous dependent variable.

The file to be used in this example is contained in \Training\Data\FraudRisk and is titled FraudRisk.xlsx. Logistic Regression is a feature of StatTools, so the starting point is to open the file; do not, at this stage, load the file into StatTools as a dataset.