This Blog entry is from the Linear Regression section in Learn Palisade.
Correlations, and strength, is used predominantly to find the strongest correlating independent variables for the purposes of carrying that variable through to predictive analytics. The variables with the strongest correlations, in one direction or another, are carried forward to predictive analytics techniques in subsequent Blog entries.
Firstly, select the Dependents variable and select all of the independent variables correlating, to whatever degree, to that Independent Variable. In this example it would involve selecting the first two columns in the correlation matrix, where Dependent is across the top, and the independent variables are along the left-hand side, and of course the values to the right of the names. Once selected, right click on this selection, then copy:
Create a new worksheet by clicking the tab with a plus sign at the base of the workbook currently in focus:
Once the new workbook is in focus, select cell A1, right click and paste the values:
A more succinct comparison between the Dependent and Independent variables is presented, although it is not ordered by the strength of correlation:
Both -1 and +1 are equally interesting for the purposes of variable selection, and so, we cannot simply order by the correlation without first levelling it to an absolute – this is positive – value. Start by creating a new column called ABS, in this example select cell C1, setting the value to ‘ABS Value’, although the naming is quite unimportant:
For each of the Independent Variables, apply the Excel ABS Function to the value. In this example, the first cell of interest is in C3, being the Mean. In cell C3, start by typing the formula:
Select the Independent Variable correlation value, which is in cell B3 in this example:
Close the parenthesis to complete the formula, then fill down:
At this stage, as the Independent Variable correlations have been set to an absolute value, only values close to +1 are deemed interesting. It is helpful to order the absolute correlations from high to low, where the strongest correlations are at the top, the weakest at the bottom.
Click cell A1 to reset the selection, if anything is selected, then navigate to the data ribbon, select the icon Sort towards the middle of the ribbon, which will infer the dataset:
Select the parameters Sort By ‘ABS Value’ then the order ‘Largest to Smallest’:
On clicking Ok, the values are ordered based on their strength of correlation:
It is extremely likely that those correlations towards the top of the table will be included in predictive modelling, while those correlations towards the bottom of the table will be disregarded.
It is not, at this stage at least, advisable to make a determination of what variables should or should not be in a predictive analytics model, instead relying on stepwise approaches as covered in subsequent Blog entries.