Stepwise Multiple Regression

In the preceding chapter we discussed the advantages of multiple regression analysis with uncorrelated (orthogonal) predictor variables. With few exceptions, however, orthogonal predictor variables do not occur in analyses of real data. In this chapter we discuss one procedure for making the predictor variables orthogonal prior to the multiple regression analysis.

Orthogonalization of Two Predictors by Successive Partialling

In the previous section we discussed a solution built on the idea of successively interchanging variables between the predictor and criterion sets. A similar idea is the orthogonalization of predictor variables by successive partialling. It rests on the fact that the correlation between the predicted component and the error (residual) component of a regression analysis is zero. Thus, why not precede the regression analysis proper by a series of preliminary regression analyses, decomposing the predictor variables, one by one, into their predicted and residual components, discarding the predictable components, and keeping only the residual components? This is the reasoning behind stepwise multiple regression analysis. For the current example, let us designate the second variable in the predictor set X as the criterion variable, regress it on the first predictor variable, and split it into its predictable and residual components.

 

 

 

 

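The correlation between the first predictor variable and the predicted component of the second predictor variable is 1.00. The two are perfectly related, so the predicted component is redundant from the viewpoint of correlation analysis and can be deleted without changing the results of the analysis.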
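The decomposition just described can be checked numerically. The following is a minimal sketch, not the chapter's own data: the two predictors are synthetic, and the names x1, x2, x2_hat, and x2_res are hypothetical labels chosen for illustration. It regresses the second predictor on the first and confirms that the predicted component is perfectly correlated with the first predictor, while the residual component is uncorrelated with it.

```python
import numpy as np

# Synthetic, hypothetical data: two correlated predictors (not the chapter's data set).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)

# Preliminary regression: the second predictor plays the role of the criterion.
slope, intercept = np.polyfit(x1, x2, 1)  # coefficients, highest degree first
x2_hat = intercept + slope * x1           # predicted component
x2_res = x2 - x2_hat                      # residual component ("2.1")

# The predicted component is a linear function of x1, so |r| is 1.
r_pred = np.corrcoef(x1, x2_hat)[0, 1]
# The residual component is uncorrelated with x1 (up to rounding error).
r_res = np.corrcoef(x1, x2_res)[0, 1]

print(f"r(x1, predicted part of x2) = {r_pred:.6f}")
print(f"r(x1, residual part of x2)  = {r_res:.6f}")
```

Because the predicted component carries no information beyond the first predictor, only the residual component needs to be kept.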

 

 

 

 

The subscript 2.1 attached to the second predictor variable signifies that this variable was residualized on the variable whose subscript follows the period. The associated matrix of coefficients of determination

 

 

 

 

shows that the predictor variables are orthogonal and that the coefficient of multiple determination can be obtained by simple summation. For the example, .25 + .33 = .58, the same value obtained in the previous analyses of this data set.
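The additivity of the coefficients of determination can be illustrated with a short sketch. It uses synthetic data rather than the chapter's data set, so the numbers will differ from .25, .33, and .58. The helper residualize and all variable names are hypothetical. The sketch residualizes the second predictor on the first, sums the two squared correlations with the criterion, and compares the sum with the coefficient of multiple determination from an ordinary regression on the original, correlated predictors.

```python
import numpy as np

# Synthetic, hypothetical data (not the chapter's data set).
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 0.7 * x1 + 0.4 * x2 + rng.normal(size=n)


def residualize(target, predictor):
    """Return the part of `target` that is not linearly predictable from `predictor`."""
    slope, intercept = np.polyfit(predictor, target, 1)
    return target - (intercept + slope * predictor)


# Second predictor residualized on the first ("2.1").
x2_1 = residualize(x2, x1)

# Squared correlations of the criterion with each orthogonal predictor.
r2_y_x1 = np.corrcoef(y, x1)[0, 1] ** 2
r2_y_x2_1 = np.corrcoef(y, x2_1)[0, 1] ** 2
r2_sum = r2_y_x1 + r2_y_x2_1

# Coefficient of multiple determination from the original predictors.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
r2_full = 1.0 - resid.var() / y.var()

print(f"r2(y, x1) + r2(y, x2.1) = {r2_sum:.6f}")
print(f"R2 from full regression  = {r2_full:.6f}")
```

The two printed values agree up to rounding error, because x1 and the residualized predictor span the same space as x1 and x2 while being uncorrelated with each other.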