Hierarchical Regression Analysis

 

In the preceding chapter we discussed the advantages of multiple regression analysis with uncorrelated (orthogonal) predictor variables. However, with few exceptions, orthogonal predictor variables do not occur in analyses of real data. In this chapter we discuss one procedure for making the predictor variables orthogonal prior to the multiple regression analysis. As a prelude to this discussion, let us introduce a method for the simultaneous computation of the coefficients of multiple determination for all variables in a set of data.

 

Communality

In some instances, the decision about which variables should be included in the predictor set and which variable should be designated as the criterion may be arbitrary. Also, in the orthogonalization of predictor variables by successive partialing, the order of partialing is important, as will be demonstrated later, and may sometimes be arbitrary. In these cases it may be helpful to know the amount of variance each variable in the analysis shares with the other variables. This amount of shared variance, common to a variable and the other variables, is called communality.

Computational Procedures

Computing communalities for all variables could amount to the laborious process of successively designating each variable as the criterion variable and the remaining variables as the predictor variables, and carrying out a series of k multiple regression analyses. The coefficients of multiple determination from these analyses would estimate the communalities, the amount of variance each variable shares with the other variables.

A better procedure than carrying out a series of k multiple regression analyses is suggested by the equation

\[ C_{diag} = I - \left( R^{-1}_{diag} \right)^{-1} \]

where I is an identity matrix, R is the matrix of correlations between the variables included in the analysis, and R^{-1}_{diag} is the inverse of R with its off-diagonal elements set to zero. The diag subscript indicates that a matrix is diagonalized. The communalities, i.e., the coefficients of multiple determination, are located along the principal diagonal of the matrix C.
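The equivalence between the shortcut and the series of k regression analyses can be checked numerically. The following is a minimal sketch using NumPy; the function names and the simulated data are assumptions made for illustration and do not come from the chapter.

```python
import numpy as np

def communalities(R):
    """Communality of each variable from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv_diag = np.diag(np.linalg.inv(R))   # principal diagonal of the inverse of R
    return 1.0 - 1.0 / inv_diag            # diagonal of I minus the reciprocals

def communalities_by_regression(data):
    """The same quantities the slow way: one multiple regression per variable."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    out = np.empty(k)
    for j in range(k):
        y = data[:, j]                                    # current criterion
        X = np.column_stack([np.ones(n), np.delete(data, j, axis=1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares weights
        resid = y - X @ beta
        out[j] = 1.0 - resid.var() / y.var()              # coefficient of multiple determination
    return out

# Hypothetical data: three correlated variables, 50 cases.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
data[:, 2] += 0.8 * data[:, 0] + 0.5 * data[:, 1]

R = np.corrcoef(data, rowvar=False)
shortcut = communalities(R)
slow = communalities_by_regression(data)
print(np.round(shortcut, 4))
print(np.round(slow, 4))
```

Both printed vectors should agree to rounding error, which is the point of the shortcut: one matrix inversion replaces k separate regression runs.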

Correlation Matrix

The central part of the above equation is the computation of inverse of a correlation matrix. For a simple case of two variables

 

 

the correlation matrix is

\[ R = \begin{bmatrix} 1 & .30 \\ .30 & 1 \end{bmatrix} \]

The variables correlate .30 and the coefficient of determination is .09.

Inverse of a Correlation Matrix

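The determinant of the correlation matrix is .91 (that is, 1 - .30^2). In this special case of a two-variable correlation matrix, the determinant not only determines whether the matrix is invertible, but also equals the coefficient of alienation. For our example, .09 and .91 sum to one. The inverse of the matrix R equals

\[ R^{-1} = \frac{1}{.91} \begin{bmatrix} 1 & -.30 \\ -.30 & 1 \end{bmatrix} = \begin{bmatrix} 1.099 & -.330 \\ -.330 & 1.099 \end{bmatrix} \]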

Note that correlation matrices have ones in the principal diagonal, so interchanging of these elements is immaterial and the inverse is obtained simply by changing signs of the off-diagonal elements and by dividing all elements by the determinant.
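The sign-change-and-divide rule for the two-variable case can be written out directly. This is a small sketch in plain Python; the function name is hypothetical and the only input taken from the text is the correlation of .30.

```python
def invert_2x2_correlation(r):
    """Invert [[1, r], [r, 1]] by changing off-diagonal signs and dividing by the determinant."""
    det = 1.0 - r * r                      # determinant of the 2x2 correlation matrix
    if det == 0.0:
        raise ValueError("r = +1 or -1: the matrix is singular and has no inverse")
    inverse = [[1.0 / det, -r / det],
               [-r / det, 1.0 / det]]
    return det, inverse

det, R_inv = invert_2x2_correlation(0.30)
print(round(det, 2))
for row in R_inv:
    print([round(value, 4) for value in row])
```

The guard on a zero determinant reflects the remark in the text that the determinant decides whether the matrix can be inverted at all.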

Diagonalize

For demonstration purposes, we will diagonalize the inverted matrix of inter-correlations prior to setting its elements to the -1 power. It does not matter whether we diagonalize the inverted matrix of inter-correlations or the resulting matrix of communalities.

Reciprocal

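The main points of the algebra of powers are that a number raised to the zero power equals one, a number raised to a fractional power equals the respective root, and a negative sign on the exponent signifies a reciprocal. Interchanging numerators and denominators in the principal diagonal of the inverted matrix R,

\[ \left( R^{-1}_{diag} \right)^{-1} = \begin{bmatrix} 1/.91 & 0 \\ 0 & 1/.91 \end{bmatrix}^{-1} = \begin{bmatrix} .91 & 0 \\ 0 & .91 \end{bmatrix} \]

we changed the elements in the principal diagonal to the coefficients of alienation.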

Subtract

Since the coefficients of determination are the complements of the coefficients of alienation, subtracting the above matrix from the identity matrix results in the matrix of communalities

\[ C_{diag} = I - \begin{bmatrix} .91 & 0 \\ 0 & .91 \end{bmatrix} = \begin{bmatrix} .09 & 0 \\ 0 & .09 \end{bmatrix} \]

where the coefficients of multiple determination are located along the principal diagonal (1 - .91 = .09). These coefficients indicate the amount of variance each variable shares with all the other variables, i.e., their communalities.
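The three steps (diagonalize, take reciprocals, subtract from the identity matrix) can be traced for the two-variable example. The sketch below assumes NumPy and uses variable names chosen only for illustration; it also checks the earlier remark that the point at which the matrix is diagonalized does not change the result.

```python
import numpy as np

R = np.array([[1.0, 0.3],
              [0.3, 1.0]])                  # two variables correlating .30

R_inv = np.linalg.inv(R)                    # inverse of the correlation matrix
D = np.diag(np.diag(R_inv))                 # diagonalize: zero the off-diagonal elements
D_recip = np.diag(1.0 / np.diag(D))         # reciprocal of each diagonal element
C = np.eye(2) - D_recip                     # subtract from the identity matrix

# Alternative order: element-wise reciprocal first, diagonal taken at the end.
C_late_diag = np.diag(np.eye(2) - 1.0 / R_inv)

print(np.round(D_recip, 2))
print(np.round(C, 2))
print(np.round(C_late_diag, 2))
```

The element-wise reciprocal in the alternative order works here because no element of the inverse is zero; with a zero off-diagonal element it would divide by zero, which is one practical reason to diagonalize first.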

Example

This works for any number of variables. For our previous example of a multiple regression analysis

 

 

the matrix of correlations is

 

 

its inverse equals

 

 

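and the matrix of communalities, with rows and columns ordered X1, X2, Y, is

\[ C_{diag} = \begin{bmatrix} .25 & 0 & 0 \\ 0 & .49 & 0 \\ 0 & 0 & .58 \end{bmatrix} \]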

The coefficient of multiple determination for X1 regressed on X2 and Y is .25. The coefficient of multiple determination for X2 regressed on X1 and Y is .49. Notice that the coefficient of multiple determination for Y regressed on X1 and X2 is indeed .58, the value computed previously.
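The three communalities quoted above can be reproduced from a correlation matrix. In the sketch below only the correlation of .30 between X1 and X2 is stated in the text; the values .50 for X1 with Y and .70 for X2 with Y are assumptions, inferred because they are consistent with the quoted coefficients of .25, .49, and .58.

```python
import numpy as np

# Rows and columns ordered X1, X2, Y.
# r(X1, X2) = .30 is from the text; .50 and .70 are inferred, not quoted.
R = np.array([[1.0, 0.3, 0.5],
              [0.3, 1.0, 0.7],
              [0.5, 0.7, 1.0]])

R_inv = np.linalg.inv(R)
C_diag = 1.0 - 1.0 / np.diag(R_inv)     # one communality per variable

print(np.round(C_diag, 2))              # [0.25 0.49 0.58]
```

Under these assumed correlations the last entry, the coefficient of multiple determination for Y regressed on X1 and X2, comes out as .58 after rounding.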

 

Orthogonalization of Two Predictors by Successive Partialing

In the previous section we discussed a shortcut for successively interchanging variables between the predictor and criterion sets. A similar idea is the orthogonalization of predictor variables by successive partialing.

Reasoning

This idea is based on the realization that the correlation between the predicted component and the error (residual) component of a regression analysis is zero. Thus, why not precede the regression analysis proper by a series of preliminary regression analyses, decomposing the predictor variables, one by one, into their predicted and residual components, discarding the predictable components, and keeping only the residual components? This is the reasoning behind hierarchical multiple regression analysis.

Predict X2 from X1

Assign the predictor variable X1 as the primary predictor. For the current example, let's designate the second variable in the predictor set, X2, as the criterion variable and split the criterion into its predictable and residual components, as

 

 

That was accomplished by using equation

 

 

and equation

 

 

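The means of both variables equal 3.00, their variances are identical, and they correlate .30, thus the regression weight and the intercept are

\[ b = r \frac{s_{X_2}}{s_{X_1}} = .30, \qquad a = \bar{X}_2 - b \bar{X}_1 = 3.00 - (.30)(3.00) = 2.10 \]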

Predictor Variable and Predicted Variable 

Observe, within the associated matrix of coefficients of determination,

 

 

that the first predictor (X1) and the predicted component of X2 are perfectly related (r=1). From the viewpoint of correlation analysis these components are therefore identical, interchangeable, and redundant, and one can be deleted without changing the results of the analysis.

Predicted Variable and Residual Variable

The identity submatrix indicates that the predicted component and the residual component (X2.1) are not correlated (r=0).
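The split of X2 into its predicted and residual components, and the two properties just described, can be verified on a small data set. The scores below are hypothetical, not the chapter's data; they were chosen so that both means are 3.00, the variances are equal, and the correlation is .30, matching the summary values given in the text.

```python
import numpy as np

# Hypothetical scores (assumption): means 3.00, equal variances, r = .30.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 5.0, 3.0, 2.0, 4.0])

r = np.corrcoef(x1, x2)[0, 1]
b = r * x2.std() / x1.std()          # regression weight of X2 on X1
a = x2.mean() - b * x1.mean()        # intercept

predicted = a + b * x1               # predictable component of X2
residual = x2 - predicted            # residual component, X2.1

print(round(r, 2), round(b, 2), round(a, 2))
# Coefficients of determination among X1, predicted, and residual:
print(np.round(np.corrcoef([x1, predicted, residual]) ** 2, 2))
```

In the printed matrix, the entry for X1 with the predicted component should be 1 and both entries involving the residual component should be 0, so the residual carries only the part of X2 that X1 cannot account for.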

Discard and Replace

Thus, to obtain orthogonal predictor variables, one can discard the predictable component and replace the second predictor with the residual component (X2.1) without changing the magnitude of the coefficient of multiple determination.

Predict Y from X1 and X2.1

The 2.1 subscript trailing the second predictor variable signifies that this variable was residualized on the variable whose subscript follows the period.

     

The associated matrix of coefficients of determination

\[ \begin{array}{c|ccc} & X_1 & X_{2.1} & Y \\ \hline X_1 & 1 & 0 & .25 \\ X_{2.1} & 0 & 1 & .33 \\ Y & .25 & .33 & 1 \end{array} \]

demonstrates that, in the case of orthogonal predictor variables, the coefficient of multiple determination can be obtained by simple summation. For the example, .25 + .33 equals .58, a value obtained by previous analyses of this data set.
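The additivity can be checked by residualizing X2 on X1 and comparing the sum of the two squared correlations with the coefficient of multiple determination from the ordinary regression on X1 and X2. The data and the helper names below are hypothetical, so the numbers differ from the .25, .33, and .58 of the chapter's example; only the equality is being illustrated.

```python
import numpy as np

# Hypothetical scores (assumption), five cases.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 5.0, 3.0, 2.0, 4.0])
y = np.array([1.0, 4.0, 2.0, 3.0, 5.0])

def residualize(target, *predictors):
    """Residual of target after least-squares regression on the predictors."""
    A = np.column_stack([np.ones(len(target)), *predictors])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ beta

def r_squared(target, *predictors):
    """Coefficient of multiple determination of target on the predictors."""
    return 1.0 - residualize(target, *predictors).var() / target.var()

x2_1 = residualize(x2, x1)                       # X2 residualized on X1
r2_y_x1 = np.corrcoef(y, x1)[0, 1] ** 2          # variance explained by X1
r2_y_x2_1 = np.corrcoef(y, x2_1)[0, 1] ** 2      # additional variance explained by X2.1
total = r_squared(y, x1, x2)                     # regression on the original predictors

print(round(r2_y_x1, 4), round(r2_y_x2_1, 4), round(total, 4))
```

Replacing X2 by X2.1 leaves the coefficient of multiple determination unchanged because X1 and X2.1 together span the same space as X1 and X2.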

 

Orthogonalization of Three Predictors by Successive Partialing

For three or more predictors, the partialing of the first variable is identical to the procedure described above. However, the partialing of the second and subsequent variables requires multiple regression analysis, as both the initial variable and the previously partialled variables have to be partialled out.

Residualize the Third Predictor X3

Let us continue with the orthogonalization of the predictor set of variables. During the successive partialing we have so far selected the first variable of the predictor set as the initial variable and partialled it out of the second variable. Adding the third predictor variable, the predictor set of variables is

 

 

Designating the third variable of the predictor set as the criterion variable and using a computer program to obtain the regression weights B = [.20 .48] and the intercept, 2.40, the predicted and residual components can be computed as

 

 

and

 

  

The third predictor variable is residualized as

 

 

Discard the predicted component and replace the predictor variable X3 by the residual component (X3.12).
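Successive partialing for any number of predictors can be sketched as a loop in which each predictor is regressed on the ones already processed and only its residual is kept. The function name and the simulated data are assumptions for illustration; the weights of .20 and .48 and the intercept of 2.40 quoted above belong to the chapter's own data and are not reproduced here.

```python
import numpy as np

def orthogonalize(X):
    """Successively residualize each column of X on all columns before it."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty_like(X)
    out[:, 0] = X[:, 0]                               # the initial variable is kept as is
    for j in range(1, k):
        # Intercept plus the already orthogonalized predictors.
        A = np.column_stack([np.ones(n), out[:, :j]])
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        out[:, j] = X[:, j] - A @ beta                # keep only the residual component
    return out

# Hypothetical correlated predictors X1, X2, X3.
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = 0.3 * x1 + rng.normal(size=100)
x3 = 0.4 * x1 + 0.5 * x2 + rng.normal(size=100)

Z = orthogonalize(np.column_stack([x1, x2, x3]))      # columns X1, X2.1, X3.12
print(np.round(np.corrcoef(Z, rowvar=False), 4))
```

Regressing on the already residualized columns gives the same residual as regressing on the original earlier predictors, since both sets span the same space; the printed correlation matrix should be an identity matrix up to rounding.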

Predict Y from X1, X2.1 and X3.12

The data set with orthogonalized predictor variables is

 

 

The multiple regression analysis of the above data is

 

 

and shows the additivity of the variance contributions of each of the orthogonalized predictor variables.

Coefficients of (Bivariate) Determination

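The matrix of coefficients of determination for the predictor and criterion variables is shown below.

\[ \begin{array}{c|cccc} & X_1 & X_{2.1} & X_{3.12} & Y \\ \hline X_1 & 1 & 0 & 0 & .25 \\ X_{2.1} & 0 & 1 & 0 & .33 \\ X_{3.12} & 0 & 0 & 1 & .25 \\ Y & .25 & .33 & .25 & 1 \end{array} \]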

Coefficient of Multiple Determination

Since the predictors are orthogonal, the coefficient of multiple determination can be computed by summing the squared correlations between each predictor and the criterion. For the example, the coefficient of multiple determination is computed as .25 + .33 + .25 and equals .83.

Standard Variance Components

The above regression analysis also illustrates the decomposition of the standardized variance of the criterion variable, equal to 1, into the variance components explained by each predictor variable and the residual component. For the example, 1.00 = .83 + .17.
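The decomposition of the standardized criterion variance into one component per orthogonalized predictor plus a residual component can be checked as follows. The simulated data and helper name are assumptions, so the components will not equal the .25, .33, .25, and .17 of the chapter's example; the sketch only verifies that the parts sum to one.

```python
import numpy as np

# Hypothetical data (assumption): three correlated predictors and a criterion.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)
x3 = 0.4 * x1 + 0.5 * x2 + rng.normal(size=n)
y = 0.5 * x1 + 0.6 * x2 + 0.5 * x3 + rng.normal(size=n)

def residualize(target, *predictors):
    """Residual of target after least-squares regression on the predictors."""
    A = np.column_stack([np.ones(len(target)), *predictors])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ beta

x2_1 = residualize(x2, x1)                  # X2 with X1 partialled out
x3_12 = residualize(x3, x1, x2)             # X3 with X1 and X2 partialled out

# Squared correlation of Y with each orthogonal predictor.
components = [np.corrcoef(y, p)[0, 1] ** 2 for p in (x1, x2_1, x3_12)]
r2_multiple = sum(components)               # coefficient of multiple determination
unexplained = residualize(y, x1, x2, x3).var() / y.var()

print(np.round(components, 4))
print(round(r2_multiple, 4), round(unexplained, 4))
```

The explained and unexplained proportions should add up to 1, mirroring the 1.00 = .83 + .17 statement for the chapter's data.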

Interpretations

When the predictor X1 is entered first, it accounts for 25% of the variance in Y. The second predictor, X2, accounts for an additional 33% of the variance in Y. Then the third predictor, X3, accounts for a further 25% of the variance in Y.
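These percentages depend on the order of entry, which is why the order of partialing matters. The sketch below, on hypothetical simulated data with assumed function names, computes the increase in the coefficient of multiple determination as each predictor is entered, once in the order X1, X2, X3 and once in the reverse order. The increments are equivalent to the squared correlations with the successively residualized predictors.

```python
import numpy as np

# Hypothetical data (assumption): three correlated predictors and a criterion.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)
x3 = 0.4 * x1 + 0.5 * x2 + rng.normal(size=n)
y = 0.5 * x1 + 0.6 * x2 + 0.5 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def r_squared(y, X):
    """Coefficient of multiple determination of y on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1.0 - (y - A @ beta).var() / y.var()

def increments(y, X, order):
    """Variance in y added by each predictor when entered in the given order."""
    gains, previous = [], 0.0
    for j in range(1, len(order) + 1):
        current = r_squared(y, X[:, order[:j]])   # model with the first j predictors
        gains.append(current - previous)
        previous = current
    return gains

forward = increments(y, X, [0, 1, 2])    # X1 first, then X2, then X3
backward = increments(y, X, [2, 1, 0])   # X3 first, then X2, then X1

print(np.round(forward, 4), round(sum(forward), 4))
print(np.round(backward, 4), round(sum(backward), 4))
```

Both orders should give the same total, while the share credited to an individual predictor changes: X1 is credited with more variance when it is entered first than when it is entered last.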