Multiple Regression Analysis

Multiple regression analysis is a method for explaining phenomena and predicting future events. A coefficient of correlation between variables X and Y is a quantitative index of association between these two variables. In its squared form, as a coefficient of determination, it indicates the amount of variance (information) in the criterion variable Y that is accounted for by the variation in the predictor variable X. A multivariate counterpart of the coefficient of determination is the coefficient of multiple determination, $R^2$. In multiple regression analysis, the set of predictor variables $X_1, X_2, \ldots, X_p$ is used to explain the variability of the criterion variable Y. Initially, a matrix of correlations R is computed for all variables involved in the analysis. This matrix can be conceptualized as a supermatrix containing the submatrix of inter-correlations among the predictors $\mathbf{R}_{xx}$, the submatrices of cross-correlations between the predictors and the criterion $\mathbf{R}_{xy}$ and $\mathbf{R}_{yx}$, and a scalar number 1, the correlation of the criterion with itself

$$\mathbf{R} = \begin{bmatrix} \mathbf{R}_{xx} & \mathbf{R}_{xy} \\ \mathbf{R}_{yx} & 1 \end{bmatrix}$$

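The partitioning of the supermatrix can be sketched in Python with NumPy. This is only an illustration: the function name `partition_correlations`, the convention that the criterion occupies the last row and column, and the numerical correlations are all assumptions introduced here, not part of the original text.

```python
import numpy as np

def partition_correlations(R):
    """Split a (p+1) x (p+1) correlation matrix into its blocks.

    Assumes (for illustration) that the criterion is the last row/column.
    """
    R = np.asarray(R, dtype=float)
    R_xx = R[:-1, :-1]   # inter-correlations among the p predictors
    R_xy = R[:-1, -1:]   # cross-correlations as a p x 1 column
    R_yx = R[-1:, :-1]   # transposed cross-correlations as a 1 x p row
    r_yy = R[-1, -1]     # scalar; 1 for any correlation matrix
    return R_xx, R_xy, R_yx, r_yy

# Made-up correlations for two predictors and one criterion.
R = np.array([[1.0, 0.2, 0.4],
              [0.2, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
R_xx, R_xy, R_yx, r_yy = partition_correlations(R)
print(R_xx.shape, R_xy.shape, R_yx.shape, r_yy)
```

Keeping the cross-correlations as two-dimensional arrays (a column and a row) rather than flat vectors mirrors the matrix notation used in the text.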

An intuitive approach to multiple regression analysis is to sum the correlations between the predictor variables and the criterion variable to obtain an index of the overall relationship between the predictors and the criterion. However, such a sum is often greater than one, which shows that simple summation of the coefficients of correlation is not a correct procedure. In fact, a simple summation of the squared coefficients of correlation between the predictor variables and the criterion variable is the correct procedure, but only in the special case when the predictor variables are orthogonal. If the predictors are related, their inter-correlations must be removed so that only the unique contribution of each predictor toward explanation of the criterion is included. Thus, the fundamental equation of multiple regression analysis is

$$R^2 = \mathbf{R}_{yx} \, \mathbf{R}_{xx}^{-1} \, \mathbf{R}_{xy}$$

The expression on the left side signifies the coefficient of multiple determination. The expressions on the right side are the transposed matrix of cross-correlations $\mathbf{R}_{yx}$, the matrix of inter-correlations $\mathbf{R}_{xx}$ to be inverted, and the matrix of cross-correlations $\mathbf{R}_{xy}$. The pre-multiplication of the matrix of cross-correlations by its transpose changes the coefficients of correlation into coefficients of determination. The function of the inverted matrix of inter-correlations is to remove the redundant variance that the predictor variables share with one another.
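A minimal NumPy sketch of this equation follows. The function name `multiple_r_squared` and all numerical correlations are hypothetical, chosen only to show that the equation reduces to a plain sum of squared correlations when the predictors are orthogonal (an identity matrix of inter-correlations) and gives a different value when they are correlated.

```python
import numpy as np

def multiple_r_squared(R_xx, r_xy):
    """Compute R^2 = R_yx R_xx^{-1} R_xy for a single criterion variable."""
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float).reshape(-1)
    # Solving R_xx b = r_xy gives b = R_xx^{-1} r_xy without forming the inverse.
    betas = np.linalg.solve(R_xx, r_xy)
    return float(r_xy @ betas)

r_xy = [0.4, 0.3]  # hypothetical cross-correlations

# Orthogonal predictors: inter-correlation matrix is the identity.
r2_orthogonal = multiple_r_squared(np.eye(2), r_xy)

# Correlated predictors: hypothetical inter-correlation of .20.
r2_correlated = multiple_r_squared([[1.0, 0.2], [0.2, 1.0]], r_xy)

print(round(r2_orthogonal, 4), round(r2_correlated, 4))
```

In the orthogonal case the result is simply the sum of the squared cross-correlations; with these particular hypothetical values, the correlated case yields a smaller coefficient because part of the explained variance is shared between the predictors.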

          The fundamental equation of regression analysis contains two distinct operations. The first operation is the post-multiplication of the transposed cross-correlations by the inverse of the inter-correlations, resulting in the matrix of beta weights B

$$\mathbf{B} = \mathbf{R}_{yx} \, \mathbf{R}_{xx}^{-1}$$

The second operation is the pre-multiplication of the cross-correlations by the beta weights, resulting in the coefficient of multiple determination

$$R^2 = \mathbf{B} \, \mathbf{R}_{xy}$$

Notice that in the case of multiple regression, the submatrices of cross-correlations and beta weights are in reality vectors. As you will realize later, the convention of signifying both matrices and vectors by capital letters facilitates the discussion of canonical analysis, of which multiple regression is a special case.
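Written out element by element, the vector form shows that the second operation is a weighted sum. The number of predictors $p$ and the element symbols $\beta_j$ and $r_{yj}$ are notation introduced here for illustration, with B the row vector of beta weights and $\mathbf{R}_{xy}$ the column vector of cross-correlations:

```latex
\mathbf{B} = \begin{bmatrix} \beta_1 & \beta_2 & \cdots & \beta_p \end{bmatrix},
\qquad
\mathbf{R}_{xy} = \begin{bmatrix} r_{y1} \\ r_{y2} \\ \vdots \\ r_{yp} \end{bmatrix},
\qquad
R^2 = \mathbf{B}\,\mathbf{R}_{xy} = \sum_{j=1}^{p} \beta_j \, r_{yj}
```

Each predictor therefore contributes the product of its beta weight and its correlation with the criterion.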

          At this point, let us consider a hypothetical example of multiple regression analysis, consisting of two predictor variables X and a criterion variable Y. As an example, you may consider scores on an aptitude test and scores on a test of motivation as predictors of academic performance.

 

                                                            

 

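The corresponding supermatrix of inter- and cross-correlations was computed as

$$\mathbf{R} = \begin{bmatrix} 1.00 & .30 & .50 \\ .30 & 1.00 & .70 \\ .50 & .70 & 1.00 \end{bmatrix}$$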

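The first question is whether the matrix of inter-correlations is invertible. Its determinant, computed as (1)(1)-(.30)(.30), equals .91; because the determinant is not zero, the matrix is not singular. A 2 × 2 matrix is inverted by interchanging the diagonal elements (both equal 1 here), changing the signs of the off-diagonal elements, and dividing by the determinant

$$\mathbf{R}_{xx} = \begin{bmatrix} 1 & .30 \\ .30 & 1 \end{bmatrix}$$

$$\mathbf{R}_{xx}^{-1} = \frac{1}{.91} \begin{bmatrix} 1 & -.30 \\ -.30 & 1 \end{bmatrix}$$

as

$$\mathbf{R}_{xx}^{-1} = \begin{bmatrix} 1.099 & -.330 \\ -.330 & 1.099 \end{bmatrix}$$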
 
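The inversion rule can be checked numerically. The sketch below is an illustration only; the helper name `inverse_2x2` is an assumption introduced here, and the inter-correlation of .30 is the value used in the text.

```python
import numpy as np

def inverse_2x2(M):
    """Invert a 2 x 2 matrix as adjugate divided by determinant."""
    (a, b), (c, d) = np.asarray(M, dtype=float)
    det = a * d - b * c
    if np.isclose(det, 0.0):
        raise ValueError("matrix is singular and cannot be inverted")
    # Swap the diagonal, negate the off-diagonal, divide by the determinant.
    return np.array([[d, -b], [-c, a]]) / det

R_xx = np.array([[1.0, 0.30],
                 [0.30, 1.0]])
R_xx_inv = inverse_2x2(R_xx)
print(np.round(R_xx_inv, 3))
```

Multiplying `R_xx` by `R_xx_inv` should return the identity matrix, which is a quick way to confirm that the inverse is correct.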

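The standard partial regression coefficients (beta weights) are computed as

$$\mathbf{B} = \mathbf{R}_{yx} \, \mathbf{R}_{xx}^{-1} = \begin{bmatrix} .50 & .70 \end{bmatrix} \begin{bmatrix} 1.099 & -.330 \\ -.330 & 1.099 \end{bmatrix}$$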

with the resulting vector of beta weights

$$\mathbf{B} = \begin{bmatrix} .32 & .60 \end{bmatrix}$$
 
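The beta weights can be reproduced with a few lines of NumPy. This is a sketch under a stated assumption: the cross-correlations of .50 and .70 are assumed values, chosen because, together with the inter-correlation of .30, they lead to the weighted products of .16 and .42 reported for this example.

```python
import numpy as np

R_xx = np.array([[1.0, 0.30],
                 [0.30, 1.0]])
# Assumed cross-correlations, written as the row vector R_yx.
R_yx = np.array([[0.50, 0.70]])

# Post-multiply the transposed cross-correlations by the inverse.
B = R_yx @ np.linalg.inv(R_xx)

# Equivalent route without an explicit inverse (valid because R_xx is symmetric).
B_solve = np.linalg.solve(R_xx, R_yx.T).T

print(np.round(B, 2))
```

Solving the linear system is generally preferred over forming the inverse when the number of predictors is large, although for two predictors the difference is negligible.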

The coefficient of multiple determination is computed as

$$R^2 = \mathbf{B} \, \mathbf{R}_{xy}$$

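This operation weights the cross-correlations by the beta weights as

$$R^2 = \begin{bmatrix} .32 & .60 \end{bmatrix} \begin{bmatrix} .50 \\ .70 \end{bmatrix} = (.32)(.50) + (.60)(.70)$$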

The coefficient of multiple determination equals .16 + .42, which equals .58. The coefficient of multiple correlation R is obtained by taking the square root of the coefficient of multiple determination and equals .76. For our hypothetical example we can conclude that 58 percent of the variability in academic performance is explained by the students' scores on the test of aptitude and the test of motivation.
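The whole example can be verified end to end with a short NumPy sketch. The cross-correlations of .50 and .70 are assumed values that are consistent with the products .16 and .42 above; the inter-correlation of .30 comes from the text. The sketch also computes the two naive sums discussed earlier for comparison.

```python
import numpy as np

# Supermatrix with variables ordered X1, X2, Y (cross-correlations assumed).
R = np.array([[1.00, 0.30, 0.50],
              [0.30, 1.00, 0.70],
              [0.50, 0.70, 1.00]])
R_xx, R_xy, R_yx = R[:2, :2], R[:2, 2:], R[2:, :2]

B = R_yx @ np.linalg.inv(R_xx)   # beta weights
r_squared = (B @ R_xy).item()    # coefficient of multiple determination
multiple_r = np.sqrt(r_squared)  # coefficient of multiple correlation

sum_r = R_xy.sum()               # naive sum of correlations
sum_r2 = (R_xy ** 2).sum()       # naive sum of squared correlations

print(f"R^2 = {r_squared:.2f}, R = {multiple_r:.2f}")
print(f"sum of r = {sum_r:.2f}, sum of r^2 = {sum_r2:.2f}")
```

Under these assumed values the sum of the correlations exceeds one and the sum of the squared correlations exceeds the coefficient of multiple determination, which illustrates why the inter-correlation between the predictors has to be removed.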