Multiple Regression Analysis

Multiple regression analysis is a method for explaining phenomena and predicting future events. A coefficient of correlation between variables X and Y is a quantitative index of the association between these two variables. In its squared form, as a coefficient of determination, it indicates the amount of variance (information) in the criterion variable Y that is accounted for by the variation in the predictor variable X. The multivariate counterpart of the coefficient of determination is the coefficient of multiple determination, R². In multiple regression analysis, the set of predictor variables X is used to explain the variability of the criterion variable Y.

Matrix of Correlations R

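Initially, a matrix of correlations R is computed for all variables involved in the analysis. This matrix can be conceptualized as a supermatrix containing the submatrices Rxx, Rxy, and Ryx and the scalar number 1, annotated as

$$ R = \begin{bmatrix} R_{xx} & R_{xy} \\ R_{yx} & 1 \end{bmatrix} $$

where Rxx holds the inter-correlations among the predictor variables, Rxy and Ryx hold the cross-correlations between the predictors and the criterion, and 1 is the correlation of the criterion variable with itself.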

An intuitive approach to multiple regression analysis is to sum the squared correlations between the predictor variables and the criterion variable to obtain an index of the overall relationship between them. However, such a sum can be greater than one, which shows that simple summation of the squared coefficients of correlation is not, in general, a correct procedure.

In fact, simple summation of the squared coefficients of correlation between the predictor variables and the criterion variable is the correct procedure, but only in the special case when the predictor variables are uncorrelated. If the predictors are related, their inter-correlations must be removed so that only the unique contribution of each predictor toward explanation of the criterion is included.
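A quick numeric check makes the point concrete. The sketch below uses the standard closed-form expression for the squared multiple correlation with two standardized predictors, R² = (r1y² + r2y² - 2·r1y·r2y·r12) / (1 - r12²), and compares it with the naive sum of squared correlations. All correlation values are hypothetical, chosen only for illustration, and the function names are our own.

```python
def naive_sum(r1y: float, r2y: float) -> float:
    """Simple sum of the squared predictor-criterion correlations."""
    return r1y**2 + r2y**2


def r_squared_two_predictors(r1y: float, r2y: float, r12: float) -> float:
    """Squared multiple correlation for two standardized predictors.

    Subtracting 2 * r1y * r2y * r12 and dividing by 1 - r12**2 is what
    the inverse of the inter-correlation matrix does for two predictors.
    """
    return (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)


# Hypothetical, strongly inter-correlated predictors: the naive sum
# exceeds 1 and overstates the explained variance.
print(round(naive_sum(0.80, 0.80), 3))                        # 1.28
print(round(r_squared_two_predictors(0.80, 0.80, 0.90), 3))   # 0.674

# Hypothetical uncorrelated predictors: both procedures agree.
print(round(naive_sum(0.50, 0.40), 3))                        # 0.41
print(round(r_squared_two_predictors(0.50, 0.40, 0.00), 3))   # 0.41
```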

Fundamental Equation of Multiple Regression Analysis

The fundamental equation of multiple regression analysis is

$$ R^2 = R_{yx} R_{xx}^{-1} R_{xy} $$

The expression on the left side signifies the coefficient of multiple determination (or squared multiple correlation coefficient). The expressions on the right side are the transposed matrix of cross-correlations (Rxy'=Ryx), the matrix of inter-correlations to be inverted (Rxx-1), and the matrix of cross-correlations (Rxy).

Premultiplying the matrix of cross-correlations by its transpose changes the coefficients of correlation into coefficients of determination: without the middle term, Ryx Rxy is simply the sum of the squared cross-correlations. The function of the inverted matrix of inter-correlations is to remove the variance that is redundant because the predictor variables are correlated with one another, so that shared variance is not counted more than once.

Operations

The fundamental equation of regression analysis contains two distinct operations. The first operation is the postmultiplication of the transpose of the cross-correlations by the inverse of the inter-correlations, resulting in the matrix of beta weights B

$$ B = R_{yx} R_{xx}^{-1} $$

The second operation is the premultiplication of the cross-correlations by the beta weights, resulting in the coefficient of multiple determination

$$ R^2 = B R_{xy} $$

Notice that in multiple regression the submatrices of cross-correlations and beta weights are in reality vectors. As you will realize later, the convention of denoting both matrices and vectors by capital letters facilitates the discussion of canonical analysis, of which multiple regression is a special case.
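The two operations translate directly into code. The following is a minimal sketch in plain Python (no numerical library assumed), with a Gauss-Jordan inverse standing in for whatever routine the Matrix module uses internally. The helper names `inverse`, `beta_weights`, and `multiple_determination` are hypothetical, and the three-predictor correlations in the usage lines are made up.

```python
from typing import List

Matrix = List[List[float]]


def inverse(a: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a copy of A with the identity matrix: [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: the predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    # The right half of [I | A^-1] is the inverse.
    return [row[n:] for row in aug]


def beta_weights(r_yx: List[float], r_xx: Matrix) -> List[float]:
    """First operation: B = Ryx Rxx^-1 (row vector times matrix)."""
    inv = inverse(r_xx)
    k = len(r_yx)
    return [sum(r_yx[i] * inv[i][j] for i in range(k)) for j in range(k)]


def multiple_determination(b: List[float], r_xy: List[float]) -> float:
    """Second operation: R^2 = B Rxy (row vector times column vector)."""
    return sum(bj * rj for bj, rj in zip(b, r_xy))


# Hypothetical correlations for three predictors (not from the text).
r_xx = [[1.0, 0.2, 0.1],
        [0.2, 1.0, 0.3],
        [0.1, 0.3, 1.0]]
r_yx = [0.5, 0.4, 0.3]   # with one criterion, Rxy holds the same numbers as a column
b = beta_weights(r_yx, r_xx)
print("B   =", [round(v, 3) for v in b])
print("R^2 =", round(multiple_determination(b, r_yx), 3))
```

In practice a numerical library would solve Rxx B' = Rxy directly rather than form the inverse explicitly; the explicit inverse is kept here only because it mirrors the fundamental equation term by term.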

An Example

At this point, let us consider a hypothetical example of multiple regression analysis consisting of two predictor variables X and a criterion variable Y. As an example, you may consider scores on an aptitude test and scores on a test of motivation as predictors of academic performance. In the Visual Statistics Studio select ( Designs, Regression Analysis ) and under the Multiple Regression heading click on the Multiple Regression Analysis with Two Predictor Variables command.

Select ( Analysis I, Correlations, Matrix of Correlations ), Select All, and under the Superimpose Correlation Matrix on the Vector Display heading click on the Correlations command.

Select ( Transfers, Launch Matrix Module, Vectors to Matrix ) and store the correlation supermatrix in the Matrix Cell 1. This supermatrix contains the submatrices Rxx, Rxy, and Ryx and the scalar number 1.00.

Click on the Partition command and define the supermatrix as


Click on the Accept command and store the Q1, Q2, Q3, and Q4 submatrices beginning with Matrix Cell 2.

 

Beta Weights and Coefficient of Multiple Determination

Inverse of the Submatrix of Inter-correlations

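The first question is whether the submatrix of inter-correlations is invertible. Its determinant, computed as (1)(1) - (.30)(.30), equals .91, so the submatrix is not singular. A 2 × 2 matrix is inverted by interchanging its diagonal elements (here both equal to 1), changing the signs of the off-diagonal elements, and dividing every element by the determinant

$$ R_{xx}^{-1} = \frac{1}{.91} \begin{bmatrix} 1 & -.30 \\ -.30 & 1 \end{bmatrix} = \begin{bmatrix} 1.10 & -.33 \\ -.33 & 1.10 \end{bmatrix} $$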

In the Matrix module, click on the Inverse command, select the Q1 matrix, and store the result in the Matrix Cell 6.
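The hand computation of the inverse can be checked with a few lines of Python. This sketch implements only the 2 × 2 rule (interchange the diagonal elements, change the signs of the off-diagonal elements, divide by the determinant); `inverse_2x2` is a hypothetical helper name, not a Visual Statistics Studio command.

```python
def inverse_2x2(m):
    """Invert [[a, b], [c, d]] by the 2 x 2 rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("singular matrix: determinant is zero")
    # Interchange the diagonal elements, change the signs of the
    # off-diagonal elements, and divide every element by the determinant.
    return [[d / det, -b / det], [-c / det, a / det]]


r_xx = [[1.0, 0.30], [0.30, 1.0]]   # inter-correlation of the two predictors
det = r_xx[0][0] * r_xx[1][1] - r_xx[0][1] * r_xx[1][0]
print(round(det, 2))                                              # 0.91
print([[round(v, 2) for v in row] for row in inverse_2x2(r_xx)])  # [[1.1, -0.33], [-0.33, 1.1]]
```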

Beta Weights

The standardized regression coefficients, the beta weights, are computed as

$$ B = R_{yx} R_{xx}^{-1} $$

with the resulting vector of beta weights

In the Matrix module click on the X*Y command and multiply the matrices in the Matrix Cells 5 and 6. Store the result in the Matrix Cell 7.

Coefficient of Multiple Determination

The coefficient of multiple determination is computed as

$$ R^2 = B R_{xy} $$

This operation weights each cross-correlation by the corresponding beta weight,

$$ R^2 = \beta_1 r_{1y} + \beta_2 r_{2y} $$

The coefficient of multiple determination equals .16 + .42 = .58.

In the Matrix module click on the X*Y command and multiply the matrices in the Matrix Cells 7 and 3. Store the result in the Matrix Cell 8.

The coefficient of multiple correlation R is obtained by taking the square root of the coefficient of multiple determination; here it equals .76.
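The text does not list the two cross-correlations for this example, so the sketch below assumes r1y = .50 and r2y = .70. These are hypothetical values, chosen because, together with the reported inter-correlation of .30, they reproduce the rounded results above (.16 + .42 = .58 and R = .76); they should not be read as the data behind the example. The beta weights use the two-predictor form of B = Ryx Rxx^-1.

```python
import math

r12 = 0.30             # inter-correlation of aptitude and motivation (from the text)
r1y, r2y = 0.50, 0.70  # ASSUMED cross-correlations with Y (hypothetical)

# Beta weights: B = Ryx Rxx^-1 written out for two standardized predictors.
det = 1 - r12**2
b1 = (r1y - r12 * r2y) / det
b2 = (r2y - r12 * r1y) / det

# Variance contributions: each cross-correlation weighted by its beta weight.
c1, c2 = b1 * r1y, b2 * r2y
r2 = c1 + c2           # coefficient of multiple determination
r = math.sqrt(r2)      # coefficient of multiple correlation

print(f"betas: {b1:.2f}, {b2:.2f}")             # betas: 0.32, 0.60
print(f"R^2 = {c1:.2f} + {c2:.2f} = {r2:.2f}")  # R^2 = 0.16 + 0.42 = 0.58
print(f"R = {r:.2f}")                            # R = 0.76
# The standard variance of the predicted scores equals R^2.
print(f"{b1**2 + b2**2 + 2 * b1 * b2 * r12:.3f}")  # 0.582
```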

At this point we can re-label the upper part of the matrix display

as

For our hypothetical experiment we can conclude that, if our measurements were perfectly reliable, 58 percent of the variability in academic achievement could be explained by the students' scores on the test of aptitude and the test of motivation.

Similar results can be obtained from a computer program for multiple regression with output summarized as 

 

Multiple Regression Analysis

Correlated Predictors

Select ( Designs, Regression Analysis ) and click on the Multiple Regression Analysis with Two Predictor Variables command. Select ( Analysis I, Multiple Regression Analysis ) and mark the predictor and the criterion variables. The multiple regression analysis for our example will be displayed as

The bottom row was obtained by clicking on the left side of the blank line separating the descriptive statistics, selecting the Standardize Variance command, and marking the criterion variable Y.

Observe that the means of the predicted and the error scores sum to the mean of the criterion variable,

$$ M_{Y} = M_{Y'} + M_{Y^{\wedge}} $$

Since the mean of the error component is zero, the mean of the predicted scores Y' equals the mean of the criterion variable Y.

The variances of the predicted and the error scores sum to the variance of the criterion variable, thus defining the specification equation for the partitioning of variance by multiple regression analysis as

$$ s^2_{Y} = s^2_{Y'} + s^2_{Y^{\wedge}} $$

Dividing both sides of the above equation by the variance of the criterion variable standardizes the equation as

$$ 1 = \frac{s^2_{Y'}}{s^2_{Y}} + \frac{s^2_{Y^{\wedge}}}{s^2_{Y}} $$

The coefficient of multiple determination equals

$$ R^2 = \frac{s^2_{Y'}}{s^2_{Y}} $$

and the coefficient of multiple alienation equals

Note that the variances of the weighted predictor components (.20 + .73 = .93) do not sum to the variance of the predicted scores (1.16). Because the predictor variables are correlated, the variance of their weighted composite also contains a covariance term.
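The identities above hold for any least-squares fit with an intercept, which a short check can confirm. The sketch below fits a two-predictor regression to a handful of made-up raw scores (hypothetical data, not the example in the text) through the normal equations, then verifies that the error scores have mean zero, that the predicted and error variances add up to the criterion variance, and that the weighted predictor components reach the variance of Y' only once the covariance term is added. Population variances (`statistics.pvariance`) are used throughout so the sums match exactly; `statistics.fmean` requires Python 3.8 or later.

```python
from statistics import fmean, pvariance

# Hypothetical raw scores (invented for illustration, not from the text).
x1 = [12, 15, 11, 18, 14, 16, 10, 17]   # predictor X1, e.g. aptitude
x2 = [5, 7, 6, 9, 6, 8, 4, 7]           # predictor X2, e.g. motivation
y = [60, 72, 58, 85, 66, 78, 52, 75]    # criterion Y


def centered(v):
    """Deviation scores around the mean."""
    m = fmean(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


c1, c2, cy = centered(x1), centered(x2), centered(y)

# Solve the 2 x 2 normal equations for the raw-score regression weights.
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
s1y, s2y = dot(c1, cy), dot(c2, cy)
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
a = fmean(y) - b1 * fmean(x1) - b2 * fmean(x2)

pred = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]   # predicted scores Y'
err = [yi - pi for yi, pi in zip(y, pred)]             # error scores Y^

# Means: the error mean is zero, so M(Y') equals M(Y).
print(round(fmean(err), 10), round(fmean(pred), 4), round(fmean(y), 4))

# Variances: var(Y) = var(Y') + var(Y^); dividing by var(Y) gives R^2 + (1 - R^2).
r2 = pvariance(pred) / pvariance(y)
print(round(pvariance(y), 4), round(pvariance(pred) + pvariance(err), 4))
print("R^2 =", round(r2, 3), " 1 - R^2 =", round(pvariance(err) / pvariance(y), 3))

# With correlated predictors, the weighted components miss a covariance term.
components = pvariance([b1 * u for u in x1]) + pvariance([b2 * v for v in x2])
cov_term = 2 * b1 * b2 * s12 / len(x1)
print(round(components, 4), round(components + cov_term, 4), round(pvariance(pred), 4))
```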

Select ( Data, Delete ), mark the B1 and B2 variables, and click on the Clear and Compact command. Select ( Analysis I, Correlation, Matrix of Correlations, Select All ) and click the Determination command under the Superimpose Correlation Matrix on the Vector Display heading. The coefficients of interest are shown below.

Note that the predictors X1 and X2 are related, that the predicted (Y') and error (Y^) components are orthogonal, and that the standard variances of Y' (.582) and Y^ (.418) sum to the standard variance of Y (1.00).

Orthogonal Predictor Variables

Select ( Designs, Regression Analysis, Orthogonal Regression Designs ) and under the Multiple Regression heading click on the Multiple Regression Analysis with Three Orthogonal Predictor Variables command. Select ( Analysis I, Multiple Regression Analysis ), mark Predictors and the Criterion variable, and click on the Accept command.

Select ( Analysis I, Correlation, Matrix of Correlations, Select All ) and click the Determination command under the Superimpose Correlation Matrix on the Vector Display heading. The coefficients of interest are shown below.

Note that the predictors O1, O2, and O3 are unrelated, that the predicted (Y') and error (Y^) components are orthogonal, and that the standard variances of Y' (.835) and Y^ (.165) sum to the standard variance of Y (1.00). Also note that the variance contributions of B1, B2, and B3 (.812, .005, and .018) sum to the coefficient of multiple determination, .835, as do the squared cross-correlations of O1, O2, and O3 with the criterion variable Y.
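Why the contributions add up in the orthogonal case can be seen in a short sketch: with uncorrelated predictors Rxx is the identity matrix, so B = Ryx and each contribution βi·riy reduces to riy². The predictors below are the mutually orthogonal main-effect codes of a hypothetical 2 × 2 × 2 layout, and the criterion scores are invented for illustration; they are not the data behind the O1, O2, O3 example.

```python
from statistics import fmean

# Main-effect codes of a hypothetical 2 x 2 x 2 layout: each column is
# centered and every pair of columns is orthogonal (zero correlation).
o1 = [-1, -1, -1, -1, 1, 1, 1, 1]
o2 = [-1, -1, 1, 1, -1, -1, 1, 1]
o3 = [-1, 1, -1, 1, -1, 1, -1, 1]
y = [3.0, 4.0, 5.0, 7.0, 8.0, 8.0, 11.0, 12.0]   # invented criterion scores
predictors = (o1, o2, o3)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


my = fmean(y)
cy = [v - my for v in y]
ss_y = dot(cy, cy)

# Rxx is the identity here, so each beta weight equals the cross-correlation
# r(Oi, Y) and its variance contribution beta * r is just r squared.
contributions = []
for o in predictors:
    r = dot(o, cy) / (dot(o, o) * ss_y) ** 0.5
    contributions.append(r * r)

# Cross-check: R^2 from the fitted scores. With centered orthogonal
# predictors each raw-score slope is simply dot(o, y) / dot(o, o).
slopes = [dot(o, cy) / dot(o, o) for o in predictors]
pred = [my + sum(b * o[i] for b, o in zip(slopes, predictors)) for i in range(len(y))]
r2 = sum((p - my) ** 2 for p in pred) / ss_y

print([round(c, 3) for c in contributions], round(sum(contributions), 3), round(r2, 3))
```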

Multiple Regression Analysis of Incarceration Rates

The number of people a society puts into prison varies greatly from one society to another and provides insight into the character of that society. Incarceration rates in the early 1990s for five societies are shown below.

 
The incarceration rates are given per 100,000 population; the countries were selected from a larger study. The major predictors of crime rates are variables capturing the number of broken families and the degree to which the assets of a society are distributed unequally. In this particular study, the first predictor variable is the number of divorces per 10,000 population. The second predictor variable is a ratio contrasting the percentages of the GNP going to the richest and the poorest segments of the population. The criterion variable is the incarceration rate per 100,000 population.

Select ( Projects, Open Project Files, Correlation Analysis ). Click on the Incarceration [n=5] file name. Select ( Analysis I, Multiple Regression Analysis ).

Examine the standard variance components. About 88 percent of variance in international comparisons of incarceration rates can be accounted for by the disintegration of families and by the extremely unequal distribution of wealth.

[ Note that the original study from which a small sample of countries was selected for instructional purposes provides a more realistic estimate of the variance accounted for. A conservative estimate is that more than 50 percent of variance in international comparisons of incarceration rates can be accounted for by the disintegration of families and by the extremely unequal distribution of wealth. ]