Multiple Regression Analysis

 

Multiple regression analysis is a method for explanation of phenomena and prediction of future events. In multiple regression analysis, a set of predictor variables is used to explain variability of the criterion variable.

Example: A researcher is interested in predicting the graduation rate from age and ability scores. Use the data he or she collected in 2001 to develop a multiple regression equation.

I. Identify the criterion variable and the predictor variables.

The criterion variable is _________.

The two predictors are ____ and _____.
 

II. Produce a scatterplot matrix.

SPSS for Windows

A. Enter two predictors and one criterion: age, score, and rate. 

 B. Choose Graphs \ Scatter. 

a. Select the Matrix picture button.

b. Click on Define. Under Matrix Variables, select age, score, and rate: hold down the Shift key and click the first variable Age and the last variable Rate to highlight all of them. (You may also hold down the Ctrl key and click each variable.) Click OK.

C. Modify the chart to display the scatterplot matrix with regression lines.

SPSS version 11.5

a. Double-click on the scatterplot matrix to bring out the Chart Editor window.

b. From the Chart Editor menus choose: Chart \ Options.  

(a) Fit Line. Click Total.

(b) Click Fit Options.

(c) Fit Method. Linear regression is the default. Click Continue and OK. The scatterplot matrix with regression lines appears in the Chart Editor window. Close the Chart Editor window.

SPSS version 12.0

a. Double-click on the scatterplot to bring up the Chart Editor window.

b. Click on any one of the data points to highlight all of them.

c. From the Chart Editor window menus choose: Chart \ Add Chart Element \ Fit Line at Total.

Note that Fit Method: "Linear" is the default. Close the Chart Editor window and return to the Viewer window.

 

SPSS Output

A. How to read the scatterplot matrix

 

For example, the first diagonal cell contains the variable AGE. For all plots in the first row, AGE is plotted on the y axis (vertical). The first plot in the first row is the plot of AGE (y axis) against SCORE (x axis). The second plot in the first row is the plot of AGE (y axis) against RATE (x axis).

The second diagonal cell contains the variable SCORE. For all plots in the second row, SCORE is plotted on the y axis (vertical). The first plot in the second row is the plot of SCORE (y axis) against AGE (x axis). The second plot in the second row is the plot of SCORE (y axis) against RATE (x axis).

The third diagonal cell contains the variable RATE. For all plots in the third row, RATE is plotted on the y axis (vertical). The first plot in the third row is the plot of RATE (y axis) against AGE (x axis). The second plot in the third row is the plot of RATE (y axis) against SCORE (x axis).

B. Inspect all the plots. What do you conclude? 

a. All the relationships are linear.

b. The correlation between Age and Score is negative. Younger subjects have higher ability scores.

c. The correlation between Age and Rate is negative. Younger subjects have higher graduation rates.  

d. The correlation between Score and Rate is positive. Subjects with higher ability scores have higher graduation rates.  
 

III. Produce a three-dimensional plot of graduation rate, age, and ability scores.

SPSS for Windows

A. To obtain a 3-D plot, from the menus choose: Graphs \ Scatter. Select 3-D. Click on Define. 

B. Select Rate as the Y Axis variable. Select Age as the X Axis variable. Select Score as the Z Axis variable. 

C. Click OK. The 3-D plot appears. 

D. Modify the 3-D plot to display spikes and wireframe. 

         a. Double click on the 3-D plot to bring out the Chart Editor window. 

         b. From the Chart Editor menus choose:

(a) Case Labels. Click on the drop-down list. Choose On.

(b) Spikes. Click on the drop-down list. Choose Floor: Spikes are dropped to the plane of the x and z axes of a 3-D scatterplot. 

(c) Wireframe. Select the full frame picture. Click OK.

E. To rotate a 3-D scatterplot, from the Chart Editor window menu bar choose: Format \ 3-D Rotation

   

a. You can rotate in six directions. The direction of rotation is indicated on each button. You can click on a rotation button and release it. 

b. Click on Apply. Click Close. Close the Chart Editor window and return to the Viewer window.

 SPSS Chart

 

Inspect the 3-D scatterplot: The position of each point is based on its values for all three variables. Younger cases with higher ability scores have higher graduation rates.
 

IV. Multiple Regression Analysis

SPSS for Windows

A. From the menus choose: Analyze \ Regression \ Linear

a. Select Rate as the Dependent variable.

b. Highlight both predictors Age and Score and move them to the Independent Variable list. (Hold down the Ctrl key and click Age and Score to highlight them.)

c. The method used in developing the regression model is Enter: All the predictors are entered in a single step. It is the default method.

B. Click on Statistics. Choose Descriptives. Click Continue.

C. Click on Save to save the Unstandardized Predicted Values. Click Continue. Click OK.

  Note that to examine outliers and the assumptions made about residuals, you may click on Plots.

 

SPSS Printout

A. Examine the mean and standard deviation of each variable.

Examine the correlation matrix.

The correlation between Age and Score is -.712. The negative correlation between Age and Score indicates that younger subjects have higher scores on the ability test.

The correlation between Age and Rate is -.90 and it is statistically significant. Note that the negative correlation between Age and Rate indicates younger subjects have higher graduation rates.

The correlation between Score and Rate is .92 and it is statistically significant. Note that the positive correlation between Score and Rate indicates that subjects with higher ability scores have higher graduation rates.
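The way the signs in a correlation matrix are read can be sketched in a few lines of Python. The handout does not list its raw data, so the values below are hypothetical and chosen only to show a negative Age correlation and a positive Score and Rate correlation; the function and variable names are likewise illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for illustration only (NOT the handout's data).
data = {
    "age":   [18, 19, 20, 21, 22, 23, 24, 25, 26],
    "score": [30, 28, 29, 25, 26, 22, 23, 20, 21],
    "rate":  [0.95, 0.90, 0.92, 0.80, 0.82, 0.70, 0.72, 0.60, 0.62],
}

names = list(data)
# Full correlation matrix, stored as a dict keyed by (row, column).
corr = {(a, b): pearson_r(data[a], data[b]) for a in names for b in names}

# Print the matrix one row per variable.
for a in names:
    print(a.ljust(6), "  ".join(f"{corr[(a, b)]:6.3f}" for b in names))
```

With real data, the same matrix is what SPSS prints in the Correlations table when Descriptives is selected.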

B. Coefficient of Multiple Correlation and Squared Multiple Correlation

a. R = .985. Note that R is the correlation between the actual and the predicted values of the criterion variable.

b. R Square

The squared multiple correlation can be directly interpreted in terms of percentage of accountable variation. About 97% of the variance in the graduation rate can be accounted for by age and the ability scores. 

c. Adjusted R Square

R square may be overestimated when the data set has few cases (n) relative to the number of predictors (k).

The adjusted R square can be computed as

         Adjusted R Square = 1 - (1 - R Square)(n - 1) / (n - k - 1)

where n = sample size and k = number of predictors.

Data sets with a small sample size and a large number of predictors will have a greater difference between the obtained and adjusted R square. 
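As a rough check, the adjustment can be sketched in Python with the usual formula 1 - (1 - R Square)(n - 1) / (n - k - 1). Two assumptions are made here: R Square is taken as the square of the rounded R = .985, and n = 9 is inferred from the error degrees of freedom in F(2,6), since n - k - 1 = 6 with k = 2. The function name is illustrative.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R square for n cases and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

r_square = 0.985 ** 2   # assumption: square of the rounded R reported in the handout
n, k = 9, 2             # assumption: n inferred from F(2,6), i.e. n - k - 1 = 6

adj = adjusted_r_square(r_square, n, k)
print(f"R square = {r_square:.3f}, adjusted R square = {adj:.3f}")

# Same R square and sample size, but more predictors: the adjustment grows.
adj_more_predictors = adjusted_r_square(r_square, n, 5)
print(f"adjusted R square with k = 5: {adj_more_predictors:.3f}")
```

Because the input is a rounded R, the result may differ in the third decimal from the value SPSS prints.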

d. The standard error of estimate indicates the accuracy of a prediction model. The smaller the standard error of estimate, the better the prediction. You may use the standard error of estimate to construct prediction intervals for the predicted values.

Computation Procedures

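The standard error of estimate is the standard deviation of the error variable (error = Y - Y', the difference between the observed and the predicted value). The unbiased standard error of estimate is computed as

         Standard error of estimate = sqrt[ Sum (Y - Y')^2 / (n - k - 1) ]

where n = sample size and k = number of predictors.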

The value of standard error of estimate is .04777. 
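The computation can be sketched in Python as the square root of the sum of squared errors divided by n - k - 1. The observed and predicted values below are hypothetical (the handout does not list its data), so the result will not reproduce .04777; the function name is illustrative.

```python
from math import sqrt

def standard_error_of_estimate(observed, predicted, k):
    """Unbiased standard error of estimate for a model with k predictors."""
    n = len(observed)
    # Sum of squared errors, where each error is observed minus predicted.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return sqrt(sse / (n - k - 1))

# Hypothetical observed and predicted graduation rates (NOT the handout's data).
observed = [0.95, 0.90, 0.92, 0.80, 0.82, 0.70, 0.72, 0.60, 0.62]
predicted = [0.93, 0.92, 0.90, 0.82, 0.80, 0.72, 0.70, 0.62, 0.60]

see = standard_error_of_estimate(observed, predicted, k=2)
print(f"standard error of estimate = {see:.4f}")
```

Dividing by n - k - 1 rather than n is what makes the estimate unbiased: one degree of freedom is used for each predictor weight and one for the constant.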

 C. Overall Relationship

Test of R Square

Is the regression model with two predictors (Age and Score) significantly related to the criterion variable Y? 
 

Express the F ratio in terms of the proportions of variance accounted for and not accounted for.

         F = (R Square / k) / [(1 - R Square) / (N - k - 1)]

where k is the number of predictor variables and N is the sample size.

F(2,6) = 100.45, p < .001.  (SPSS output: Sig. = .000. It can be reported as p < .001) What do you conclude?

It is concluded that age and the ability scores account for about 97% of the variance in the graduation rates and that this finding is statistically significant.
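The link between the F ratio and R Square can be checked with a short Python sketch, using F = (R Square / k) / [(1 - R Square) / (N - k - 1)] and its algebraic inverse. The sample size N = 9 is an inference from the degrees of freedom in F(2,6), not a value stated in the handout, and the function names are illustrative.

```python
def f_from_r_square(r_square, k, n):
    """F ratio: variance accounted for per predictor over unexplained variance per error df."""
    return (r_square / k) / ((1 - r_square) / (n - k - 1))

def r_square_from_f(f, k, n):
    """Invert the F formula to recover R square from a reported F."""
    df_error = n - k - 1
    return (f * k) / (f * k + df_error)

k, n = 2, 9                      # assumption: n inferred from F(2,6)
r_square = r_square_from_f(100.45, k, n)
f_again = f_from_r_square(r_square, k, n)

print(f"R square implied by F = 100.45: {r_square:.3f}")
print(f"F recomputed from that R square: {f_again:.2f}")
```

The implied R Square agrees with the "about 97%" reported in the text. Starting instead from the rounded R = .985 gives a somewhat smaller F, because F is very sensitive to rounding when R Square is close to 1.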

D. Individual Regression Coefficients

Multiple linear regression extends bivariate regression by incorporating multiple independent variables. Examine the following individual regression coefficients. 

         


Unstandardized Weights

         Predicted graduation rate = (-.0486) age + (.0706) score + (1.386)
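To see how the unstandardized weights are used, the equation can be written as a small Python function. The weights are the ones reported above; the age and score values passed in are hypothetical inputs, not cases from the handout's data set.

```python
B_AGE = -0.0486      # unstandardized weight for age (from the SPSS output)
B_SCORE = 0.0706     # unstandardized weight for score
CONSTANT = 1.386     # intercept

def predict_rate(age, score):
    """Predicted graduation rate from the unstandardized regression equation."""
    return B_AGE * age + B_SCORE * score + CONSTANT

# Hypothetical case: age 25, ability score 5.
base = predict_rate(25, 5)

# One more year of age with score held constant changes the prediction by B_AGE.
change_per_year = predict_rate(26, 5) - base

# One more score point with age held constant changes the prediction by B_SCORE.
change_per_point = predict_rate(25, 6) - base

print(f"predicted rate: {base:.3f}")
print(f"change per year of age: {change_per_year:.4f}")
print(f"change per score point: {change_per_point:.4f}")
```

Each weight is therefore the expected change in the predicted rate for a one-unit change in that predictor while the other predictor is held constant.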

Beta Weights (Standardized Regression coefficients)

Zpredicted graduation rate = (-.50) Zage + (.565) Zscore 


Significance Testing for Individual Regression Coefficients

Hypotheses

The null hypothesis states that a chosen regression coefficient is equal to 0, given that all the other predictors are included in the regression model. The alternative hypothesis states that the chosen regression coefficient is different from 0.

 

The t-ratio evaluates the significance of each regression coefficient. The regression coefficient is a measure of the linear relationship between a chosen predictor and the criterion variable when the influences of the other predictors are partialled out or held constant. (What does it mean? How is it done? You will learn this topic later.)

Bage = -.0486 (or beta = -.50) measures the effect of the predictor variable Age on the criterion variable Graduation Rate, holding the other predictor Ability Score constant.

Bscore = .07056 (or beta = .565) measures the effect of the predictor variable Ability Score on the criterion variable Graduation Rate, holding the other predictor Age constant.

Examine the t ratio and the associated probability.

1. Is Age important for making good predictions?

Given that the predictor variable Score is in the regression equation, the regression coefficient associated with Age is significantly different from zero, t = -5.055, p = .002.

2. Is Ability Score important for making good predictions?

Given that the predictor variable Age is in the regression equation, the regression coefficient associated with Score is significantly different from zero, t = 5.704, p = .001.

What do you conclude?

The regression coefficients for Age and Score are significantly different from zero. Both predictors are important for better prediction.
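The decision can also be reached without the printed probabilities by comparing each t ratio with a tabled critical value. The Python sketch below assumes 6 error degrees of freedom (n - k - 1, inferred from F(2,6)) and a two-tailed test at the .05 level, for which the tabled critical value is 2.447; these choices are assumptions rather than part of the SPSS output.

```python
# Two-tailed critical t for alpha = .05 with df = 6 (from a standard t table).
T_CRITICAL_DF6 = 2.447

# t ratios reported in the SPSS coefficients table.
t_ratios = {"Age": -5.055, "Score": 5.704}

# A coefficient is significant when |t| exceeds the critical value.
significant = {name: abs(t) > T_CRITICAL_DF6 for name, t in t_ratios.items()}

for name, t in t_ratios.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t:.3f}, {verdict} at the .05 level")
```

The sign of t only shows the direction of the relationship; the test compares its absolute value with the critical value.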

 

E. Test of all the regression coefficients = Test of R Square

Is there a linear relationship between the criterion variable and the entire set of predictor variables?

State the null hypothesis associated with the above analysis of variance table.

The overall F test is a test of the null hypothesis that all the population values of the regression coefficients are equal to 0. 

 F(2,6) = 100.45, p < .001. What do you conclude?

There is a significant linear relationship between the criterion variable and the entire set of predictor variables. 

F. Report the results.

A multiple regression analysis was conducted to evaluate how well age and the ability score predicted the graduation rate. The predictor variables were age and the ability score, while the criterion variable was the graduation rate. There was a significant linear relationship between the criterion variable and the entire set of predictor variables,  F(2,6) = 100.45, p < .001. The sample multiple correlation coefficient was .985. About 97% of the variance of the graduation rate in the sample can be accounted for by age and the ability score. Both predictors were important for better prediction.

G. Switch to the Data Editor window. The predicted graduation rates are displayed there.

     Choose Window from the menus. Select the Data editor window from a list of open windows.

SPSS for Windows

The multiple correlation coefficient can be viewed as the correlation between the actual graduation rates and the predicted graduation rates.

Apply the Correlation (bivariate) Procedure to obtain the correlation between the obtained rate (Rate) and the predicted rate (pre_1).

SPSS Output

It agrees with the multiple R (.985).
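The same check can be sketched outside SPSS. The Python example below fits a two-predictor least-squares equation with plain normal equations, then compares the multiple R from the sums of squares with the bivariate correlation between actual and predicted values. The data are hypothetical (the handout does not list its cases), and all function names are illustrative.

```python
from math import sqrt

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(predictors, y):
    """Least-squares weights [constant, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]
    cols = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(cols)] for i in range(cols)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(cols)]
    return solve(xtx, xty)

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical data for illustration only (NOT the handout's data).
age = [18, 19, 20, 21, 22, 23, 24, 25, 26]
score = [30, 28, 29, 25, 26, 22, 23, 20, 21]
rate = [0.95, 0.90, 0.92, 0.80, 0.82, 0.70, 0.72, 0.60, 0.62]

constant, b_age, b_score = fit_ols(list(zip(age, score)), rate)
predicted = [constant + b_age * a + b_score * s for a, s in zip(age, score)]

# Multiple R from the sums of squares: R square = 1 - SSE / SST.
mean_rate = sum(rate) / len(rate)
sse = sum((y - p) ** 2 for y, p in zip(rate, predicted))
sst = sum((y - mean_rate) ** 2 for y in rate)
multiple_r = sqrt(1 - sse / sst)

# Bivariate correlation between actual and predicted values.
r_actual_predicted = pearson_r(rate, predicted)

print(f"multiple R = {multiple_r:.4f}")
print(f"r(actual, predicted) = {r_actual_predicted:.4f}")
```

The two printed values agree up to rounding error, which is the relationship the SPSS correlation output illustrates for the saved predicted values.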