Linear Regression Analysis

 

Are scores on a math skills pretest correlated with performance in a statistics course? Scores on a math skills pretest and on the final exam in an introductory statistics class were collected in fall 2001, as shown below.

 

SPSS for Windows

A. Define the variables `pretest` and `final`.
 

B. Enter values.

 

 

Visualize Data: Create a Scatterplot

From the menus choose: Graphs \ Scatter.  

Select the Simple picture button.

Click on Define.

Variables for Y Axis and X Axis

Select the variable `final` that will determine the vertical position of each point.

Select the variable `pretest` that will determine the horizontal position of each point. Click OK.

  

SPSS Printout 

Examine the scatterplot

 

The relationship between the two variables is not perfect. However, it appears to be linear, and its direction is positive: students with higher scores on the math skills pretest tended to have higher scores on the final exam, and students with lower pretest scores tended to have lower final exam scores.


Correlation

Are the scores on the pretest and final test correlated? 

From the menus choose: Analyze \ Correlate \ Bivariate.  

Select the variables `pretest` and `final` to be correlated.

By default, Pearson correlation will be computed and a two-sided test of significance is used. The null hypothesis states that the correlation is equal to zero. The research hypothesis states that the correlation is different from zero.  

Click OK to obtain the result.

The null hypothesis is that the population correlation coefficient between the pretest and the final exam is zero. The alternative hypothesis is that the population correlation coefficient between the pretest and the final exam is different from zero.

Examine the output. When SPSS displays a p-value of .000, report it as p < .001.

It is concluded that math skills and final exam scores in statistics are positively correlated (r = .893, p < .001).
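The Pearson correlation that SPSS reports can be illustrated with a short Python sketch. The scores below are hypothetical and are not the class data, and `pearson_r` is a helper name introduced here for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical scores for illustration only (NOT the class data).
pretest = [1, 2, 3, 4, 5, 6]
final = [2, 2, 4, 4, 6, 5]

r = pearson_r(pretest, final)
print(f"r = {r:.3f}")
```

A value of r near +1 corresponds to the kind of strong positive linear pattern seen in the scatterplot.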

 

Linear Regression Analysis


Stage 1: Development 

Since math skills were strongly correlated with grades in statistics, the researcher decided to use math skills as the predictor variable for grades in statistics.

Task: Develop a linear regression equation to predict the scores on the final exam from the scores on the pretest.

From the menus choose: Analyze \ Regression \ Linear. 

Dependent and Independent(s)

Move the variable `final` to the Dependent variable list.

Move the variable `pretest` to the Independent variable list.

To save predicted values, residuals, and prediction intervals for individual predicted values, click on Save. The Save dialog box will appear. Select the options for predicted values, residuals, and individual prediction intervals, then click Continue and OK.



 


SPSS Printout

A. Summary Statistics for the Equation

a. What is the correlation between the pretest and the final exam?  (r = .893)

b. The Pearson r, when squared, gives the proportion of variance in one variable that is predictable from the other.

What percentage of the variation in the final exam is explained by the pretest? (r² = .893² ≈ .80, or 80%)

c. The standard error of estimate is the standard deviation of the error variable. The formula used to compute the estimated standard error of estimate is

$$ s_{est} = \sqrt{\frac{\sum (Y - Y')^2}{N - 2}} $$

Y: observed values, Y': predicted values, Y - Y': error, and N: number of cases.

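Alternatively, the estimated standard error of estimate can be defined in terms of the residual sum of squares and residual mean square from the ANOVA table, for example as

$$ s_{est} = \sqrt{\frac{SS_{residual}}{N - 2}} = \sqrt{MS_{residual}} $$
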

1. The smaller the standard error of estimate, the better the prediction model. 

2. The standard error of estimate can be used to construct prediction intervals around a predicted value. 
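As a minimal sketch of the computation described above, the standard error of estimate is the square root of the sum of squared errors (Y - Y') divided by N - 2. The observed and predicted values below are hypothetical, and the function name `standard_error_of_estimate` is introduced here only for illustration.

```python
import math

def standard_error_of_estimate(observed, predicted):
    """sqrt( sum((Y - Y')^2) / (N - 2) ) for a one-predictor regression."""
    n = len(observed)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(ss_residual / (n - 2))

# Hypothetical data for illustration only (NOT the class data).
# The predicted values are the least-squares predictions Y' = .9X + 1.3
# for X = 1, 2, 3, 4, 5.
observed = [2.0, 3.0, 5.0, 4.0, 6.0]
predicted = [2.2, 3.1, 4.0, 4.9, 5.8]

s_est = standard_error_of_estimate(observed, predicted)
print(f"standard error of estimate = {s_est:.3f}")
```

The divisor is N - 2 rather than N because two parameters (slope and intercept) are estimated from the data.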


 

B. Regression and Prediction

Find the linear regression equation for predicting the final exam from the pretest.

 

a. The linear regression equation in obtained (raw) score form is ________.

Predicted final exam = .868(pretest) + .996

 b. Stage 2: Estimation

A new student received a score of 2 on the pretest in fall 2002. What is the best estimate of the score that the student will receive on the final exam?

.868(2) + .996 = ____ (2.732)

Unless there is a perfect relationship, it is unlikely that the student's real score exactly equals the predicted value. 
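The estimation step can also be written as a small Python function. The slope and intercept are the values reported in the printout; the function name `predict_final` is an assumption made for this sketch.

```python
SLOPE = 0.868      # unstandardized coefficient B for pretest (from the printout)
INTERCEPT = 0.996  # constant (from the printout)

def predict_final(pretest_score):
    """Predicted final exam score from a pretest score."""
    return SLOPE * pretest_score + INTERCEPT

# Prediction for the new student with a pretest score of 2.
print(round(predict_final(2), 3))
```

The same function can be applied to every pretest score in a new data set.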

Switch to the Data Editor window. Notice that the predicted values, residuals, and 95% prediction intervals are displayed in the Data Editor window.  

For example, the 95% prediction interval for a predicted score of 2.73 is from .41 to 5.06.
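For reference, the usual textbook form of a 95% prediction interval around an individual predicted value in simple linear regression is sketched below. The symbols X_0 (the new pretest score), \bar{X} (the mean pretest score), and s_est (the standard error of estimate) are notation assumed here rather than taken from the printout, and the interval saved by SPSS is expected, but not verified here, to follow this form.

```latex
\[
Y' \pm t_{.025,\,N-2}\; s_{est}\,
\sqrt{1 + \frac{1}{N} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}}
\]
```

The interval is narrowest when X_0 equals the mean pretest score and widens as X_0 moves away from it.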

To predict scores for new cases, enter the new data set (e.g., pretest scores from fall 2002) and compute the predicted final exam scores with the regression equation obtained previously:

Predicted final exam = .868(pretest) + .996

Example of Building and Using a Bivariate Regression Model by Calvin Garbin 

c. The linear regression equation in standard score form is _______.

Predicted Z(final exam) = .893 Z(pretest)

Because there is only one predictor variable, the standardized regression coefficient is equal to the correlation coefficient (Beta = r = .893).
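The equality of Beta and r in the one-predictor case follows from the least-squares slope B = r(s_Y / s_X). The short derivation below is a sketch in which s_X and s_Y (the standard deviations of the pretest and the final exam) are notation assumed here.

```latex
\[
\beta = B\,\frac{s_X}{s_Y}
      = \left(r\,\frac{s_Y}{s_X}\right)\frac{s_X}{s_Y}
      = r
\]
```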

Test the coefficient of correlation (r). The null hypothesis is that the population correlation is equal to zero, that is, the two variables are not correlated.

The t ratio can be computed as

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{N - 2}}} $$

The denominator is the standard error of r. With df = N - 2 = 13, t ≈ 7, p < .001.
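The t ratio for r can be checked numerically. The sketch below uses the standard formula t = r / sqrt((1 - r²) / (N - 2)); the sample size N = 15 is an assumption inferred from the 13 degrees of freedom reported for the test, and `t_for_correlation` is a name introduced for illustration.

```python
import math

def t_for_correlation(r, n):
    """t ratio for testing H0: population correlation = 0."""
    standard_error_of_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / standard_error_of_r

r = 0.893  # correlation from the printout
n = 15     # assumed sample size, inferred from df = N - 2 = 13

t = t_for_correlation(r, n)
print(f"t({n - 2}) = {t:.2f}")
```

The result is a little above 7, which agrees with the rounded value of 7 reported in the text.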

d. Examine the Unstandardized Regression Coefficient

A positive regression coefficient on the predictor `pretest` indicates that a higher score on the pretest is associated with a higher predicted value of the criterion variable (the final exam).

Test of the Regression Coefficient B (the Slope)

The null hypothesis is that a given regression coefficient is equal to 0.

t = B / standard error of B = .868 / .122 ≈ 7

Report the results.

The regression coefficient associated with the predictor variable `pretest` is significantly different from zero, t(13) = 7, p < .001. The predictor, `pretest`, contributes significantly to the prediction of the final exam score.
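The slope test can be reproduced from the two printout values. In the sketch below, the critical value 4.221 is the two-tailed t for df = 13 at alpha = .001, taken from a standard t table rather than from the SPSS output.

```python
B = 0.868           # unstandardized slope from the printout
SE_B = 0.122        # standard error of B from the printout
T_CRIT_001 = 4.221  # two-tailed critical t, df = 13, alpha = .001 (standard t table)

t = B / SE_B
print(f"t(13) = {t:.2f}")
print("significant at the .001 level:", abs(t) > T_CRIT_001)
```

With a single predictor, this t agrees (up to rounding) with the t ratio for the correlation coefficient, because both test the same null hypothesis of no linear relationship.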
 

Plot the Regression Line

From the menus choose: Graphs \ Scatter.  

Select the Simple picture button.

Click on Define.

Select the variable `final` that will determine the vertical position (Y) of each point.

Select the variable `pretest` that will determine the horizontal (X) position of each point. Click OK.

The scatterplot will be displayed in the Output window.

 Modify the chart. Display a scatterplot with a regression line.

Instructions for SPSS 11.5:

a. Double-click on the scatterplot to bring out the Chart Editor window.

b. From the chart editor’s menus choose: Chart \ Options.

Click on Total. Click on Fit Options.

Fit Method. Select Linear Regression.

Click on Continue. Click OK. Close the chart editor window and return to the output window.

Instructions for SPSS 12.0:

a. Double-click on the scatterplot to bring up the Chart Editor window.

b. Click on any one of the data points to highlight all of them.

c. From the Chart Editor's menus choose: Chart \ Add Chart Element \ Fit Line at Total.

Close the Chart Editor window and return to the Viewer window.

The fitted line shows a clearly positive linear relationship.
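The line drawn on the chart is a least-squares line. As a minimal sketch of how such a line is obtained, the function below computes the slope and intercept from raw scores. The data are hypothetical (not the class data), and `fit_line` is a name introduced here for illustration.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-products / sum of squares of x.
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp_xy / ss_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical scores for illustration only (NOT the class data).
pretest = [1, 2, 3, 4, 5]
final = [2, 3, 5, 4, 6]

slope, intercept = fit_line(pretest, final)
print(f"Predicted final = {slope:.3f}(pretest) + {intercept:.3f}")
```

A positive slope corresponds to a line that rises from left to right on the scatterplot.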