Regression Analysis

 

The primary function of the general linear model is to predict outcomes when variables related to these outcomes are known and can be quantified. To accomplish this goal, the methods of the general linear model partition the variance of the criterion variable into a component that can be predicted from the predictor variables and a component that is not accounted for by them. These determined and alienated components are defined by the coefficients of determination and alienation. An integral part of this process is the computation of the standard error of prediction. The simplest of these predictive models, the model of bivariate prediction, is outlined in the present chapter.

 

Compute the Coefficient of Correlation

Let us begin by considering a set of two variables, X and Y, that have been converted to standard scores.

 

 

Next, multiply each pair of standard scores, add the products, and divide by n to obtain the correlation coefficient. In this example, r = .50.
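As a minimal sketch of this computation in Python, the correlation can be obtained as the mean cross-product of paired standard scores. The data values and the function names `standardize` and `correlation` are illustrative assumptions rather than the chapter's own table; the values were chosen so that r comes out at .50.

```python
from statistics import fmean, pstdev

def standardize(scores):
    """Convert obtained scores to standard (z) scores using the population SD."""
    m, s = fmean(scores), pstdev(scores)
    return [(v - m) / s for v in scores]

def correlation(x, y):
    """Mean cross-product of paired standard scores (sum of products / n)."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

# Hypothetical data (assumption), chosen so that r = .50
X = [3, 3, 3, 3, 1, 1, 1, 1]
Y = [14, 14, 14, 6, 14, 6, 6, 6]

r = correlation(X, Y)
print(round(r, 2))  # 0.5
```

Dividing by n rather than n - 1 matches the description in the text and requires the standard scores to be based on the population standard deviation.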

 

Compute the Predicted Scores

Third, multiply the predictor, Zx, by the correlation coefficient (.50) to get the predicted scores.

 

 

The resulting predicted variable, Zy', can be shown below.

 

Compute the Error Scores

Fourth, subtract the predicted scores from the criterion variable, Zy, to get the error scores.

 


The resulting error variable, Zy^, can be shown below.


 

Compute the Variance Components

Finally, compute the variances of both the predicted and error scores. 

 

This table shows directly how the variance of the criterion variable Zy (1.00) is partitioned into its predicted (.25) and error (.75) components. This partitioning is theoretically justified by the specification equation

$$1 = r_{xy}^2 + (1 - r_{xy}^2)$$

Additive

The variance of the predicted component (.25) equals the coefficient of determination, which is the square of the coefficient of correlation (.50). The coefficient of alienation (.75) indexes the variance that is not accounted for by the predictor. The predicted (coefficient of determination) and error (coefficient of alienation) components are additive and sum to the variance of the variable Zy, which equals one.
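The additivity of the two components can be checked numerically. The following Python sketch uses hypothetical standard scores (an assumption, not the chapter's table) whose correlation is .50, computes the predicted scores Zy' and error scores Zy^, and compares their variances.

```python
from statistics import pvariance

r = 0.5  # correlation between the hypothetical zx and zy below

# Hypothetical standard scores (assumption): mean 0, variance 1, r = .50
zx = [1, 1, 1, 1, -1, -1, -1, -1]
zy = [1, 1, 1, -1, 1, -1, -1, -1]

# Predicted scores: Zy' = r * Zx
zy_pred = [r * z for z in zx]

# Error scores: Zy^ = Zy - Zy'
zy_err = [a - b for a, b in zip(zy, zy_pred)]

var_pred = pvariance(zy_pred)  # coefficient of determination
var_err = pvariance(zy_err)    # coefficient of alienation

print(var_pred, var_err, var_pred + var_err)  # 0.25 0.75 1.0
```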

Percentage of Variance Accounted for

Multiplying the proportion by one hundred, we can say that 25 percent of the variance in Y is predictable from the knowledge of the variability in X, and that 75 percent of the variance in the criterion is not accounted for.

 

Prediction Using Deviation Scores

Prediction within the framework of deviation scores first requires converting the basic equations for computing predicted and error scores from the standard score form to the deviation score form. For the predicted scores this conversion formula is

$$y' = r_{xy}\,\frac{\sigma_y}{\sigma_x}\,x$$

The translation of the equation for the errors of prediction is straightforward:

$$y^{\wedge} = y - y'$$

Compute the Deviation Scores

First, convert the obtained scores (X and Y) to deviation scores (x and y).

 

 

Compute the Predicted Variable

Next, the predicted variable can be computed as 

 

 

The resulting predicted variable, y', can be shown below.

 

  

Compute the Error Variable

Third, the error variable can be computed as 

 

 

The resulting error variable, y^, can be shown below.

 

 

Compute the Variance Components

Last, compute the variances of both the predicted and error scores.
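The four steps for deviation scores can be sketched in Python as follows. The data are hypothetical (an assumption, not the chapter's table) and were chosen so that r = .50; the slope is taken to be the correlation multiplied by the ratio of the standard deviations of Y and X.

```python
from statistics import fmean, pstdev, pvariance

# Hypothetical obtained scores (assumption); r = .50 for this set
X = [3, 3, 3, 3, 1, 1, 1, 1]
Y = [14, 14, 14, 6, 14, 6, 6, 6]

# Step 1: deviation scores (subtract each variable's mean)
x = [v - fmean(X) for v in X]
y = [v - fmean(Y) for v in Y]

# Correlation: mean cross-product of deviations over the product of the SDs
r = fmean([a * b for a, b in zip(x, y)]) / (pstdev(X) * pstdev(Y))

# Step 2: predicted deviation scores, y' = r * (sd_y / sd_x) * x
slope = r * pstdev(Y) / pstdev(X)
y_pred = [slope * v for v in x]

# Step 3: error scores, y^ = y - y'
y_err = [a - b for a, b in zip(y, y_pred)]

# Step 4: variance components
var_y = pvariance(y)
var_pred = pvariance(y_pred)
var_err = pvariance(y_err)
print(var_y, var_pred, var_err)
```

For these values the criterion variance of 16 splits into a predicted component of 4 and an error component of 12, that is, 25 and 75 percent.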

 

 

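The variance of the criterion variable is partitioned into its predictable and error components as defined by the deviation score specification equation

$$\sigma_y^2 = \sigma_{y'}^2 + \sigma_{y^\wedge}^2$$

Dividing both sides of the above equation by the variance of the y variable

$$\frac{\sigma_y^2}{\sigma_y^2} = \frac{\sigma_{y'}^2}{\sigma_y^2} + \frac{\sigma_{y^\wedge}^2}{\sigma_y^2}$$

results in

$$1 = \frac{\sigma_{y'}^2}{\sigma_y^2} + \frac{\sigma_{y^\wedge}^2}{\sigma_y^2}$$

The above equation expresses the coefficients of determination and alienation in the form of their corresponding variance ratios.

Comparison of the above equation with the fundamental specification equation expressed in terms of the coefficients of determination and alienation,

$$1 = r_{xy}^2 + (1 - r_{xy}^2)$$

allows for an expression of the coefficients of determination and alienation in terms of their variance components as

$$r_{xy}^2 = \frac{\sigma_{y'}^2}{\sigma_y^2}$$

and

$$1 - r_{xy}^2 = \frac{\sigma_{y^\wedge}^2}{\sigma_y^2}$$

Multiplying both sides of the above equations by the variance of the criterion variable Y

$$\sigma_{y'}^2 = r_{xy}^2\,\sigma_y^2$$

and

$$\sigma_{y^\wedge}^2 = (1 - r_{xy}^2)\,\sigma_y^2$$

provides two fundamental equations for determining the variance of predicted and error scores from the variance of the criterion variable.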

 

Standard Error of Estimate

Taking the square root of both sides of the last equation in the preceding section, the standard error of estimate can be defined as

$$\sigma_{y^\wedge} = \sigma_y\sqrt{1 - r_{xy}^2}$$

The standard error of estimate is a measure of error of prediction. When the relationship between the predictor variable (X) and the criterion variable (Y) is perfect (r = 1 or r = -1), the standard error of estimate will be equal to zero. When the coefficient of correlation is 0, the standard error of estimate will be equal to the standard deviation of the criterion variable Y.  
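The two limiting cases described above can be illustrated with a short Python sketch. The function name `standard_error_of_estimate` and the standard deviation of 4.5 are illustrative assumptions; the formula used is the standard deviation of the criterion multiplied by the square root of one minus the squared correlation.

```python
from math import sqrt

def standard_error_of_estimate(sd_y, r):
    """Standard deviation of the errors of prediction: sd_y * sqrt(1 - r^2)."""
    return sd_y * sqrt(1 - r ** 2)

sd_y = 4.5  # hypothetical standard deviation of the criterion (assumption)

for r in (1.0, -1.0, 0.5, 0.0):
    print(r, round(standard_error_of_estimate(sd_y, r), 2))
```

With a perfect correlation in either direction the function returns zero, and with r = 0 it returns the standard deviation of Y unchanged.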

 

Prediction Using Obtained Scores

Prediction using obtained scores requires the translation of the computational equations for the predicted and error scores from deviation scores to obtained scores. For predicted scores the translated formula is

$$Y' = r_{xy}\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) + \bar{Y}$$

The translated equation for the errors of prediction is

$$Y^{\wedge} = Y - Y'$$

The partitioning of variance into predicted and error components, working directly with the obtained scores, is illustrated in the table below.

 

 

Notice that the variances of obtained and deviation scores are identical. Therefore the specification equation, the variance renderings for the coefficients of determination and alienation, and the standard errors of prediction must be the same within the frameworks of both obtained and deviation scores.
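The equality of the variances in the two frameworks can be checked with a small Python sketch. The data are hypothetical (an assumption, not the chapter's table), and the regression line is written in slope and intercept form, with the slope taken as r times the ratio of the standard deviations and the intercept as the mean of Y minus the slope times the mean of X.

```python
from statistics import fmean, pstdev, pvariance

# Hypothetical obtained scores (assumption); r = .50 for this set
X = [3, 3, 3, 3, 1, 1, 1, 1]
Y = [14, 14, 14, 6, 14, 6, 6, 6]

mean_x, mean_y = fmean(X), fmean(Y)
sd_x, sd_y = pstdev(X), pstdev(Y)
r = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(X, Y)]) / (sd_x * sd_y)

# Regression line for obtained scores
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

Y_pred = [slope * v + intercept for v in X]   # predicted obtained scores
Y_err = [a - b for a, b in zip(Y, Y_pred)]    # errors of prediction

# Predicted deviation scores, for comparison
y_pred_dev = [slope * (v - mean_x) for v in X]

print(pvariance(Y_pred), pvariance(y_pred_dev))  # same variance in both frameworks
print(fmean(Y_pred), fmean(Y_err))               # mean of Y' equals mean of Y; mean error is 0
```

Adding the constant mean of Y shifts the predicted scores without changing their spread, which is why the variance components are unaffected.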

Correlational Structure of Regression Analysis

Of considerable interest are the relationships among the predictor, criterion, predicted, and error variables. These correlations determine the framework of bivariate regression and are presented, as squared coefficients of correlation, in the following table.

  

X and Y

The correlation between X and Y indexes the initial relationship between the predictor and criterion variables. The squared coefficient of correlation between X and Y is .25. 



X and Y'

Since the predicted scores are a linear transformation of the predictor variable, the correlation between the predictor and predicted variables (X and Y') must be perfect (1).

Y' and Y^

One of the fundamental properties of the general linear model is that predicted and error variables are not correlated (0).

X and Y^

Since the predicted variable is a linear transformation of the predictor variable, and the predicted and error variables are uncorrelated, the correlation between the predictor and error variables must also be zero.

Coefficients of Determination and Alienation

Finally, consider the correlations between the criterion variable and both the predicted and error variables. Squared, they define the basic specification equation of bivariate regression, partitioning the variance of the criterion variable into the coefficient of determination and the coefficient of alienation.
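The full set of squared correlations discussed in this section can be reproduced with a Python sketch. The data are hypothetical (an assumption, not the chapter's table) with r = .50, and the helper `corr` and the dictionary `r2` are illustrative names.

```python
from statistics import fmean, pstdev

def corr(a, b):
    """Pearson correlation: mean cross-product of deviations over the product of SDs."""
    ma, mb = fmean(a), fmean(b)
    cov = fmean([(u - ma) * (v - mb) for u, v in zip(a, b)])
    return cov / (pstdev(a) * pstdev(b))

# Hypothetical obtained scores (assumption); r = .50 for this set
X = [3, 3, 3, 3, 1, 1, 1, 1]
Y = [14, 14, 14, 6, 14, 6, 6, 6]

slope = corr(X, Y) * pstdev(Y) / pstdev(X)
Y_pred = [slope * (v - fmean(X)) + fmean(Y) for v in X]  # predicted variable Y'
Y_err = [a - b for a, b in zip(Y, Y_pred)]               # error variable Y^

# Squared correlations among the four variables
r2 = {
    "X, Y": corr(X, Y) ** 2,
    "X, Y'": corr(X, Y_pred) ** 2,
    "Y', Y^": corr(Y_pred, Y_err) ** 2,
    "X, Y^": corr(X, Y_err) ** 2,
    "Y, Y'": corr(Y, Y_pred) ** 2,   # coefficient of determination
    "Y, Y^": corr(Y, Y_err) ** 2,    # coefficient of alienation
}
for pair, value in r2.items():
    print(pair, round(value, 2))
```

The last two entries sum to one, which is the specification equation expressed through correlations with the criterion.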

 

Looking into the Future

The usefulness of the relationships described in this chapter may be illustrated by the following example. Suppose you would like to predict your life span. In the absence of any additional information, your best bet would be 79 years if you are female and 72 years if you are male, assuming you live in the United States. In Japan, life expectancy is 82 years for females and 77 years for males. In Chad, it is 41 years for females and 39 years for males. Based on a review of the literature, the correlation between the life spans of parents and their offspring is about .50.

Predicted Life-Span

Suppose you are an American male with East European ancestry. Your father died at the age of 76 years in the Old Country, where the male life expectancy is 68 years. The standard deviation of life span is about 4.5 years in both populations. Using the equation

$$Y' = r_{xy}\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) + \bar{Y}$$

you may estimate your own life-span as .5(4.5/4.5)(76 - 68) + 72, which is 76 years. The standard error associated with this prediction is

$$\sigma_{y^\wedge} = \sigma_y\sqrt{1 - r_{xy}^2}$$

computed for our example as 4.5(1 - .25)^(1/2), which equals 3.90, a value smaller than the standard error of 4.5 years for the pure guess (rxy = 0).
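The arithmetic of this example can be reproduced with a short Python sketch. The function names `predict_obtained` and `standard_error_of_estimate` are illustrative assumptions; the input values are those given in the text.

```python
from math import sqrt

def predict_obtained(x, mean_x, mean_y, sd_x, sd_y, r):
    """Predicted obtained score: r * (sd_y / sd_x) * (x - mean_x) + mean_y."""
    return r * (sd_y / sd_x) * (x - mean_x) + mean_y

def standard_error_of_estimate(sd_y, r):
    """Standard error of the prediction: sd_y * sqrt(1 - r^2)."""
    return sd_y * sqrt(1 - r ** 2)

# Values from the example: father's life span 76, Old Country mean 68,
# U.S. male mean 72, standard deviation 4.5 in both populations, r = .50
predicted = predict_obtained(x=76, mean_x=68, mean_y=72, sd_x=4.5, sd_y=4.5, r=0.5)
se = standard_error_of_estimate(sd_y=4.5, r=0.5)

print(predicted, round(se, 2))  # 76.0 3.9
```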

95% Prediction Interval

It is unlikely that your real life-span exactly equals the predicted life-span. The 95% prediction interval for a predicted value of 76 ranges from about 68 years to about 84 years.

 

 

The center of the prediction interval is the predicted value of 76, and the standard error of estimate is 3.90. Assuming normally distributed errors of prediction, a band of plus or minus 1.96 standard errors of estimate around the predicted value has a probability of .95.

Compute the lower limit and the upper limit as


   

Upper limit: 76 + (1.96)(3.90) = 83.6
Lower limit: 76 + (-1.96)(3.90) = 68.4

If the study were repeated many times and the 95% prediction interval computed each time, about 95% of these intervals would include the true life-span.
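The limits of the interval can be computed with a few lines of Python. The function name `prediction_interval` is an illustrative assumption, and the sketch uses the simplified form from the text, the predicted value plus or minus 1.96 standard errors of estimate, without any further correction terms.

```python
def prediction_interval(predicted, se, z=1.96):
    """Return (lower, upper) limits: predicted -/+ z standard errors of estimate."""
    return predicted - z * se, predicted + z * se

# Values from the example: predicted life-span 76, standard error of estimate 3.90
lower, upper = prediction_interval(76, 3.90)
print(round(lower, 1), round(upper, 1))  # 68.4 83.6
```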

 

Summary

The principal equations for statistical prediction are

 

 

with their slopes and intercepts

 

 

and means and variances

 

The lowercase sigma has two forms, σ and ς. Note that σ² signifies the variance.

 

The principal equations for errors of statistical prediction are

 

 

with their means and standard errors of prediction

 

 

The above formulae include equations for computing the slopes and intercepts of the regression lines, together with the means and variances of both the predicted and error scores. Also included are formulae for computing the standard errors of prediction.