The primary function of the general linear model is to predict outcomes when variables related to these outcomes are known and can be quantified. To accomplish this goal, the methods of the general linear model partition the variance of the criterion variable into a component that can be predicted from the predictor variables and a component that is not accounted for by them. These determined and alienated components are defined by the coefficients of determination and alienation. An integral part of this process is the computation of the standard error of prediction. The simplest of these predictive models, the model of bivariate prediction, is outlined in the present chapter.
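Let us begin by considering a set of two variables, X and Y, that have been translated to standard scores. To compute the predicted and unpredicted components we use the key equations for bivariate prediction
$$z_{y'} = r_{xy}\, z_x$$
and
$$z_e = z_y - z_{y'}$$
where $z_{y'}$ is the predicted standard score, $z_e$ is the error score, and $r_{xy}$ is the correlation between X and Y.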
The main computational sequence of these operations is as follows. Compute the means of both variables and subtract these means from the obtained scores (make sure that the deviation scores sum to zero). Square and sum the deviation scores and divide these sums by n to obtain the variances. Take the square roots of the variances to obtain the standard deviations. Divide the deviation scores by their standard deviations to obtain the standard scores (make sure that they also sum to zero). Multiply the standard scores of X and Y pairwise, sum the products, and divide by n to obtain the correlation coefficient. Multiply the standard scores of variable X by the correlation coefficient to get the predicted scores. Subtract the predicted scores from the standard scores of variable Y to get the error scores (be sure that both the predicted and error scores sum to zero). Finally, compute the variances of both the predicted and error scores by squaring, summing, and dividing by n. For a set of illustrative data where X = [2 1 5 4 3] and Y = [1 2 3 4 5], the described operations are summarized in the table below.
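
| | X | Y | x | y | $z_x$ | $z_y$ | $z_x z_y$ | $z_{y'}$ | $z_e$ |
|---|---|---|---|---|---|---|---|---|---|
| | 2 | 1 | -1 | -2 | -0.707 | -1.414 | 1.000 | -0.354 | -1.061 |
| | 1 | 2 | -2 | -1 | -1.414 | -0.707 | 1.000 | -0.707 | 0.000 |
| | 5 | 3 | 2 | 0 | 1.414 | 0.000 | 0.000 | 0.707 | -0.707 |
| | 4 | 4 | 1 | 1 | 0.707 | 0.707 | 0.500 | 0.354 | 0.354 |
| | 3 | 5 | 0 | 2 | 0.000 | 1.414 | 0.000 | 0.000 | 1.414 |
| Sum | 15 | 15 | 0 | 0 | 0 | 0 | 2.500 | 0 | 0 |
| Mean | 3 | 3 | 0 | 0 | 0 | 0 | 0.500 | 0 | 0 |
| Variance | 2 | 2 | 2 | 2 | 1 | 1 | | 0.250 | 0.750 |
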
In this table, you can trace the computation of predicted and error scores by computing the correlation coefficient (.50). Observe the partitioning of the variance of the criterion variable into its predicted and error components. This partitioning is theoretically justified by the specification equation
$$1 = r_{xy}^2 + (1 - r_{xy}^2)$$
The variance of the predicted component (.25) equals the coefficient of determination, that is, the square of the coefficient of correlation (.50). The coefficient of alienation (.75) indexes the variance due to error. The information (coefficient of determination) and error (coefficient of alienation) components are additive and sum to the variance of the variable Y in standard scores, which equals one. Multiplying these proportions by one hundred, we can say that 25 percent of the variance in Y is predictable from the knowledge of the variability in X, and that 75 percent of the variance in the criterion is not accounted for.
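The computational sequence described above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names (`mean`, `variance`, `standardize`) and variable names are assumptions introduced here, and variances are divided by n, as in the text.

```python
import math

X = [2, 1, 5, 4, 3]
Y = [1, 2, 3, 4, 5]
n = len(X)

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance: sum of squared deviations divided by n."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def standardize(values):
    """Convert obtained scores to standard scores (mean 0, variance 1)."""
    m = mean(values)
    s = math.sqrt(variance(values))
    return [(v - m) / s for v in values]

z_x = standardize(X)
z_y = standardize(Y)

# Correlation: mean of the pairwise products of standard scores.
r = sum(a * b for a, b in zip(z_x, z_y)) / n

# Predicted and error scores in standard-score form.
z_pred = [r * a for a in z_x]
z_err = [b - p for b, p in zip(z_y, z_pred)]

var_pred = variance(z_pred)  # coefficient of determination
var_err = variance(z_err)    # coefficient of alienation

print(round(r, 4), round(var_pred, 4), round(var_err, 4))  # 0.5 0.25 0.75
```

The two variances add up to one, the variance of Y in standard scores, which is the partition the specification equation describes.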
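Using the same set of illustrative data where X = [2 1 5 4 3] and Y = [1 2 3 4 5], translate these variables into deviation scores x and y and compute the predicted and error scores, with $s_x$ and $s_y$ denoting the standard deviations of X and Y, as
$$y' = r_{xy}\,\frac{s_y}{s_x}\,x$$
and
$$e = y - y'$$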
Thus
|
|
|
The variance of the criterion variable is partitioned into its predictable and error components as defined by the deviation score specification equation
$$s_y^2 = s_{y'}^2 + s_e^2$$
Dividing both sides of the above equation by the variance of the y variable
$$1 = \frac{s_{y'}^2}{s_y^2} + \frac{s_e^2}{s_y^2}$$
and comparing the result with the fundamental specification equation expressed in terms of the coefficients of determination and alienation,
$$1 = r_{xy}^2 + (1 - r_{xy}^2)$$
allows for an expression of the coefficients of determination and alienation in terms of their variance components as
$$r_{xy}^2 = \frac{s_{y'}^2}{s_y^2}$$
and
$$1 - r_{xy}^2 = \frac{s_e^2}{s_y^2}$$
Multiplying both sides of the above equations by the variance of the criterion scores
$$s_{y'}^2 = r_{xy}^2\, s_y^2$$
and
$$s_e^2 = (1 - r_{xy}^2)\, s_y^2$$
results in two fundamental equations for determining the variance of predicted and error scores from the variance of the criterion variable. Taking the square root of both sides of the latter equation, the standard error of prediction can be defined as
$$s_e = s_y \sqrt{1 - r_{xy}^2}$$
This formula is also called the standard error of estimate.
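As a worked check, the illustrative data can be substituted into these relations. The values $s_y^2 = 2$ and $r_{xy} = .50$ follow from X = [2 1 5 4 3] and Y = [1 2 3 4 5]; the symbols $s_{y'}^2$ and $s_e^2$ for the variances of the predicted and error scores are notation assumed here.

```latex
\begin{align*}
s_{y'}^2 &= r_{xy}^2\, s_y^2 = (.25)(2) = .50 \\
s_e^2    &= (1 - r_{xy}^2)\, s_y^2 = (.75)(2) = 1.50 \\
s_e      &= s_y \sqrt{1 - r_{xy}^2} = \sqrt{1.50} \approx 1.22
\end{align*}
```

The predicted and error variances sum to 2, the variance of the criterion variable in deviation scores.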
Predicted scores for regression using the obtained scores are calculated as
$$Y' = r_{xy}\,\frac{s_y}{s_x}\,(X - \bar{X}) + \bar{Y}$$
and the errors of prediction as
$$e = Y - Y'$$
The partitioning of variance into predicted and error components, working directly with the obtained scores, is illustrated in the table below.
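
| | X | Y | Y' | e |
|---|---|---|---|---|
| | 2 | 1 | 2.5 | -1.5 |
| | 1 | 2 | 2.0 | 0.0 |
| | 5 | 3 | 4.0 | -1.0 |
| | 4 | 4 | 3.5 | 0.5 |
| | 3 | 5 | 3.0 | 2.0 |
| Sum | 15 | 15 | 15 | 0 |
| Mean | 3 | 3 | 3 | 0 |
| Variance | 2 | 2 | 0.5 | 1.5 |
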
Notice that the variances of obtained and deviation scores are identical. Therefore the specification equation, the variance renderings for the coefficients of determination and alienation, and the standard errors of prediction must be the same within the frameworks of both obtained and deviation scores.
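A short sketch can confirm that the obtained-score and deviation-score frameworks give the same variance components for the illustrative data. The function and variable names below are assumptions made for this example, and variances are again divided by n.

```python
import math

X = [2, 1, 5, 4, 3]
Y = [1, 2, 3, 4, 5]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance (divide by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def correlation(u, v):
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / math.sqrt(variance(u) * variance(v))

r = correlation(X, Y)
slope = r * math.sqrt(variance(Y) / variance(X))  # r * (s_y / s_x)
intercept = mean(Y) - slope * mean(X)

# Obtained-score framework.
Y_pred = [slope * x + intercept for x in X]
errors = [y - p for y, p in zip(Y, Y_pred)]

# Deviation-score framework: same slope, no intercept.
x_dev = [x - mean(X) for x in X]
y_dev = [y - mean(Y) for y in Y]
y_dev_pred = [slope * x for x in x_dev]
dev_errors = [y - p for y, p in zip(y_dev, y_dev_pred)]

std_error = math.sqrt(variance(errors))

print(Y_pred)                                       # [2.5, 2.0, 4.0, 3.5, 3.0]
print(variance(Y_pred), variance(y_dev_pred))       # 0.5 0.5
print(variance(errors), variance(dev_errors))       # 1.5 1.5
print(round(std_error, 4))                          # 1.2247
```

Subtracting a constant mean shifts the scores without changing their spread, which is why the two frameworks agree.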
Of considerable interest are relationships between predictor, criterion, predicted, and error variables. These correlations determine the framework of bivariate regression and are presented, using coefficients of determination, in the following table.
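
| | X | Y | Y' | e |
|---|---|---|---|---|
| X | 1 | $r_{xy}^2$ | 1 | 0 |
| Y | $r_{xy}^2$ | 1 | $r_{xy}^2$ | $1 - r_{xy}^2$ |
| Y' | 1 | $r_{xy}^2$ | 1 | 0 |
| e | 0 | $1 - r_{xy}^2$ | 0 | 1 |
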
The correlation between X and Y indexes the initial relationship between the predictor and criterion variables. Since the predicted scores are linear transformations of the predictor variable, the correlation between the predictor and predicted variables must be perfect. Next, consider that one of the fundamental properties of the general linear model is that predicted and error variables are uncorrelated (orthogonal). Since the predicted scores are linear transformations of the predictor variable, the correlation between the predictor and error variables must also be zero. Finally, consider the correlations between the criterion variable and both the predicted and error variables. Squared, they define the basic specification equation of bivariate regression, partitioning the variance of the criterion variable into the coefficient of determination and the coefficient of alienation.
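These relationships can be checked numerically on the illustrative data. The sketch below computes the squared correlations among the predictor, criterion, predicted, and error variables. The names are assumptions for illustration, and the slope is computed as covariance over predictor variance, which is algebraically the same as the correlation times the ratio of standard deviations.

```python
import math

X = [2, 1, 5, 4, 3]
Y = [1, 2, 3, 4, 5]

def mean(values):
    return sum(values) / len(values)

def covariance(u, v):
    """Population covariance (divide by n); covariance(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def correlation(u, v):
    return covariance(u, v) / math.sqrt(covariance(u, u) * covariance(v, v))

slope = covariance(X, Y) / covariance(X, X)
intercept = mean(Y) - slope * mean(X)
predicted = [slope * x + intercept for x in X]
error = [y - p for y, p in zip(Y, predicted)]

variables = {"X": X, "Y": Y, "Y'": predicted, "e": error}

# Squared correlation for every pair of variables.
r_squared = {
    (a, b): correlation(u, v) ** 2
    for a, u in variables.items()
    for b, v in variables.items()
}

for a in variables:
    row = " ".join(f"{r_squared[(a, b)]:.2f}" for b in variables)
    print(a.ljust(3), row)
```

For these data the squared correlations of Y with the predicted and error scores come out as .25 and .75, and the predictor and predicted scores are each uncorrelated with the error scores.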
The usefulness of relationships described in this chapter may be illustrated by the following example. Suppose you would like to predict your life span. In the absence of any additional information, your best bet would be 79 years if you are a female and 72 years if you are a male and if you live in the
you may estimate your own life span as $.5\,(4.5/4.5)(76 - 68) + 72$, which is 76. The standard error associated with this prediction is computed for our example as $4.5\,(1 - .50^2)^{1/2}$, which equals 3.90, a value smaller than the standard error of 4.5 years for the pure guess. Your predicted life span then can be plotted as

The probabilities associated with the confidence bands around the predicted life span are about .68 for plus or minus one standard error and about .95 for plus or minus two standard errors.
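The life span example can be written as a small sketch. It reads the numbers in the expression .5(4.5/4.5)(76-68)+72 as a correlation of .5, standard deviations of 4.5 years for both variables, a predictor score of 76, a predictor mean of 68, and a criterion mean of 72. That reading is an interpretation, since the text does not label each number, and the function names are hypothetical. The .68 and .95 probabilities additionally assume roughly normal errors of prediction.

```python
import math

def predict(x, r, s_x, s_y, mean_x, mean_y):
    """Obtained-score prediction: r * (s_y / s_x) * (x - mean_x) + mean_y."""
    return r * (s_y / s_x) * (x - mean_x) + mean_y

def standard_error(s_y, r):
    """Standard error of prediction: s_y * sqrt(1 - r^2)."""
    return s_y * math.sqrt(1 - r ** 2)

def band(center, se, k):
    """Interval of plus or minus k standard errors around a prediction."""
    return (center - k * se, center + k * se)

# Assumed reading of the numbers in the example (see the note above).
predicted = predict(x=76, r=0.5, s_x=4.5, s_y=4.5, mean_x=68, mean_y=72)
se = standard_error(s_y=4.5, r=0.5)

one_se = band(predicted, se, 1)  # roughly .68 probability under normality
two_se = band(predicted, se, 2)  # roughly .95 probability under normality

print(predicted)  # 76.0
print(one_se)
print(two_se)
```

Keeping the band width as a multiple of the standard error makes it easy to see how a stronger correlation narrows the interval around the same predicted value.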
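The principal equations for statistical prediction are
$$Y' = bX + a, \qquad y' = bx, \qquad z_{y'} = r_{xy}\, z_x$$
with their slopes and intercepts
$$b = r_{xy}\,\frac{s_y}{s_x}, \qquad a = \bar{Y} - b\bar{X}$$
and means and variances
$$\bar{Y'} = \bar{Y}, \qquad \bar{y'} = 0, \qquad \bar{z}_{y'} = 0, \qquad s_{Y'}^2 = s_{y'}^2 = r_{xy}^2\, s_y^2, \qquad s_{z_{y'}}^2 = r_{xy}^2$$
The principal equations for errors of statistical prediction are
$$e = Y - Y' = y - y', \qquad z_e = z_y - z_{y'}$$
with their means and standard errors of prediction
$$\bar{e} = 0, \qquad \bar{z}_e = 0, \qquad s_e = s_y\sqrt{1 - r_{xy}^2}, \qquad s_{z_e} = \sqrt{1 - r_{xy}^2}$$
The above formulae include equations for the computation of slopes and intercepts of the regression lines, together with the means and variances of both the predicted and error scores. Also included are formulae for the computation of standard errors of prediction.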