The general linear model of data analysis is well integrated around its central methods of correlation, regression, and canonical analysis. The core algorithms of these methods can be translated into a multitude of alternative computational procedures by algebraic manipulation of the essential relationships. When a predictor variable is used to code group membership, regression analysis can be carried out using only the group means and variances. In this case, the coefficient of correlation, which relates the X and Y variables and defines both the predicted scores and the strength of the relationship, seems to disappear. However, the absence of the coefficient of correlation from formulae that define the regression analysis in terms of group means and variances is only an algebraic illusion, as shown in this chapter.
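Consider an example relating a binary variable X = [0 0 1 1 1] to a continuous variable Y = [1 2 3 4 5]. The binary variable splits the Y scores into two groups: Y0 = [1 2], the n0 = 2 scores for which X = 0, and Y1 = [3 4 5], the n1 = 3 scores for which X = 1, with n = n0 + n1 = 5.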
First, let us describe how to compute the mean of the variable Y from the means of the Y0 and Y1 variables. To obtain the arithmetic mean of a composite of two variables of unequal length, we have to weight the means of the component variables by their lengths. Thus

$$\bar{Y} = \frac{n_0\bar{Y}_0 + n_1\bar{Y}_1}{n}$$
For the example, the mean of the variable Y is obtained from the means of the variables Y0 (1.5) and Y1 (4) as [2(1.5) + 3(4)]/5 = 3.0. From the previous discussion we know that the mean of the criterion variable and the mean of its predictable component are equal. Thus, writing Y' for the predicted scores,

$$\bar{Y}' = \bar{Y} = \frac{n_0\bar{Y}_0 + n_1\bar{Y}_1}{n}$$

For the example, the mean of the predicted variable is 3.0, equal to the mean of the criterion scores Y.
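The weighting of the group means can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not taken from the text. It assumes the standard least-squares result that, with a binary predictor, each predicted score is the mean of its own group.

```python
from statistics import mean

# Example data from the text: X codes group membership, Y is the criterion.
X = [0, 0, 1, 1, 1]
Y = [1, 2, 3, 4, 5]

# Split Y into the two groups defined by X.
Y0 = [y for x, y in zip(X, Y) if x == 0]
Y1 = [y for x, y in zip(X, Y) if x == 1]
n0, n1, n = len(Y0), len(Y1), len(Y)

# Mean of Y as the mean of the group means, weighted by group size.
weighted_mean = (n0 * mean(Y0) + n1 * mean(Y1)) / n

# Assumption: with a binary predictor, the least-squares predicted
# score of every case is the mean of the group the case belongs to.
Y_pred = [mean(Y1) if x == 1 else mean(Y0) for x in X]

print("weighted mean of group means:", weighted_mean)
print("mean of Y:", mean(Y))
print("mean of predicted scores:", mean(Y_pred))
```

All three printed values should agree with the 3.0 reported in the text.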
Mean of Predicted Scores in the PQ Notation
Substituting q for n0/n and p for n1/n in the above formula, the mean of the predicted scores can be written as

$$\bar{Y}' = p\bar{Y}_1 + q\bar{Y}_0$$
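To obtain the variance of the predicted variable Y' from the means of the Y0 and Y1 variables, consider that, since linearity is guaranteed in the case of the point biserial correlation $r_{pb}$,

$$s^2_{Y'} = r_{pb}^2\, s^2_Y$$

The point biserial can also be defined as

$$r_{pb} = \frac{(\bar{Y}_1 - \bar{Y}_0)\sqrt{pq}}{s_Y}$$

thus

$$s^2_{Y'} = pq\,(\bar{Y}_1 - \bar{Y}_0)^2$$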
For the example, the squared difference between the means of the variables Y0 (1.5) and Y1 (4), weighted by the variance of the binary variable X (pq = 6/25), equals (6/25)(2.5)² = 1.50.
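The claim that the correlation coefficient only seems to disappear can be illustrated numerically. The sketch below is a hedged illustration, not the author's procedure: it uses population variances (division by n), which is the convention implied by the 6/25 variance of X in the text, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, pvariance

# Example data from the text.
X = [0, 0, 1, 1, 1]
Y = [1, 2, 3, 4, 5]
Y0 = [y for x, y in zip(X, Y) if x == 0]
Y1 = [y for x, y in zip(X, Y) if x == 1]
n = len(Y)
p, q = len(Y1) / n, len(Y0) / n  # proportions of cases coded 1 and 0

diff = mean(Y1) - mean(Y0)

# Route 1: variance of predicted scores from group means only.
var_pred_pq = p * q * diff ** 2

# Route 2: through the point biserial correlation, r^2 times var(Y).
r_pb = diff * sqrt(p * q) / sqrt(pvariance(Y))
var_pred_r = r_pb ** 2 * pvariance(Y)

# Route 3: directly from the predicted scores (group means, assumed
# to be the least-squares predictions for a binary predictor).
Y_pred = [mean(Y1) if x == 1 else mean(Y0) for x in X]
var_pred_direct = pvariance(Y_pred)

print("pq formula:", var_pred_pq)
print("via r_pb:", var_pred_r)
print("direct:", var_pred_direct)
```

The three routes should give the same value, up to floating-point rounding, which shows that the pq formula carries the squared correlation implicitly.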
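Let us reconsider the example of regression analysis used in the previous section, concentrating on the error variable E = Y − Y', whose scores are the deviations of the Y scores from their own group means: [−.5 .5 −1 0 1].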
The mean of the error scores is always zero. The variance of the error scores can be obtained from the variances of the Y0 and Y1 variables as follows. Note that the error variable consists of the deviation scores y0 and y1. The variances of the variables Y0 and Y1 are

$$s^2_{Y_0} = \frac{\sum y_0^2}{n_0}$$

and

$$s^2_{Y_1} = \frac{\sum y_1^2}{n_1}$$
The sums of the squared deviation scores of the variables Y0 and Y1 can be isolated from the above formulae as

$$\sum y_0^2 = n_0\, s^2_{Y_0}$$

and

$$\sum y_1^2 = n_1\, s^2_{Y_1}$$
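The variance of the error variable can be computed by summing either the right-hand sides or the left-hand sides of the above expressions and dividing the sum by n. Using the right-hand sides,

$$s^2_E = \frac{n_0\, s^2_{Y_0} + n_1\, s^2_{Y_1}}{n}$$

Using the left-hand sides,

$$s^2_E = \frac{\sum y_0^2 + \sum y_1^2}{n}$$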
For this example, [2(.25) + 3(.67)] / 5.00 = (.50 + 2.00) / 5.00 = .50, so the variance of the error variable equals .50.
The expression in terms of the group variances can also be written as

$$s^2_E = \frac{n_0}{n}\, s^2_{Y_0} + \frac{n_1}{n}\, s^2_{Y_1}$$
Substituting q for n0/n and p for n1/n in the above formula and rearranging the terms in alphabetical order, the variance of the error variable can be written as

$$s^2_E = p\, s^2_{Y_1} + q\, s^2_{Y_0}$$
For the example, (3/5).67 + (2/5).25 = .40 + .10 = .50.
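The equivalence of the different routes to the error variance can be verified with a short script. This is a sketch under the same assumptions as the text's arithmetic (population variances, errors taken as deviations from the group means); the names are illustrative.

```python
from statistics import mean, pvariance

# Example data from the text.
X = [0, 0, 1, 1, 1]
Y = [1, 2, 3, 4, 5]
Y0 = [y for x, y in zip(X, Y) if x == 0]
Y1 = [y for x, y in zip(X, Y) if x == 1]
n0, n1, n = len(Y0), len(Y1), len(Y)
p, q = n1 / n, n0 / n

# Error scores: each Y score minus the mean of its own group.
m0, m1 = mean(Y0), mean(Y1)
errors = [y - (m1 if x == 1 else m0) for x, y in zip(X, Y)]

# Route 1: variance computed directly from the error scores.
var_e_direct = pvariance(errors)

# Route 2: group variances weighted by group sizes.
var_e_weighted = (n0 * pvariance(Y0) + n1 * pvariance(Y1)) / n

# Route 3: the same quantity in pq notation.
var_e_pq = p * pvariance(Y1) + q * pvariance(Y0)

print("mean of errors:", mean(errors))
print("direct:", var_e_direct)
print("weighted by n0, n1:", var_e_weighted)
print("pq notation:", var_e_pq)
```

Using unrounded group variances (2/3 rather than .67) removes the small rounding gap that appears in the hand calculation.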
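The key relationships discussed in this chapter, with Y' denoting the predicted scores and E the error scores, can be summarized for the means as

$$\bar{Y}' = p\bar{Y}_1 + q\bar{Y}_0, \qquad \bar{E} = 0$$

and for the variances as

$$s^2_{Y'} = pq\,(\bar{Y}_1 - \bar{Y}_0)^2, \qquad s^2_E = p\, s^2_{Y_1} + q\, s^2_{Y_0}$$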
You may compare the above formulae with their renderings elsewhere and you will notice that the pq notation significantly simplifies their algebraic representations.