Analysis of covariance is a statistical technique that attempts to reduce the residual component of the analysis of variance by adding variables to the set of coding vectors. Within Fisher's sum of squares conceptual model, analysis of covariance is a complex method. In his Methods of Multivariate Analysis, Hope comments on the analysis of covariance as follows:
The analysis of covariance is the most complicated of the standard statistical methods. It is complicated because it involves simultaneous employment of the concepts of analysis of variance and regression analysis. Its conceptual complexity is compounded by the arithmetic jungle, which sprouts on the pages of textbooks when data on several variables are analyzed without use of matrices.
Aside from using matrix algebra, the analysis of covariance can be streamlined by avoiding Fisher's sums of squares model altogether and using coded regression analysis throughout instead.
Consider a typical research problem suitable for the analysis of covariance. A researcher is interested in the effectiveness of three methods of presenting material. Before the experiment a pre-test is given, and after it the amount of retained material is measured. For a fictional example where X is the pre-test (the covariate) and Y is the post-test, the data matrix may look as shown in the following table.
| | GROUP I | | | GROUP II | | | GROUP III | |
|---|---|---|---|---|---|---|---|---|
| | X | Y | | X | Y | | X | Y |
| S1 | 1 | 1 | S4 | 4 | 4 | S7 | 7 | 7 |
| S2 | 0 | 2 | S5 | 3 | 5 | S8 | 6 | 8 |
| S3 | 2 | 3 | S6 | 5 | 6 | S9 | 8 | 9 |
| M | 1 | 2 | | 4 | 5 | | 7 | 8 |
| s2 | .67 | .67 | | .67 | .67 | | .67 | .67 |
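The M and s2 rows can be checked directly. The sketch below, in plain Python, assumes that s2 is the variance computed with divisor n (the population form), since that is what reproduces the tabled .67. The dictionary layout and variable names are illustrative only.

```python
from statistics import mean, pvariance

# Pre-test (X) and post-test (Y) scores per group, copied from the table above.
groups = {
    "I": {"X": [1, 0, 2], "Y": [1, 2, 3]},
    "II": {"X": [4, 3, 5], "Y": [4, 5, 6]},
    "III": {"X": [7, 6, 8], "Y": [7, 8, 9]},
}

# Mean and variance (divisor n, an assumption) for every group and variable.
summary = {
    (g, var): (mean(scores), pvariance(scores))
    for g, cols in groups.items()
    for var, scores in cols.items()
}

for (g, var), (m, s2) in summary.items():
    print(f"Group {g:<3} {var}: M = {m:.2f}, s2 = {s2:.2f}")
```

Every within-group variance comes out as 2/3, so the groups differ in level but not in spread.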
The question to answer is how much variance in the post-test scores is accounted for by each of the three methods of presentation and whether there is a significant difference between the means of groups I, II, and III.
Using Helmert's procedure, the experiment can be coded as
| X1 | X2 | X | Y |
|---|---|---|---|
| 1 | -1 | 1 | 1 |
| 1 | -1 | 0 | 2 |
| 1 | -1 | 2 | 3 |
| -1 | -1 | 4 | 4 |
| -1 | -1 | 3 | 5 |
| -1 | -1 | 5 | 6 |
| 0 | 2 | 7 | 7 |
| 0 | 2 | 6 | 8 |
| 0 | 2 | 8 | 9 |
and the variables involved in the analysis correlated as
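The correlations among the coded variables can be recomputed from the data above. What follows is a minimal sketch in plain Python; the helper name `pearson` and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
from math import sqrt

# Helmert-coded predictors, covariate and criterion from the table above.
data = {
    "X1": [1, 1, 1, -1, -1, -1, 0, 0, 0],
    "X2": [-1, -1, -1, -1, -1, -1, 2, 2, 2],
    "X": [1, 0, 2, 4, 3, 5, 7, 6, 8],
    "Y": [1, 2, 3, 4, 5, 6, 7, 8, 9],
}

def pearson(u, v):
    """Pearson product-moment correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    sp = sum(a * b for a, b in zip(du, dv))
    return sp / sqrt(sum(a * a for a in du) * sum(b * b for b in dv))

names = list(data)
matrix = [[pearson(data[r], data[c]) for c in names] for r in names]

# Print the full matrix of inter- and cross-correlations.
print("     " + "".join(f"{n:>7}" for n in names))
for name, row in zip(names, matrix):
    print(f"{name:<5}" + "".join(f"{r:7.2f}" for r in row))
```

The two Helmert vectors are uncorrelated with each other, but the covariate X correlates with both of them, which is why the predictor block is not an identity matrix.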
As the matrix of inter-correlations among the predictors is not an identity matrix, the variance components due to each predictor variable are not additive. For the example, the coefficient of multiple determination equals .93, and the variance components for the predictor variables X1, X2 and X are .06, .17 and .25. Using successive partialling and including the covariate last changes the predictors as follows.
| X1 | X2 | X | Y |
|---|---|---|---|
| 1 | -1 | 0 | 1 |
| 1 | -1 | -1 | 2 |
| 1 | -1 | 1 | 3 |
| -1 | -1 | 0 | 4 |
| -1 | -1 | -1 | 5 |
| -1 | -1 | 1 | 6 |
| 0 | 2 | 0 | 7 |
| 0 | 2 | -1 | 8 |
| 0 | 2 | 1 | 9 |
Successive partialling also changes the matrix of inter- and cross-correlations: the partialled predictors are now uncorrelated with one another.
The coefficient of multiple determination remains equal to .93. This time, the variance components for the predictor variables X1, X2 and X are additive. They equal .22, .68 and .02. Including the covariate last lets it account for only 2 percent of the variance.
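The partialled covariate column and the additive components can be reproduced with a few lines of plain Python. This is a sketch under the assumption that "successive partialling" means replacing each predictor by its residual after regression on the predictors entered before it. The helper names `center`, `dot`, `partial_out` and `r_squared` are illustrative.

```python
x1 = [1, 1, 1, -1, -1, -1, 0, 0, 0]
x2 = [-1, -1, -1, -1, -1, -1, 2, 2, 2]
x = [1, 0, 2, 4, 3, 5, 7, 6, 8]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def partial_out(target, predictor):
    """Residual of a centered target after regression on a centered predictor."""
    slope = dot(target, predictor) / dot(predictor, predictor)
    return [t - slope * p for t, p in zip(target, predictor)]

def r_squared(u, v):
    """Squared correlation of two centered vectors."""
    return dot(u, v) ** 2 / (dot(u, u) * dot(v, v))

c1, c2, cx, cy = center(x1), center(x2), center(x), center(y)

# X1 and X2 are already orthogonal, so only the covariate has to be
# partialled: first for X1, then for X2.
x_resid = partial_out(partial_out(cx, c1), c2)

components = [r_squared(v, cy) for v in (c1, c2, x_resid)]
print([round(v) for v in x_resid])         # the partialled X column
print([f"{c:.3f}" for c in components])    # components for X1, X2, X
print(f"{sum(components):.3f}")            # coefficient of determination
```

Before rounding the components are .225, .675 and .025, and they sum to .925, which is the .93 quoted above.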
What will happen if we include the covariate first? For the example,
| X | X1 | X2 | Y |
|---|---|---|---|
| 1 | 1 | -1 | 1 |
| 0 | 1 | -1 | 2 |
| 2 | 1 | -1 | 3 |
| 4 | -1 | -1 | 4 |
| 3 | -1 | -1 | 5 |
| 5 | -1 | -1 | 6 |
| 7 | 0 | 2 | 7 |
| 6 | 0 | 2 | 8 |
| 8 | 0 | 2 | 9 |
The matrix of inter- and cross-correlations is
The coefficient of multiple determination remains .93. The variance components of the predictor variables are again additive. The variance component for the covariate is .90, for the first coding vector .00 and for the second coding vector .02 (before rounding, the three components sum to .925).
The order in which predictors enter the regression equation has a strong influence on the amount of variance each accounts for. Even though the total variance extracted remains unchanged, variables included earlier account for more variance than they would if they were included later in the analysis.
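The effect of entry order can be sketched with a small general routine. The function below is an illustration, assuming that each predictor is credited with the squared correlation between the criterion and the part of that predictor left after all earlier predictors have been partialled out (a Gram-Schmidt style orthogonalization). The name `successive_components` is hypothetical.

```python
def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def successive_components(predictors, criterion):
    """Variance components of the criterion, in the given entry order."""
    cy = center(criterion)
    basis, components = [], []
    for p in predictors:
        r = center(p)
        for b in basis:  # remove what earlier predictors already carry
            slope = dot(r, b) / dot(b, b)
            r = [ri - slope * bi for ri, bi in zip(r, b)]
        basis.append(r)
        components.append(dot(r, cy) ** 2 / (dot(r, r) * dot(cy, cy)))
    return components

x1 = [1, 1, 1, -1, -1, -1, 0, 0, 0]
x2 = [-1, -1, -1, -1, -1, -1, 2, 2, 2]
x = [1, 0, 2, 4, 3, 5, 7, 6, 8]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]

covariate_last = successive_components([x1, x2, x], y)
covariate_first = successive_components([x, x1, x2], y)

print([f"{c:.3f}" for c in covariate_last], f"{sum(covariate_last):.3f}")
print([f"{c:.3f}" for c in covariate_first], f"{sum(covariate_first):.3f}")
```

Both orders give the same total, but the covariate's share moves from .025 when entered last to .9025 when entered first.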
The recommended strategy within this context is as follows. Rephrase the experimental question as whether there is a significant difference among the group means after the effect of the covariate is removed, and how much unique variance in the post-test scores is accounted for by the experimental conditions. Next, verify the assumptions. The first assumption of the analysis of covariance is the existence of a linear relationship between the covariate and the dependent variable. The second assumption is that there is no interaction between the covariate and the treatments. The third assumption is that the covariate is significantly correlated with the dependent variable.
Let us consider a hypothetical study where all the assumptions of the analysis of covariance are met. In this study, the dependent variable Y is a measure of achievement after completing the course, and the covariate X is a measure of aptitude for mathematics obtained before the class commenced. Subjects were randomly assigned to three classrooms taught by different teaching methods. Data obtained from this study are shown in the table below.
| | Method One | | | Method Two | | | Method Three | |
|---|---|---|---|---|---|---|---|---|
| | X | Y | | X | Y | | X | Y |
| S1 | 8 | 6 | S4 | 7 | 8 | S7 | 9 | 12 |
| S2 | 4 | 3 | S5 | 5 | 7 | S8 | 10 | 12 |
| S3 | 6 | 4 | S6 | 8 | 10 | S9 | 7 | 10 |
| M | 6.00 | 4.33 | | 6.67 | 8.33 | | 8.67 | 11.33 |
| s2 | 2.67 | 1.56 | | 1.56 | 1.56 | | 1.56 | .89 |
Inspecting the means, we may notice that method one was the least effective and method three the most effective. However, the differences in achievement scores are attributable not only to the effectiveness of the teaching methods but also to differences among the subjects. In the example, the subjects do not all have similar aptitude for mathematics; moreover, the mean on the covariate (aptitude scores) is lowest for the first group and highest for the third.
Controlling this source of variability is the main goal of the analysis of covariance. The analysis adjusts the dependent variable (achievement scores) for the covariate (aptitude scores) by using a linear regression model.
Let us solve the analysis of covariance by multiple regression, using two effect-coding vectors, X1 and X2, and a covariate X, a continuous variable. The effect codes in each vector must sum to zero; however, the products of the two vectors need not sum to zero, as they must for orthogonal coding vectors. The effect codes are so named because using them as predictor variables yields regression weights that reflect the effects of the treatments. Effect coding is characterized by coding the last group -1 in all the vectors.
As the interaction between the covariate and the treatments is not pronounced, we were able to drop the interaction term from the model and conceptualize the design as shown in the table below.
| | X | X1 | X2 | Y |
|---|---|---|---|---|
| S1 | 8 | 1 | 0 | 6 |
| S2 | 4 | 1 | 0 | 3 |
| S3 | 6 | 1 | 0 | 4 |
| S4 | 7 | 0 | 1 | 8 |
| S5 | 5 | 0 | 1 | 7 |
| S6 | 8 | 0 | 1 | 10 |
| S7 | 9 | -1 | -1 | 12 |
| S8 | 10 | -1 | -1 | 12 |
| S9 | 7 | -1 | -1 | 10 |
| M | 7.11 | 0 | 0 | 8.00 |
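The X1 and X2 columns of the table can be generated mechanically. The following sketch assumes equal group sizes and 0-based group indices; the function name `effect_codes` is hypothetical and serves only to illustrate the coding rule.

```python
def effect_codes(group, n_groups):
    """Return the n_groups - 1 effect codes for a 0-based group index."""
    if group == n_groups - 1:
        # The last group is coded -1 in every vector.
        return [-1] * (n_groups - 1)
    # Every other group gets 1 in its own vector and 0 elsewhere.
    return [1 if j == group else 0 for j in range(n_groups - 1)]

# S1..S9: three subjects each in Method One, Two and Three.
membership = [0, 0, 0, 1, 1, 1, 2, 2, 2]
coded = [effect_codes(g, 3) for g in membership]
x1 = [row[0] for row in coded]
x2 = [row[1] for row in coded]

print(x1)
print(x2)
print(sum(x1), sum(x2))                    # each vector sums to zero
print(sum(a * b for a, b in zip(x1, x2)))  # the cross-product sum does not
```

With equal group sizes each vector sums to zero, while the cross-product sum equals the size of the last group, so the two vectors are not orthogonal.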
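Using a computer program, we obtained a regression equation of the form Y' = a + bX + b1X1 + b2X2, where b is the weight for the covariate and b1 and b2 are the weights for the effect-coded vectors.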
The b weight associated with the covariate (.79) is the common slope. The estimate of the common slope can be used to compute the adjusted achievement mean for each group as
Adjusted Achievement Group Mean = Observed Achievement Group Mean - .79 (Aptitude Group Mean - Aptitude Grand Mean)
For the example
| Method | Adjusted Mean |
|---|---|
| Method One | 5.21 |
| Method Two | 8.68 |
| Method Three | 10.11 |
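The adjusted means follow from the formula above once the common slope is known. The sketch below assumes that the common slope is the pooled within-group regression slope of Y on X, which is what the b weight for the covariate estimates in this model. The variable names are illustrative.

```python
# Aptitude (X) and achievement (Y) scores per teaching method.
groups = {
    "Method One": ([8, 4, 6], [6, 3, 4]),
    "Method Two": ([7, 5, 8], [8, 7, 10]),
    "Method Three": ([9, 10, 7], [12, 12, 10]),
}

def mean(v):
    return sum(v) / len(v)

# Pool the within-group cross-products and sums of squares.
sp_within = 0.0
ss_within = 0.0
for xs, ys in groups.values():
    mx, my = mean(xs), mean(ys)
    sp_within += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ss_within += sum((a - mx) ** 2 for a in xs)

slope = sp_within / ss_within
grand_x = mean([a for xs, _ in groups.values() for a in xs])

# Adjusted mean = observed Y mean - slope * (group X mean - grand X mean).
adjusted = {
    name: mean(ys) - slope * (mean(xs) - grand_x)
    for name, (xs, ys) in groups.items()
}

print(f"common slope = {slope:.2f}")
for name, value in adjusted.items():
    print(f"{name}: {value:.2f}")
```

Groups with below-average aptitude are adjusted upward and groups with above-average aptitude downward, which narrows the gap between Method One and Method Three.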
For effect coding, each b is equal to the deviation of the adjusted mean of the group assigned 1's in the effect coded vector from the grand mean of the dependent variable, as shown below.
| Effect of | Difference | Weight | Comment |
|---|---|---|---|
| Method One | 5.21 - 8.00 | -2.79 | B weight for effect coded vector X1 |
| Method Two | 8.68 - 8.00 | .68 | B weight for the effect coded vector X2 |
| Method Three | 10.11 - 8.00 | 2.11 | |
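The correspondence between the b weights and the adjusted deviations can be checked by fitting the regression itself. This is a minimal sketch, assuming ordinary least squares solved through the normal equations in plain Python. The `solve` helper is illustrative and is not the computer program referred to above.

```python
x = [8, 4, 6, 7, 5, 8, 9, 10, 7]
x1 = [1, 1, 1, 0, 0, 0, -1, -1, -1]
x2 = [0, 0, 0, 1, 1, 1, -1, -1, -1]
y = [6, 3, 4, 8, 7, 10, 12, 12, 10]

def solve(a, b):
    """Solve the linear system a w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Design columns: intercept, covariate, two effect-coded vectors.
columns = [[1.0] * len(y), x, x1, x2]
xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]

a, b, b1, b2 = solve(xtx, xty)
b3 = -(b1 + b2)  # effect of the group coded -1 in both vectors

print(f"a = {a:.2f}, b = {b:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}, b3 = {b3:.2f}")
```

The effect for Method Three has no vector of its own; because the effects sum to zero, it is recovered as minus the sum of the other two weights.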
To test for significance in the analysis of covariance, one first has to adjust the achievement scores by using the aptitude scores. The proportion of variance in the dependent variable accounted for by aptitude scores and teaching methods together is .98576. The proportion of variance in the dependent variable accounted for by aptitude scores (the covariate) alone is .6766. The treatment effects after adjustment therefore equal .30916 (.98576 - .6766 = .30916), and the error term equals .01424 (1 - .98576 = .01424). The treatments are represented by two coding vectors, so the F ratio for the treatments is calculated as (.30916 / 2) / (.01424 / (9 - 3 - 1)), which is approximately 54.3 with 2 and 5 degrees of freedom. For the data under scrutiny we can observe that, after adjusting for differences in subjects’ aptitudes (the covariate), the differences between the employed methods of instruction are significant (p < .05).
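The proportions of variance and the F ratio can be recomputed from the raw scores. The sketch below rests on two assumptions: the numerator degrees of freedom equal the number of coding vectors added after the covariate (two here), and the full-model residual is the pooled within-group residual around the common slope. The helper name `sp` is illustrative.

```python
groups = [
    ([8, 4, 6], [6, 3, 4]),        # Method One: aptitude X, achievement Y
    ([7, 5, 8], [8, 7, 10]),       # Method Two
    ([9, 10, 7], [12, 12, 10]),    # Method Three
]

def sp(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

all_x = [a for xs, _ in groups for a in xs]
all_y = [b for _, ys in groups for b in ys]
ss_y = sp(all_y, all_y)

# Covariate alone: squared correlation between X and Y.
r2_cov = sp(all_x, all_y) ** 2 / (sp(all_x, all_x) * ss_y)

# Covariate plus treatments: residual around the common within-group slope.
w_xx = sum(sp(xs, xs) for xs, _ in groups)
w_yy = sum(sp(ys, ys) for _, ys in groups)
w_xy = sum(sp(xs, ys) for xs, ys in groups)
r2_full = 1 - (w_yy - w_xy ** 2 / w_xx) / ss_y

n, k_full, k_cov = len(all_y), 3, 1
df_treat = k_full - k_cov      # two coding vectors
df_error = n - k_full - 1      # 9 - 3 - 1 = 5
f_ratio = ((r2_full - r2_cov) / df_treat) / ((1 - r2_full) / df_error)

print(f"R2 full = {r2_full:.5f}, R2 covariate = {r2_cov:.4f}")
print(f"F({df_treat}, {df_error}) = {f_ratio:.1f}")
```

The ratio can then be compared with a tabled critical value of F for 2 and 5 degrees of freedom at the chosen significance level.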