Analysis of Covariance

Analysis of covariance, within Fisher's sum of squares conceptual model, is a complex method. In his Methods of Multivariate Analysis, Hope comments on the analysis of covariance as follows:

The analysis of covariance is the most complicated of the standard statistical methods. It is complicated because it involves simultaneous employment of the concepts of analysis of variance and regression analysis. Its conceptual complexity is compounded by the arithmetic jungle, which sprouts on the pages of textbooks ...

The analysis of covariance can be streamlined by avoiding Fisher's sums of squares model altogether and using coded regression analysis throughout.

Consider a typical research problem suitable for the analysis of covariance. A researcher is interested in the effectiveness of three methods of presenting material. Different subjects are used in each condition of the experiment. A pretest is given before the experiment, and the amount of retained material is measured afterward with a post-test. For a fictional example in which X is the pretest (the covariate) and Y is the post-test, the data matrix may look as shown in the following table.

 

        GROUP I            GROUP II           GROUP III
        X      Y           X      Y           X      Y
S1      1      1     S4    4      4     S7    7      7
S2      0      2     S5    3      5     S8    6      8
S3      2      3     S6    5      6     S9    8      9
M       1      2           4      5           7      8
        .67    .67         .67    .67         .67    .67

Inspecting the means, we notice that differences in the post-test scores (2, 5, and 8) may be attributable not only to differences among the methods but also to initial differences in the pretest scores (1, 4, and 7). To control for these pre-existing differences, the effect of the covariate (pretest) has to be removed from the post-test scores by using the regression method. The question to answer is whether there is a significant difference among the group means after covarying the pretest.
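The group summaries quoted above can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and variable names are illustrative, and reading the unlabeled .67 row of the data table as the population variance (2/3) of each column is an inference from the numbers, not something the text states.

```python
from statistics import mean, pvariance

# Pretest (X) and post-test (Y) scores for subjects S1..S9, by group.
groups = {
    "I":   {"X": [1, 0, 2], "Y": [1, 2, 3]},
    "II":  {"X": [4, 3, 5], "Y": [4, 5, 6]},
    "III": {"X": [7, 6, 8], "Y": [7, 8, 9]},
}

for name, scores in groups.items():
    print(
        f"Group {name}: "
        f"mean X = {mean(scores['X'])}, mean Y = {mean(scores['Y'])}, "
        # pvariance divides by n (assumed to be what the .67 row reports).
        f"var X = {pvariance(scores['X']):.2f}, "
        f"var Y = {pvariance(scores['Y']):.2f}"
    )
```

The pretest means (1, 4, 7) rise in step with the post-test means (2, 5, 8), which is why the pretest has to be taken into account before comparing methods.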

Code Experimental Conditions

The independent variable is methods of presentation with three levels: method 1, method 2, and method 3. Using Helmert's procedure, the experiment can be coded as

X     X1     X2     Y
1      1     -1     1
0      1     -1     2
2      1     -1     3
4     -1     -1     4
3     -1     -1     5
5     -1     -1     6
7      0      2     7
6      0      2     8
8      0      2     9
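As a rough illustration of how the two coding vectors can be generated from group membership, the sketch below maps each group to its pair of codes. The names group, codes, x1, and x2 are illustrative; the code values themselves are taken from the table.

```python
# Group membership for the nine subjects, in table order.
group = ["I"] * 3 + ["II"] * 3 + ["III"] * 3

# Codes from the table: X1 contrasts Group I with Group II (Group III gets 0);
# X2 contrasts Groups I and II together with Group III.
codes = {"I": (1, -1), "II": (-1, -1), "III": (0, 2)}

x1 = [codes[g][0] for g in group]
x2 = [codes[g][1] for g in group]

# With equal group sizes, each vector sums to zero and the two are orthogonal.
sums = (sum(x1), sum(x2))
cross_product = sum(a * b for a, b in zip(x1, x2))

print(x1)
print(x2)
print(sums, cross_product)
```

Two vectors are enough because three groups carry only two degrees of freedom.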

 

Placing the Covariate First  

1. Enter the covariate variable X into the regression equation first. Predict the post-test from the covariate (pretest).

X     Y
1     1
0     2
2     3
4     4
3     5
5     6
7     7
6     8
8     9

The resulting coefficient of determination is .9025 (r = .95).

Approximately 90% of the variance in the post-test scores can be explained by the pretest.
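The coefficient for this first step can be checked by hand or with a short script. The sketch below computes the correlation between pretest and post-test from sums of squares and cross-products; the variable names are illustrative.

```python
x = [1, 0, 2, 4, 3, 5, 7, 6, 8]  # pretest (covariate)
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # post-test

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

r = sxy / (sxx * syy) ** 0.5
r2_covariate = r ** 2

print(f"r = {r:.2f}, r squared = {r2_covariate:.4f}")
# prints: r = 0.95, r squared = 0.9025
```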

2. Next, enter the coding vectors, which represent methods of presentation (X1 and X2), into the regression equation. Predict the post-test scores from both pretest and methods of presentation.

X     X1     X2     Y
1      1     -1     1
0      1     -1     2
2      1     -1     3
4     -1     -1     4
3     -1     -1     5
5     -1     -1     6
7      0      2     7
6      0      2     8
8      0      2     9

Use a computer program to run a multiple regression analysis. The resulting coefficient of multiple determination is .925.

The covariate and the treatments together account for 92.5% of the variance in the post-test scores.
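Any regression program will do for this step. As one possible sketch, assuming NumPy is available, the full model can be fitted by ordinary least squares; the helper r_squared is a hypothetical name introduced here for illustration.

```python
import numpy as np

x = np.array([1, 0, 2, 4, 3, 5, 7, 6, 8], dtype=float)         # pretest
x1 = np.array([1, 1, 1, -1, -1, -1, 0, 0, 0], dtype=float)     # coding vector 1
x2 = np.array([-1, -1, -1, -1, -1, -1, 2, 2, 2], dtype=float)  # coding vector 2
y = np.arange(1, 10, dtype=float)                              # post-test


def r_squared(predictors, y):
    """Coefficient of (multiple) determination from an OLS fit with intercept."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)


r2_full = r_squared([x, x1, x2], y)
print(round(float(r2_full), 4))
```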

3. Increment

After subtracting the effect of the covariate (.925 - .9025 = .0225), only about 2% of the variance in the post-test scores can be explained by methods of presentation. The corresponding summary table is presented below.

 

Source of Variance    Degrees of Freedom    Standard Variance Components    F    Probability
Covariate             1                     .9025
Methods               2                     .0225                           ?    ?
Determination         3                     .925
Alienation            5                     .075
Total                 8                     1.00

 

 

 

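Compute the F ratio for the treatment effects by dividing each variance component by its degrees of freedom:

F = (.0225 / 2) / (.075 / 5) = .01125 / .015 = .75
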
The degrees of freedom associated with methods of presentation (the independent variable) are k - 1. The degrees of freedom associated with the error term are n - k - c. (n = total sample size, k = number of levels of the independent variable, and c = number of covariates.) 

Locate the position of the obtained F value in the F distribution with 2 df associated with the numerator (3-1=2) and 5 df associated with the denominator (9-3-1=5). We find that about 52 percent of the time you would get an F ratio of .75 or more by chance. 
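The F ratio and its probability can be sketched in plain Python as follows. The variable names are illustrative. The tail-probability formula used here is a shortcut that is valid only when the numerator has exactly 2 degrees of freedom, as in this example; for other cases a general routine such as scipy.stats.f.sf would be the usual choice.

```python
r2_covariate = 0.9025  # covariate alone
r2_full = 0.925        # covariate plus coding vectors
n, k, c = 9, 3, 1      # subjects, levels of the independent variable, covariates

df_num = k - 1         # 2
df_den = n - k - c     # 5

f_ratio = ((r2_full - r2_covariate) / df_num) / ((1 - r2_full) / df_den)

# Upper-tail probability of F(2, df_den). This closed form applies only
# when the numerator degrees of freedom equal 2.
if df_num != 2:
    raise ValueError("closed-form tail probability requires 2 numerator df")
p_value = (1 + df_num * f_ratio / df_den) ** (-df_den / 2)

print(f"F({df_num},{df_den}) = {f_ratio:.2f}, p = {p_value:.3f}")
# prints: F(2,5) = 0.75, p = 0.519
```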

Report the Results

A one-way analysis of covariance was conducted to evaluate the effectiveness of three methods of presentation of material. The independent variable, method of presentation of material, included three levels. The covariate was the pretest taken before the experiment began. The dependent variable was the post-test taken after the experiment was completed. The ANCOVA was not significant. There is no significant difference among the group means after covarying the pretest scores, F(2,5) = .75, p = .519. Only about 2% of the variance in the post-test scores was accounted for by the methods of presentation after controlling for the pretest scores.

Placing the Covariate Last

The order of inclusion of predictors into the regression equation has a strong influence on the amount of variance accounted for. What will happen if we include the covariate last? 

First, enter two coding vectors (X1 and X2) into the regression equation. Predict the post-test scores from methods of presentation.

 

X1     X2     Y
 1     -1     1
 1     -1     2
 1     -1     3
-1     -1     4
-1     -1     5
-1     -1     6
 0      2     7
 0      2     8
 0      2     9

 

Use a computer program to run a multiple regression analysis. The resulting coefficient of multiple determination is .90.

 

 About 90% of the variance in the post-test scores is explained by methods of presentation. 
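Because two coding vectors plus an intercept reproduce the three group means exactly, the coefficient from this regression should equal the between-groups share of the total sum of squares. The sketch below checks that equivalence without running a regression; the names are illustrative.

```python
from statistics import mean

y_by_group = {"I": [1, 2, 3], "II": [4, 5, 6], "III": [7, 8, 9]}

all_y = [v for scores in y_by_group.values() for v in scores]
grand_mean = mean(all_y)

# Total variation of the post-test scores about the grand mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_y)

# Variation of the group means about the grand mean, weighted by group size.
ss_between = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2
    for scores in y_by_group.values()
)

r2_methods = ss_between / ss_total
print(ss_between, ss_total, round(r2_methods, 2))
```

Here the between-groups sum of squares is 54 out of a total of 60, which gives the .90 reported above.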

Next, enter the covariate variable X. There will be three predictors (X, X1 and X2) in the regression model.

X1     X2     X     Y
 1     -1     1     1
 1     -1     0     2
 1     -1     2     3
-1     -1     4     4
-1     -1     3     5
-1     -1     5     6
 0      2     7     7
 0      2     6     8
 0      2     8     9

Use a computer program to run a multiple regression analysis. The resulting coefficient of multiple determination is .925.

About 92.5% of the variance in the post-test scores is accounted for by the covariate and the treatments. When the covariate is entered last, it accounts for only 2.5% (.925 - .90 = .025) of the variance in the criterion variable.

Hierarchical multiple regression analysis extracts the variance of the variable entered first and continues to build up the regression solution by adding the portions of variance of the other predictors that are uncorrelated with the predictors already included. Thus, the order in which variables enter the regression solution is of crucial importance. When predictors are correlated, as they are here, a variable entered earlier is typically credited with more variance than it would be if it were entered later.
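The effect of entry order can be seen by fitting the three models side by side. This is a minimal sketch assuming NumPy is available; r_squared is a hypothetical helper, not part of any library.

```python
import numpy as np

y = np.arange(1, 10, dtype=float)                              # post-test
x = np.array([1, 0, 2, 4, 3, 5, 7, 6, 8], dtype=float)         # pretest
x1 = np.array([1, 1, 1, -1, -1, -1, 0, 0, 0], dtype=float)     # coding vector 1
x2 = np.array([-1, -1, -1, -1, -1, -1, 2, 2, 2], dtype=float)  # coding vector 2


def r_squared(*predictors):
    """R squared of an OLS fit of y on the given predictors plus an intercept."""
    design = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


r2_full = r_squared(x, x1, x2)
r2_cov_first = r_squared(x)
r2_methods_first = r_squared(x1, x2)

# For each order: variance credited at the first step, then the increment.
increments = {
    "covariate first": (r2_cov_first, r2_full - r2_cov_first),
    "methods first": (r2_methods_first, r2_full - r2_methods_first),
}
for order, (first, added) in increments.items():
    print(f"{order}: first step {first:.4f}, second step adds {added:.4f}")
```

Both orders end at the same total of .925; only the split between the two sources changes.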