Single Classification Analysis of Variance
Partitioning of variance into its components is a central concept of statistical data analysis. The variance components can be computed for the obtained scores, the deviation scores, or the standard scores. The variance of data in obtained or deviation score form can take on any non-negative value, but the two are always equal, because subtracting the mean does not change the spread of the scores. The variance of the standard scores is one. Under certain conditions the variance components are additive, and if they are expressed as standard variance components, they can be interpreted directly as proportions of variance accounted for. Aside from the obtained score, deviation score, and standard score frameworks, there are two additional frameworks in which data can be partitioned: into extended ('sum of squares') and into expanded data components. No matter in which framework we describe the partitioning of variance into its additive (orthogonal) components, we can always standardize the obtained components by dividing them by the total variance component. In this chapter, we describe the partitioning of variance into the extended variance components within the framework of the single classification (one-way) analysis of variance.
A prototype of a scientific experiment involves two groups of subjects, divided randomly into a control and an experimental group. Subjects are assumed to have no relationship to each other and different subjects are used for the different conditions of the experiment. An idealized outline of the arrangement of subjects prior to the onset of an experiment is presented in the following table.
| | Y0 | Y1 | Y |
|---|---|---|---|
| Allen | 1 | | 1 |
| Becky | 2 | | 2 |
| Cathy | 3 | | 3 |
| Debra | | 1 | 1 |
| Edgar | | 2 | 2 |
| Francis | | 3 | 3 |
| M | 2 | 2 | 2 |
| σ² | .67 | .67 | .67 |
Initially, we have no reason to assume that the means and variances of both groups will differ. There is also no reason to assume that the means and variances of both groups combined will differ from those of the groups considered separately. The scores in the above table do not simulate the actual scores, but are, instead, hypothetical assumptions about the scores that could be expected in the absence of the experimental treatment. Changes in these idealized scores following the introduction of an experimental treatment are presented in the following table.
| | Y0 | Y1 | Y |
|---|---|---|---|
| Allen | 1 | | 1 |
| Becky | 2 | | 2 |
| Cathy | 3 | | 3 |
| Debra | | 1+2=3 | 3 |
| Edgar | | 2+0=2 | 2 |
| Francis | | 3+1=4 | 4 |
| M | 2 | 3 | 2.5 |
| σ² | .67 | .67 | .92 |
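The means and variances in the two tables can be checked with a few lines of code. The following is a minimal sketch in Python, assuming the population form of the variance (division by N), which is the form that reproduces the tabled values. The helper names `mean` and `variance` are illustrative and not part of the original text.

```python
def mean(scores):
    return sum(scores) / len(scores)

def variance(scores):
    """Population variance: mean squared deviation from the mean (divides by N)."""
    m = mean(scores)
    return sum((y - m) ** 2 for y in scores) / len(scores)

control = [1, 2, 3]              # Y0: Allen, Becky, Cathy
experimental_before = [1, 2, 3]  # Y1 before the treatment: Debra, Edgar, Francis
experimental_after = [3, 2, 4]   # Y1 after the treatment: 1+2, 2+0, 3+1

for label, y1 in (("before", experimental_before), ("after", experimental_after)):
    combined = control + y1      # the Y column of the table
    print(label,
          "M:", mean(control), mean(y1), mean(combined),
          "variance:", round(variance(control), 2),
          round(variance(y1), 2), round(variance(combined), 2))
```

Only the variance of the combined scores changes between the two runs; the variances of the separate groups stay the same.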
Since the variances of the control and experimental groups did not change because of the experiment, the increase of variance for the total group should be due to the variance between the changed means.
Let us illustrate this conjecture with the following table, which contains the means of the control and experimental groups (M), their deviations from the grand mean (m), and the squared deviations (m²), together with the mean and variance of the group means.
| | M | m | m² |
|---|---|---|---|
| Y0 | 2 | -.5 | .25 |
| Y1 | 3 | .5 | .25 |
| M | 2.5 | 0 | .25 |
| σ² | | | .25 |
The remaining variance should be the variance among subjects. Thus, we can summarize the results of this experiment as in the following table.

| Source of Variance | Variance Components | Standard Variance Components |
|---|---|---|
| Between Means | .25 | .27 |
| Within Subjects | .67 | .73 |
| Total | .92 | 1.00 |
The standard variance components were obtained by dividing all variance components by their total sum, for the example equal to .92.
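The partitioning summarized above can be reproduced numerically. The sketch below is one possible way to do so in Python, assuming equal group sizes and the population form of the variance (division by N); the variable names are illustrative only.

```python
def mean(scores):
    return sum(scores) / len(scores)

def variance(scores):
    """Population variance (divides by N)."""
    m = mean(scores)
    return sum((y - m) ** 2 for y in scores) / len(scores)

control = [1, 2, 3]        # Y0
experimental = [3, 2, 4]   # Y1 after the treatment
groups = [control, experimental]

# Variance of the group means around the grand mean (equal group sizes assumed).
between = variance([mean(g) for g in groups])
# Average of the within-group variances (equal group sizes assumed).
within = mean([variance(g) for g in groups])
# Variance of all scores taken together.
total = variance(control + experimental)

# Standard variance components: each component divided by the total.
standard_between = between / total
standard_within = within / total

print(round(between, 2), round(within, 2), round(total, 2))
print(round(standard_between, 2), round(standard_within, 2))
```

With equal group sizes the between and within components add up to the total variance, which is what allows them to be read as proportions once divided by the total.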
These variance components could also have been obtained by the regression analysis, for the example, summarized as
The experimental treatment thus accounted for 27 percent of the variance. The remaining 73 percent is due to variability among the subjects that stems from other factors. In formal notation, the partitioning of variance table can be written as
Since
and
the above table can be also written as
These equivalencies are important for understanding the principles of analysis of variance and the mutual relationships between the principal models used for this task.
The key to understanding the traditional approach to the analysis of variance, extracting the extended components of variance (also called the sums of squares), is to realize that this solution is based on the variance formula
algebraically manipulated as
and
The expression on the left of the above equation represents the 'sums of squares.' The sum of squares is simply the sum of squared deviation scores. The first term on the right side is the sum of squared obtained scores and the last term on the right hand side of the above equation is the correction term. Since we are using the obtained scores to compute the sum of squared deviation scores, we must subtract the correction term from our calculations.
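The relationship described above can be written out explicitly. The following is a sketch, assuming the population form of the variance (division by N) and writing M for the mean of the N obtained scores Y; the notation is an assumption chosen to match the tables in this chapter.

```latex
\sigma^2 = \frac{\sum (Y - M)^2}{N}
         = \frac{\sum Y^2}{N} - \left( \frac{\sum Y}{N} \right)^2

% Multiplying both sides by N gives the sum of squares:
\sum (Y - M)^2 = \sum Y^2 - \frac{\left( \sum Y \right)^2}{N}
```

For the scores 1, 2, 3, 3, 2, 4 used in this chapter, the sum of squared obtained scores is 43 and the correction term is 15² / 6 = 37.5, so the sum of squares is 43 - 37.5 = 5.5.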
The basic problem of analysis of variance is to compute variance in a form that is comparable. Variances may not be comparable when the coefficients of variance are based on different numbers of data elements; the 'sum of squares' method circumvents this difficulty.
The above example is a prototype of a scientific experiment, involving two groups of subjects, divided randomly into a control and an experimental group. Different subjects are used for the different conditions of the experiment.
Suppose our example concerns the effect of alcohol consumption on reaction time. The control group is given a placebo, while the experimental group is given an alcoholic beverage. Reaction time measurements, taken one hour after the placebo and the alcohol were consumed, are presented in the following table.
To compute the analysis of variance by using a spreadsheet, enter data into the data area and compute its column sums and the grand sum, as
| | Data | Sums | Squares | Corrections |
|---|---|---|---|---|
| Data | 1 3 2 2 3 4 | | | |
| Sums | 6 9 | 15 | | |
| Squares | | | | |
| Corrections | | | | |
Next, square the sums, as shown in the following table.
| | Data | Sums | Squares | Corrections |
|---|---|---|---|---|
| Data | 1 3 2 2 3 4 | | | |
| Sums | 6 9 | 15 | | |
| Squares | 36 81 | | 225 | |
| Corrections | | | | |
Then divide the squared sums by their corresponding ns. For the example, (36/3), (81/3), and (225/6), as

| | Data | Sums | Squares | Corrections |
|---|---|---|---|---|
| Data | 1 3 2 2 3 4 | | | |
| Sums | 6 9 | 15 | | |
| Squares | 36 81 | | 225 | |
| Corrections | 12 27 | | | 37.5 |
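The intermediate spreadsheet values can also be computed directly. The following Python sketch mirrors the steps so far (sums, squared sums, division by n); the variable names are illustrative and the two data columns are assumed to be the control scores 1, 2, 3 and the experimental scores 3, 2, 4.

```python
control = [1, 2, 3]        # first data column (Y0)
experimental = [3, 2, 4]   # second data column (Y1)
columns = [control, experimental]

# Step 1: column sums and the grand sum.
column_sums = [sum(col) for col in columns]            # 6 and 9
grand_sum = sum(column_sums)                           # 15

# Step 2: square the sums.
squared_column_sums = [s ** 2 for s in column_sums]    # 36 and 81
squared_grand_sum = grand_sum ** 2                     # 15 squared = 225

# Step 3: divide each squared sum by the number of scores it is based on.
column_terms = [sq / len(col)
                for sq, col in zip(squared_column_sums, columns)]  # 12 and 27
n_total = sum(len(col) for col in columns)             # 6
correction_term = squared_grand_sum / n_total          # 225 / 6 = 37.5

print(column_sums, grand_sum)
print(squared_column_sums, squared_grand_sum)
print(column_terms, correction_term)
```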
The value in the lower right corner of the spreadsheet is called the correction term. To obtain the extended variance components, this correction term must be subtracted from the obtained intermediate values, computed as follows.
For the column variance component, add the corrected values for the data columns, for the example (12 + 27)
| | Data | Sums | Squares | Corrections |
|---|---|---|---|---|
| Data | 1 3 2 2 3 4 | | | |
| Sums | 6 9 | 15 | | |
| Squares | 36 81 | | 225 | |
| Corrections | 12 27 | 39 | | 37.5 |
Then subtract the correction term from this sum: 39 - 37.5 equals 1.5. Enter this value into the above table between the values 39 and 37.5, and into the variance table, as

| Source of Variance | Variance Components | Standard Variance Components | Extended Variance Components |
|---|---|---|---|
| Columns | .25 | .27 | 1.5 |
| Residual | .67 | .73 | |
| Total | .92 | 1.00 | |
To obtain the total variance component, square and sum all elements of the data (1² + 2² + 3² + 3² + 2² + 4²) and enter this sum (43) into the spreadsheet, as

| | Data | Sums | Squares | Corrections |
|---|---|---|---|---|
| Data | 1 3 2 2 3 4 | | | |
| Sums | 6 9 | 15 | 43 | |
| Squares | 36 81 | | 225 | |
| Corrections | 12 27 | 39 | 1.5 | 37.5 |
Subtract the correction term from this value (43 - 37.5 = 5.5) and enter the result into the appropriate cell of the spreadsheet,

| | Data | Sums | Squares | Corrections |
|---|---|---|---|---|
| Data | 1 3 2 2 3 4 | | | |
| Sums | 6 9 | 15 | 43 | |
| Squares | 36 81 | 5.5 | 225 | |
| Corrections | 12 27 | 39 | 1.5 | 37.5 |
as well as to the appropriate cell of the summary variance table
| Source of Variance | Variance Components | Standard Variance Components | Extended Variance Components |
|---|---|---|---|
| Columns | .25 | .27 | 1.5 |
| Residual | .67 | .73 | |
| Total | .92 | 1.00 | 5.5 |
The residual term for the extended variance components can be obtained by subtracting the column component from the total variance component (5.5 - 1.5) and entered into the variance table, as

| Source of Variance | Variance Components | Standard Variance Components | Extended Variance Components |
|---|---|---|---|
| Columns | .25 | .27 | 1.5 |
| Residual | .67 | .73 | 4.0 |
| Total | .92 | 1.00 | 5.5 |
The extended variance components can be standardized by dividing them by the extended total variance component. For the example, (1.5 / 5.5), (4.0 / 5.5), and (5.5 / 5.5) equal .27, .73, and 1.0. The experimental treatment thus accounted for 27 percent of the total component of variance.
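The whole computation of the extended variance components can be condensed into a short script. This is a minimal Python sketch of the 'sum of squares' procedure walked through above, not a general-purpose routine; the names `ss_columns`, `ss_residual`, and `ss_total` are illustrative.

```python
columns = [[1, 2, 3], [3, 2, 4]]   # control (Y0) and experimental (Y1) scores
scores = [y for col in columns for y in col]

# Correction term: squared grand sum divided by the total number of scores.
correction_term = sum(scores) ** 2 / len(scores)          # 225 / 6 = 37.5

# Column component: sum of (squared column sum / n) minus the correction term.
ss_columns = sum(sum(col) ** 2 / len(col) for col in columns) - correction_term

# Total component: sum of squared scores minus the correction term.
ss_total = sum(y ** 2 for y in scores) - correction_term

# Residual component: what is left after removing the column component.
ss_residual = ss_total - ss_columns

# Standardize by dividing each component by the total component.
standardized = {
    "columns": ss_columns / ss_total,
    "residual": ss_residual / ss_total,
    "total": ss_total / ss_total,
}

print(ss_columns, ss_residual, ss_total)
print({name: round(value, 2) for name, value in standardized.items()})
```

The standardized values agree with the standard variance components obtained earlier from the variances themselves, which illustrates that the choice of framework does not affect the proportions.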
Results of the analyses of variance are traditionally reported in a tabular form, such as outlined in the table below.
| Source of Variance | Degrees of Freedom | Sums of Squares | Mean Square | F | Probability |
|---|---|---|---|---|---|
| Columns | k-1 | SS ? | SS / df | MS / MSRES | p ? |
| Residual | k(n-1) | SS ? | SS / df | | |
| Total | nk-1 | SS ? | | | |
The degrees of freedom for the column source of variance are the number of columns of the data matrix minus one, and for the total source of variance the total number of elements in the data matrix minus one. The degrees of freedom for the residual term can be obtained by subtracting the degrees of freedom for columns from the total degrees of freedom. For the example, the traditional summary table of the analysis of variance is
| Source of Variance | Degrees of Freedom | Sums of Squares | Mean Square | F | Probability |
|---|---|---|---|---|---|
| Columns | 1 | 1.5 | 1.5 | 1.5 | .1440 |
| Residual | 4 | 4.0 | 1.0 | | |
| Total | 5 | 5.5 | | | |
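The degrees of freedom, mean squares, and F ratio of the summary table follow mechanically from the sums of squares. The function below is a hedged Python sketch of that bookkeeping for a single classification design; `one_way_anova` is a hypothetical helper name, and the probability column is deliberately left out because it requires the F distribution.

```python
def one_way_anova(columns):
    """Return df, SS, MS and F for a single classification (one-way) layout."""
    scores = [y for col in columns for y in col]
    n_total = len(scores)

    correction = sum(scores) ** 2 / n_total
    ss_columns = sum(sum(col) ** 2 / len(col) for col in columns) - correction
    ss_total = sum(y ** 2 for y in scores) - correction
    ss_residual = ss_total - ss_columns

    df_columns = len(columns) - 1          # k - 1
    df_total = n_total - 1                 # nk - 1
    df_residual = df_total - df_columns    # k(n - 1) for equal group sizes

    ms_columns = ss_columns / df_columns
    ms_residual = ss_residual / df_residual

    return {
        "columns": {"df": df_columns, "ss": ss_columns, "ms": ms_columns,
                    "f": ms_columns / ms_residual},
        "residual": {"df": df_residual, "ss": ss_residual, "ms": ms_residual},
        "total": {"df": df_total, "ss": ss_total},
    }

table = one_way_anova([[1, 2, 3], [3, 2, 4]])
for source, row in table.items():
    print(source, row)
```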
Using the standard components of variance, the above table can be recast into a form that is more informative and easier to interpret:

| Source of Variance | df | Standard Variance Components | Unbiased Variance Components | F | Probability |
|---|---|---|---|---|---|
| Columns | 1 | .27 | .27 | 1.5 | .1440 |
| Residual | 4 | .73 | .18 | | |
| Total | 5 | 1.00 | | | |
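That the F ratio is unchanged by standardization can be verified in a few lines. The Python sketch below starts from the extended variance components of the example and assumes, as the table suggests, that the unbiased components are the standard components divided by their degrees of freedom; the dictionary names are illustrative.

```python
ss = {"columns": 1.5, "residual": 4.0}   # extended variance components
df = {"columns": 1, "residual": 4}       # degrees of freedom
ss_total = sum(ss.values())              # 5.5

# Standard variance components: proportions of the total.
standard = {source: ss[source] / ss_total for source in ss}

# Unbiased variance components: standard components divided by their df.
unbiased = {source: standard[source] / df[source] for source in ss}

# F is the ratio of the unbiased column and residual components.
f_ratio = unbiased["columns"] / unbiased["residual"]

print({s: round(v, 2) for s, v in standard.items()})
print({s: round(v, 2) for s, v in unbiased.items()})
print(round(f_ratio, 2))
```

Because both components are divided by the same total, the ratio is the same as the ratio of the mean squares in the traditional table.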
Let us reconsider the current example of the effect of the consumption of alcohol on reaction time within the framework of the independent measures t-test. Reaction times of the group of subjects that received the placebo (X0) and the group that received alcohol (X1) are recorded as the dependent variable Y in the following table.
The coefficient of determination for variables X and Y is .27 and the coefficient of alienation is .73. The t-square

$$t^2 = \frac{r^2}{1 - r^2}\, df$$

for 4 degrees of freedom is computed as (.27 / .73) × 4, which equals 1.50. Note that for the special case of two groups

$$F = t^2$$

The t equals 1.22 and the probability associated with the t-ratio is .2880.
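The link between the coefficient of determination and the t-ratio can be checked numerically. The following Python sketch assumes the group membership is coded 0 for placebo and 1 for alcohol, uses N - 2 = 4 degrees of freedom, and approximates the two-tailed probability by integrating the t density with Simpson's rule; the coding and the helper names `r_squared`, `t_density`, and `two_tailed_p` are assumptions made for illustration.

```python
import math

x = [0, 0, 0, 1, 1, 1]   # assumed group code: 0 = placebo (X0), 1 = alcohol (X1)
y = [1, 2, 3, 3, 2, 4]   # reaction times (Y)

def mean(values):
    return sum(values) / len(values)

def r_squared(x, y):
    """Squared Pearson correlation (coefficient of determination)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed probability via Simpson's rule on the density over [0, |t|]."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * (total * h / 3)

df = len(y) - 2                            # N - 2 = 4
r2 = r_squared(x, y)                       # coefficient of determination
t_squared = r2 / (1 - r2) * df             # equals F for two groups
t = math.sqrt(t_squared)
p = two_tailed_p(t, df)

print(round(r2, 2), round(1 - r2, 2))
print(round(t_squared, 2), round(t, 2), round(p, 3))
```

The t-square obtained this way matches the F ratio of the analysis of variance for the same data, which is the two-group special case noted above.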