Single Classification Analysis of Variance

 

Partitioning of variance into its components is a central concept of statistical data analysis. The predicted and error variance components are not correlated and they are additive.

Variance Components in Different Forms

True Variance Components

In Obtained Scores Form

In Deviation Scores Form

Standard Variance Components

To compute the standard variance components, simply divide the above true variance components by the total variance component.

                      Total     Predicted     Error
True Variance
Standard Variance

If expressed as standard variance components, they can be directly interpreted as the proportions of variance accounted for and not accounted for. Notice that eta squared ($\eta^2$) provides a measure of the strength of the relationship irrespective of whether it is linear or curvilinear.

Extended and Expanded Variance Components

Aside from the obtained scores, deviation scores, and standard scores frameworks, there are two additional frameworks in which variance can be partitioned: extended ('sums of squares') and expanded variance components. Variance expressed as extended variance components (sums of squares) facilitates the use of a spreadsheet for the computation of analysis of variance components. Variance in its expanded form facilitates computation of variance within the context of matrix algebra.

Extended Variance Components

To compute extended variance components directly, simply multiply true variance components by n.   



Expanded Variance Components

To compute expanded variance components directly, simply multiply the true variance components by n squared.

Note that n is the total number of scores.
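The two multiplications above can be sketched in a few lines of Python. This is only an illustration: it borrows the six reaction time scores from the experiment described later in this chapter, and the variable names are chosen here for readability rather than taken from the text.

```python
from statistics import pvariance

# Six reaction time scores from the worked example later in the chapter
scores = [1, 2, 3, 3, 2, 4]
n = len(scores)                      # total number of scores

true_variance = pvariance(scores)    # mean of the squared deviation scores
extended = n * true_variance         # extended component (sum of squares)
expanded = n ** 2 * true_variance    # expanded component

print(round(true_variance, 4), round(extended, 4), round(expanded, 4))
```

The extended value is the same total sum of squares that the spreadsheet method arrives at later in the chapter.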

Suggestions

No matter in which framework, we can always standardize the obtained components by dividing them by the total variance component. The resulting standard variance can be shown below.

The standardizing variable is the criterion variable (or the dependent variable) Y.

 

 

Thus,

 

 

For linear relationship

 

 

For linear and nonlinear relationship

 

 

In this chapter, we will describe partitioning of variance into the extended variance components within the framework of the single classification analysis of variance.

 

Idealized Experimental Design

A prototype of a scientific experiment involves two groups of subjects, divided randomly into a control and an experimental group. Subjects are assumed to have no relationship to each other and different subjects are used for the different conditions of the experiment. An idealized outline of the arrangement of subjects prior to the onset of an experiment is presented in the following table.   

 

            Y0      Y1      Y
Allen       1               1
Becky       2               2
Cathy       3               3
Debra               1       1
Edgar               2       2
Francis             3       3
M           2       2       2
s2          .67     .67     .67

 

Before the Experiment

Group Means and Variances

Initially, we have no reason to assume that the means and variances of both groups will differ. There is also no reason to assume that the means and variances of both groups combined will differ from those of the groups considered separately. 

 

 

The scores in the above table do not simulate the actual scores, but are, instead, hypothetical assumptions about the scores that could be expected in the absence of the experimental treatment.

Reaction Time Experiment

Using the same brand of wine, a control group coded as 0 will be given a glass of non-alcoholic wine while an experimental group coded as 1 will be given a glass of wine containing alcohol. Reaction time measurements in seconds are taken one hour after the wine was consumed.

Independent Measures Design

An experiment using different subjects for all conditions of the experiment is called an independent measures design. There are 3 subjects in each condition. The total number of subjects is 6.

0: Non-Alcoholic     1: Alcohol
Allen                Debra
Becky                Edgar
Cathy                Francis



Independent Variable

There is one independent variable which is manipulated by the researcher. The independent variable is consumption of alcohol with two levels: placebo (non-alcoholic wine) and wine containing alcohol.

Placebo Group

  0: Non-Alcoholic

Allen

Becky

Cathy

 

Experimental Group

  1:  Alcohol

Debra

Edgar

Francis


Dependent Variable

The dependent variable measured by the researcher is reaction time in seconds. Reaction time is measured at each level of the independent variable for every subject.

Y0                      Y1
Allen    ? seconds      Debra    ? seconds
Becky    ? seconds      Edgar    ? seconds
Cathy    ? seconds      Francis  ? seconds

Does consumption of alcohol have an effect on reaction time? 

After the Experiment

Changes in these idealized scores following the introduction of an experimental treatment are presented in the following table.   

 

            Y0      Y1          Y
Allen       1                   1
Becky       2                   2
Cathy       3                   3
Debra               1+2=3       3
Edgar               2+0=2       2
Francis             3+1=4       4
M           2       3           2.5
s2          .67     .67         .92

 

Total Variance

Notice that the variances of the control and experimental groups are the same (.67). However, the variance of the total group is increased to .92.

 

 

Different Group Means

Since the variances of the control and experimental groups did not change because of the experiment, the increase in the variance of the total group must be due to the variance between the changed group means.

Variance Due to Changed Group Means (Column Means)

Let us illustrate this conjecture with the following table, which contains the means of the control and experimental groups together with their overall mean and variance.

Since the two groups have an equal number of cases, we may use the two group means to compute the overall mean and the variance between means directly.

 

        M
Y0      2
Y1      3
M       2.5
s2      .25

 

The overall mean is 2.5 and the variance due to difference between group means is .25. The remaining variance due to unknown factors can be computed as .92 - .25 = .67. The results of this experiment are summarized in the following table.      

Source of Variance     Variance Components
Between Means          .25
Residual               .67
Total                  .92

Examine the Variance Components column. The variance due to group means (labeled as Between Means) is .25. The variance due to unknown factors (e.g., random errors) is .67. The total variance in the dependent variable Y is .92. 

Next, standardize the above variance components. The standard variance components are obtained by dividing all variance components by their total sum, for the example equal to .92. Thus, the between groups standard variance is .25/.92=.27. The residual standard variance is .67/.92=.73 and the total standard variance is .92/.92=1.  

 

Source of Variance     Variance Components     Standard Variance Components
Between Means          .25                     .25 / .92 = .27
Residual               .67                     .67 / .92 = .73
Total                  .92                     .92 / .92 = 1.00

 

Using the standard components of variance, the summary table can be conceptualized into a form that is more informative and easier to interpret. Examine the Standard Variance Components column. Approximately 27% of the total variance in the dependent variable was explained by the experimental treatments. However, 73% of the variance was not explained.
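The partitioning described above can be reproduced with a short Python sketch. It assumes equal group sizes, as in the example, so that the variance of the two group means can stand in for the between means component; the variable names are illustrative.

```python
from statistics import mean, pvariance

control = [1, 2, 3]        # Y0: placebo group
experimental = [3, 2, 4]   # Y1: alcohol group
total = control + experimental

total_var = pvariance(total)                                   # about .92
between_var = pvariance([mean(control), mean(experimental)])   # .25
residual_var = total_var - between_var                         # about .67

# Print each component next to its standardized value
for label, value in [("Between Means", between_var),
                     ("Residual", residual_var),
                     ("Total", total_var)]:
    print(f"{label:<14}{value:6.2f}{value / total_var:6.2f}")
```

Dividing each component by the total variance gives the standard variance components in the right-hand column of the summary table.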


Regression on Categories

These variance components could also have been obtained by the regression on categories analysis, for the example, summarized as   

 

The lowercase sigma has two forms, σ and ς. Notice that  (also written as ) signifies standard variance. Examine the standard variances of Y, Y', and Y^. The experimental treatments explained 27 percent of the total variance of the criterion variable Y. The remaining 73 percent of the variance was unexplained.

 

Source of Variance     Standard Variance Components
Between Means          .25 / .92 = .27
Residual               .67 / .92 = .73
Total                  .92 / .92 = 1.00

 

In formal notation, the partitioning of variance table can be written as

 

Source of Variance     Variance Components     Standard Variance Components
Information
Residual
Total

 

Since

 

and

 

the above table can be also written as

 

Source of Variance     Standard Variance Components     Variance Components
Information
Residual
Total

 

These equivalencies are important for understanding the principles of analysis of variance and the mutual relationships between the principal models used for this task.

Single Classification Analysis of Variance

Single classification analysis of variance allows us to compare two or more independent groups. Are the population means different? To answer this question, we have to introduce the concept of the F ratio.

The F Ratio

We may express the F ratio in terms of the proportions of variance accounted for ($\eta^2$; in the case of two groups, $r^2$) and not accounted for ($1 - \eta^2$). The F ratio is computed as

$$F = \frac{\eta^2 / (k - 1)}{(1 - \eta^2) / (n - k)}$$

where "k" is the number of groups and "n" is the total sample size.
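As a minimal sketch, the F ratio can be computed from the proportion of variance accounted for. The sketch assumes the usual form, in which that proportion is divided by k - 1 and the unexplained proportion by n - k. The function name `f_ratio` is introduced here for illustration only.

```python
def f_ratio(eta_squared: float, k: int, n: int) -> float:
    """F ratio from the proportion of variance accounted for.

    k is the number of groups and n is the total sample size.
    """
    explained = eta_squared / (k - 1)            # corrected by numerator df
    unexplained = (1.0 - eta_squared) / (n - k)  # corrected by denominator df
    return explained / unexplained

# Reaction time example: 1.5 / 5.5 of the variance explained,
# two groups, six subjects in total
print(round(f_ratio(1.5 / 5.5, k=2, n=6), 2))
```

For the reaction time data this reproduces the F ratio of 1.5 reported in the ANOVA tables of this chapter.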

Special Case of Two Groups

Examine the above formula. A simple analysis of variance for two independent groups (k = 2) will yield an F ratio that is the same as the square of the t ratio for the same data.

$$F = t^2 = \frac{r^2}{1 - r^2}\,(n - 2)$$

The guaranteed fulfillment of the linearity assumption for the case of two groups, based on Euclid's postulate that a line is defined by two points, allows the use of the coefficient of determination and eta square interchangeably. 

The F Distribution

The F distribution is the theoretical sampling distribution used to evaluate the obtained F value.

The Normal Distribution, the t Distribution, and the F Distribution

As the normal distribution is a special (limiting) case of the t-distribution, the t-distribution is, further, a special case of the F distribution. Like the t-distribution, the F distribution is defined in terms of the gamma function. With 1 degree of freedom in the numerator and $\nu$ degrees of freedom in the denominator, the F distribution is related to the t distribution as

$$F_{1,\nu} = t_{\nu}^2$$

The t-distribution is related to the standard normal distribution as

$$t_{\infty} = z$$

A Ratio of Two Independent Variances

The original name of the F distribution was the inverted beta distribution. The inverted beta distribution was renamed by Snedecor to stress Fisher's contributions to the development of the computational techniques for the analysis of variance.

The original use of the F ratio was to test the assumption of homoscedasticity, i.e., the approximate equality of two independent variances, each with its own degrees of freedom. The typical use of the F is within the context of the analysis of variance. Within traditional analysis of variance, the sums of squares are divided by their respective degrees of freedom and called mean squares. The degrees of freedom associated with the information term are k - 1. The degrees of freedom associated with the error term are n - k. The F ratio is then computed as a ratio of the mean squares corresponding to the information and the error terms.

Degrees of Freedom

Like the distribution of t, the distribution of F is a family of distributions that vary with degrees of freedom. But with the F-ratio we must take into account both the degrees of freedom associated with the numerator and the degrees of freedom associated with the denominator. 

The Overall F Test 

The F test is often called the overall or omnibus F test. A significant F value tells you only that the population means are not all equal. To determine which means are significantly different from each other, you may perform post hoc comparisons. 

An ANOVA Table in Standard Variance Form 

Results of the reaction time experiment can be reported in the following table

Source of Variance     df        Standard Variance Components
Information            2-1=1     .27
Residual               6-2=4     .73
Total                  5         1.00


1. Standard Variance Components

About 27% of the total variance in the dependent variable was explained by the experimental effect. About 73% of variance in the dependent variable was not explained.

2. Estimated Standard Variance Components

To make inferences based on the sample data, both information and residual standard variance components need to be corrected by their associated degrees of freedom. The estimated column standard variance is computed as .27 / 1 = .27. The estimated residual standard variance is computed as .73 / 4 = .18. 

 

Source of Variance     df        Standard Variance Components     Estimated Standard Variance Components (corrected by df)
Information            2-1=1     .27                              .27/1=.27
Residual               6-2=4     .73                              .73/4=.18
Total                  5         1.00

 

 

3. The F Ratio

The F ratio is computed as an information-error ratio corrected by the associated degrees of freedom. Thus, the F ratio equals .27/.18=1.5.

Source of Variance     df     Standard Variance Components     Estimated Standard Variance Components     F
Information            1      .27                              .27                                        .27/.18=1.5
Residual               4      .73                              .18
Total                  5      1.00

 

 



4. Probability

Locate the position of the calculated F value in the F distribution with 1 degree of freedom associated with the numerator and 4 degrees of freedom associated with the denominator.

 

 

Approximately 29 percent of the time you would get an F ratio of 1.50 or more by chance. The finding is not statistically significant. 
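The probability can be approximated without statistical tables. The sketch below covers only the special case of one numerator degree of freedom, where F equals t squared, and integrates the t density numerically with Simpson's rule. The function names are hypothetical and the number of steps is an arbitrary choice.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def p_value_f_1_df(f_value: float, df_error: int, steps: int = 2000) -> float:
    """P(F >= f_value) for F(1, df_error), using F = t**2 and Simpson's rule."""
    upper = math.sqrt(f_value)
    h = upper / steps                       # steps must be even for Simpson's rule
    area = t_pdf(0.0, df_error) + t_pdf(upper, df_error)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    area *= h / 3                           # P(0 <= t <= sqrt(F))
    return 1.0 - 2.0 * area                 # probability in both tails

print(round(p_value_f_1_df(1.5, 4), 3))
```

The printed value should be close to the .288 reported in the summary table. If SciPy is available, `scipy.stats.f.sf(1.5, 1, 4)` offers an independent check for any pair of degrees of freedom.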

Source of Variance     df     Standard Variance Components     Unbiased Standard Variance Components     F       p
Information            1      .27                              .27                                       1.5     .288
Residual               4      .73                              .18
Total                  5      1.00

 

 

 

 

5. Report the Results

A one-way analysis of variance was conducted to evaluate the effect of consumption of alcohol on reaction time. The independent variable included two levels: placebo and wine containing alcohol. The dependent variable was reaction time.
The ANOVA was not significant, F(1,4) = 1.5, p > .05. Consumption of alcohol did not have a significant effect on reaction time. About 27% of variance in reaction time was accounted for by the experimental treatment. 

Algebraic Substratum of the Traditional Analysis of Variance

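The key to understanding the traditional approach to analysis of variance, using the concept of the sums of squares (called here also the extended variance components), is to realize that it is based on the variance formula in obtained scores form

$$s^2 = \frac{\sum Y^2}{n} - \left(\frac{\sum Y}{n}\right)^2$$

algebraically manipulated as

$$n s^2 = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}$$

and

$$\sum (Y - M)^2 = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}$$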

Sums of Squares

The expression on the left of the equation, $\sum (Y - M)^2$, represents the 'sum of squares.'

The sum of squares is simply the sum of squared deviation scores. The basic problem of analysis of variance is to compute variance in a form that is comparable. Variances may not be comparable when they are based on different numbers of data elements. To avoid division by nonidentical Ns, Fisher suggested that variance does not need to be computed; only the corresponding sums of squares need to be identified.


 

Correction Term

The first term on the right side, $\sum Y^2$, is the sum of squared obtained scores, and the last term on the right side of the above equation, $\left(\sum Y\right)^2 / n$, is the correction term.

Since we are using the obtained scores to compute the sum of squared deviation scores, we must subtract the correction term from our calculations.
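A quick numerical check of this identity, using the reaction time scores, is sketched below. It assumes the correction term is the squared grand sum divided by the number of scores, which agrees with the value 37.5 used in the spreadsheet section that follows.

```python
scores = [1, 2, 3, 3, 2, 4]
n = len(scores)
mean = sum(scores) / n

sum_of_squared_scores = sum(y * y for y in scores)           # 43
correction_term = sum(scores) ** 2 / n                       # 225 / 6 = 37.5

# Sum of squares from obtained scores, then from deviation scores
ss_from_obtained = sum_of_squared_scores - correction_term
ss_from_deviations = sum((y - mean) ** 2 for y in scores)

print(ss_from_obtained, ss_from_deviations)
```

Both routes give the same sum of squares, which is why the obtained scores can be used directly once the correction term is subtracted.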

 

Single Classification Analysis of Variance within the Microsoft Excel Framework

Using the same data from the reaction time experiment, we now compute the analysis of variance by using a spreadsheet.

First, enter data into the following data area and compute its column sums and the grand sum, as   

 

               Data       Sums     Squares     Corrections
Data           1   3
               2   2
               3   4
Sums           6   9      15
Squares
Corrections

   

Next, square the sums, as shown in the following table.   

 

               Data       Sums     Squares     Corrections
Data           1   3
               2   2
               3   4
Sums           6   9      15
Squares        36  81              225
Corrections

   

And divide the squared sums by their corresponding ns. For the example (36/3), (81/3) and (225/6), as  

 

               Data       Sums     Squares     Corrections
Data           1   3
               2   2
               3   4
Sums           6   9      15
Squares        36  81              225
Corrections    12  27                          37.5

 

The value in the lower right corner of the spreadsheet is called the correction term. To obtain the extended variance components, this correction term must be subtracted from the obtained intermediate values, computed as follows.

 

The Column Component

For the column variance component, add the corrected values for the data columns, for the example (12 + 27)

 

               Data       Sums     Squares     Corrections
Data           1   3
               2   2
               3   4
Sums           6   9      15
Squares        36  81              225
Corrections    12  27     39       1.5         37.5

 

And subtract the correction term from this sum, as 39 - 37.5, which equals 1.5. Enter this value to the above table between the values 39 and 37.5 and into the variance table below, as

Source of Variance     Extended Variance Components
Columns                1.5
Residual
Total

 

The Total Component

To obtain the total variance component, square and sum all elements of the data ($1^2 + 2^2 + 3^2 + 3^2 + 2^2 + 4^2$) and enter this sum (43) into the spreadsheet as

 

               Data       Sums     Squares     Corrections
Data           1   3
               2   2
               3   4
Sums           6   9      15       43
Squares        36  81              225
Corrections    12  27     39       1.5         37.5

 

Subtract the correction term from this value (43 - 37.5) and enter the result to the appropriate cell within the spreadsheet,   

 

               Data       Sums     Squares     Corrections
Data           1   3
               2   2
               3   4
Sums           6   9      15       43
Squares        36  81     5.5      225
Corrections    12  27     39       1.5         37.5

 

as well as to the appropriate cell of the summary variance table   

Source of Variance     Extended Variance Components
Columns                1.5
Residual
Total                  5.5

 

The Residual Component

The residual term for the extended variance components can be obtained by subtracting the column from the total variance component (5.5 - 1.5) and entered to the variance table, as 

Source of Variance     Extended Variance Components (Sums of Squares)
Columns                1.5
Residual               4.0
Total                  5.5

 

The extended variance components can be standardized by dividing them by the extended total variance component. For the example, (1.5 / 5.5), (4.0 / 5.5), and (5.5 / 5.5) equal .27, .73, and 1.0. The experimental treatment thus accounted for 27 percent of the total variance.
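The worksheet steps above translate almost line for line into Python. The following is a sketch of the same arithmetic rather than a general-purpose routine; the dictionary keys and variable names are made up for this illustration.

```python
columns = {"placebo": [1, 2, 3], "alcohol": [3, 2, 4]}

column_sums = {name: sum(col) for name, col in columns.items()}    # 6 and 9
grand_sum = sum(column_sums.values())                               # 15
n_total = sum(len(col) for col in columns.values())                 # 6

# Squared sums divided by their corresponding ns
column_terms = {name: column_sums[name] ** 2 / len(col)
                for name, col in columns.items()}                   # 12 and 27
correction_term = grand_sum ** 2 / n_total                          # 37.5

ss_columns = sum(column_terms.values()) - correction_term           # 39 - 37.5
sum_of_squared_scores = sum(y * y for col in columns.values() for y in col)  # 43
ss_total = sum_of_squared_scores - correction_term                  # 43 - 37.5
ss_residual = ss_total - ss_columns

print(ss_columns, ss_residual, ss_total)
print(round(ss_columns / ss_total, 2), round(ss_residual / ss_total, 2))
```

Because each squared column sum is divided by its own n, the same steps would also apply to columns of unequal length.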

 

Summary Table for the Single Classification Analysis of Variance

Within the Microsoft Excel Framework (the worksheet method), results of the analysis of variance are traditionally reported in a tabular form, such as outlined in the table below.   

Source of Variance     Degrees of Freedom     Sums of Squares     Mean Square     F            p
Column                 k-1                    SS ?                SS/df           MS/MSRES     p ?
Residual               k(n-1)                 SS ?                SS/df
Total                  nk-1                   SS ?

 

where 'k' is the number of columns and 'n' is the number of rows.

Mean Square

To compute the true variance, we divide the sum of squared deviation scores (sum of squares) by the number of scores. (That is, the variance is computed as the mean of the squared deviation scores.) Since our purpose here is to make inferences based on the sample data, we need to compute the unbiased variance instead. To compute the unbiased variance, we divide the sum of squares by the degrees of freedom. In the context of the ANOVA table, we call the unbiased (estimated) variance the “mean square”. The mean square is used to estimate the population variance.

The degrees of freedom are computed for the column source of variance as the number of columns of the data matrix minus one (k - 1) and for the total source of variance the total number of elements in the data matrix minus one (nk - 1). The number of degrees of freedom for the residual term can be obtained by subtracting degrees of freedom for columns from the total degrees of freedom.

The column mean square is 1.5 / 1 = 1.5 and the residual mean square is 4 / 4 = 1. The F ratio can be computed as (column mean square) / (residual mean square) = 1.5 / 1 = 1.5.
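The degrees of freedom, mean squares, and F ratio can be collected in one small helper. This is a sketch that assumes equal group sizes, with k columns and n rows as defined for the summary table; the name `anova_summary` is hypothetical.

```python
def anova_summary(ss_columns: float, ss_total: float, k: int, n: int) -> dict:
    """Mean squares and F from sums of squares.

    k is the number of columns (groups) and n is the number of rows
    (scores per group), so equal group sizes are assumed.
    """
    df_columns = k - 1
    df_total = n * k - 1
    df_residual = df_total - df_columns       # same as k * (n - 1)
    ss_residual = ss_total - ss_columns
    ms_columns = ss_columns / df_columns
    ms_residual = ss_residual / df_residual
    return {
        "df": (df_columns, df_residual, df_total),
        "ms_columns": ms_columns,
        "ms_residual": ms_residual,
        "F": ms_columns / ms_residual,
    }

summary = anova_summary(ss_columns=1.5, ss_total=5.5, k=2, n=3)
print(summary)
```

For the reaction time data this yields 1, 4, and 5 degrees of freedom, mean squares of 1.5 and 1.0, and an F ratio of 1.5.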


A Traditional ANOVA Table

Now we are ready to enter the results in the summary table. For the example, the traditional summary table of the analysis of variance is  

Source of Variance     Degrees of Freedom     Sums of Squares     Mean Square     F             Probability
Columns                1                      1.5                 1.5/1=1.5       1.5/1=1.5     .288
Residual               4                      4.0                 4/4=1.0
Total                  5                      5.5

 

 

Disadvantages of the Traditional ANOVA Tables

Strength of the Relationship

Recall that the strength of a relationship, as measured by the coefficient of determination, is of primary importance; the significance of this relationship is of secondary importance. The traditional ANOVA summary table fails to provide this information directly. However, you may compute the coefficient of determination from the sums of squares as 1.5/5.5 = .27. About 27% of the variance in reaction time was accounted for by the experimental treatment.

Computational Terms

The sums of squares and mean squares are not statistical terms but only computational ones, and, correspondingly, algebraic formulae using these terms do not convey statistical, conceptual ideas, but only computational algorithms.

 

Independent Measures t-Test as a Special Case of Single Classification Analysis of Variance

Let us reconsider the current example of the effect of the consumption of alcohol on reaction time within the framework of the independent measures t-test. Reaction times of the group of subjects that received the placebo (X0) and the group that received alcohol (X1) are recorded as the dependent variable Y in the following table.

  

The parent vector X identifies which subjects received the placebo and which received alcohol. The coefficient of determination for variables X and Y is .27 and the coefficient of alienation is .73. The t-square

$$t^2 = \frac{r^2}{1 - r^2}\,(n - 2)$$

for 4 (6 - 2 = 4) degrees of freedom is computed as (.27 / .73) × 4, which equals 1.50. Note that for the special case of two groups

$$t^2 = F$$

The t equals 1.22 and the probability associated with the t-ratio is .288.
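The same t ratio can also be reached by the familiar pooled-variance route, which offers a cross-check on the value obtained from the coefficient of determination. The sketch below assumes the standard independent measures t formula with a pooled unbiased variance; the variable names are illustrative.

```python
import math
from statistics import mean, variance

placebo = [1, 2, 3]    # X0
alcohol = [3, 2, 4]    # X1
n0, n1 = len(placebo), len(alcohol)

# Pooled unbiased variance across the two groups
pooled = ((n0 - 1) * variance(placebo) + (n1 - 1) * variance(alcohol)) / (n0 + n1 - 2)

# Difference between means divided by its standard error
t = (mean(alcohol) - mean(placebo)) / math.sqrt(pooled * (1 / n0 + 1 / n1))

print(round(t, 2), round(t * t, 2))
```

The squared t matches the F ratio of 1.5 from the analysis of variance, as expected for two groups.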