Single Classification Analysis of Variance

 

Partitioning of variance into its components is a central concept of statistical data analysis. The predicted and error variance components are not correlated and they are additive.

Variance Components in Different Forms

True Variance Components

In Obtained Scores Form

In Deviation Scores Form

Standard Variance Components

To compute the standard variance components, simply divide the above true variance components by the total variance component.

                     Total    Predicted    Error
True Variance
Standard Variance

When expressed as standard variance components, they can be interpreted directly as the proportions of variance accounted for and not accounted for. Notice that eta square (η²) provides a measure of the strength of the relationship irrespective of whether it is linear or curvilinear.

Extended and Expanded Variance Components

Aside from the obtained scores, deviation scores, and standard scores frameworks, there are two additional frameworks in which data can be partitioned: extended ('sum of squares') and expanded variance components. Variance expressed as extended variance components (sums of squares) facilitates the use of a spreadsheet for computing analysis of variance components. Variance in its expanded form facilitates computation of variance within the context of matrix algebra.

Extended Variance Components

To compute extended variance components directly, simply multiply true variance components by n.   



Expanded Variance Components

To compute expanded variance components directly, simply multiply true variance components by n squared.

Note that n is the total number of scores.
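As a rough illustration of the two rules above, the following Python sketch multiplies a true variance by n and by n squared. The scores are the reaction time data introduced later in this chapter; the variable names and the two cross-checks are illustrative assumptions rather than part of the original text.

```python
from statistics import pvariance

# Reaction time scores from the experiment described later in this chapter.
scores = [1, 2, 3, 3, 2, 4]
n = len(scores)                      # total number of scores

true_variance = pvariance(scores)    # mean squared deviation (divides by n)
extended = n * true_variance         # extended form: the sum of squares
expanded = n ** 2 * true_variance    # expanded form

# Cross-checks computed directly from the scores.
mean = sum(scores) / n
sum_of_squares = sum((y - mean) ** 2 for y in scores)
raw_form = n * sum(y * y for y in scores) - sum(scores) ** 2

print(round(true_variance, 2), round(extended, 2), round(expanded, 2))
print(round(sum_of_squares, 2), raw_form)
```

The extended component matches the sum of squared deviations, and the expanded component matches n times the sum of squared scores minus the squared sum of scores, which is the form that needs no division.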

Suggestions

No matter which framework is used, we can always standardize the obtained components by dividing them by the total variance component. The resulting standard variance components are shown below.

The standardizing variable is the criterion variable (or the dependent variable) Y.

 

 

Thus,

 

 

For a linear relationship

 

 

For linear and nonlinear relationships

 

 

In this chapter, we will describe partitioning of variance into the extended variance components within the framework of the single classification analysis of variance.

 

Idealized Experimental Design

A prototype of a scientific experiment involves two groups of subjects, divided randomly into a control and an experimental group. Subjects are assumed to have no relationship to each other and different subjects are used for the different conditions of the experiment. An idealized outline of the arrangement of subjects prior to the onset of an experiment is presented in the following table.   

 

           Y0      Y1      Y
Allen      1       -       1
Becky      2       -       2
Cathy      3       -       3
Debra      -       1       1
Edgar      -       2       2
Francis    -       3       3
M          2       2       2
s2         .67     .67     .67

 

Before the Experiment

Group Means and Variances

Initially, we have no reason to assume that the means and variances of both groups will differ. There is also no reason to assume that the means and variances of both groups combined will differ from those of the groups considered separately. 

 

 

The scores in the above table do not simulate the actual scores, but are, instead, hypothetical assumptions about the scores that could be expected in the absence of the experimental treatment.
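The means and variances in the table can be checked with a short Python sketch. It assumes, as the .67 entries imply, that the variance divides by the number of scores (the population form); the variable names are illustrative.

```python
from statistics import fmean, pvariance

# Hypothetical scores expected in the absence of any treatment.
y0 = [1, 2, 3]   # control group: Allen, Becky, Cathy
y1 = [1, 2, 3]   # experimental group: Debra, Edgar, Francis
y = y0 + y1      # both groups combined

for label, group in (("Y0", y0), ("Y1", y1), ("Y", y)):
    # pvariance divides by n, matching the s2 row of the table
    print(label, fmean(group), round(pvariance(group), 2))
```

All three rows report a mean of 2 and a variance of about .67, so combining the groups adds no variance before the treatment is introduced.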

Reaction Time Experiment

Using the same brand of wine, a control group coded as 0 will be given a glass of non-alcoholic wine while an experimental group coded as 1 will be given a glass of wine containing alcohol. Reaction time measurements in seconds are taken one hour after the wine was consumed.

Independent Measures Design

An experiment using different subjects for all conditions of the experiment is called an independent measures design. There are 3 subjects in each condition. The total number of subjects is 6.

0: Non-Alcoholic    1: Alcohol
Allen               Debra
Becky               Edgar
Cathy               Francis



Independent Variable

There is one independent variable which is manipulated by the researcher. The independent variable is consumption of alcohol with two levels: placebo (non-alcoholic wine) and wine containing alcohol.

Placebo Group

  0: Non-Alcoholic

Allen

Becky

Cathy

 

Experimental Group

  1:  Alcohol

Debra

Edgar

Francis


Dependent Variable

The dependent variable measured by the researcher is reaction time in seconds. Reaction time is measured at each level of the independent variable for every subject.

         Y0                    Y1
Allen    ? seconds    Debra    ? seconds
Becky    ? seconds    Edgar    ? seconds
Cathy    ? seconds    Francis  ? seconds

Does consumption of alcohol have an effect on reaction time? 

After the Experiment

Changes in these idealized scores following the introduction of an experimental treatment are presented in the following table.   

 

           Y0      Y1       Y
Allen      1       -        1
Becky      2       -        2
Cathy      3       -        3
Debra      -       1+2=3    3
Edgar      -       2+0=2    2
Francis    -       3+1=4    4
M          2       3        2.5
s2         .67     .67      .92

 

Total Variance

Notice that the variances of the control and experimental groups are the same (.67). However, the variance of the total group is increased to .92.

 

 

Different Group Means

Since the variances of the control and experimental groups did not change because of the experiment, the increase of variance for the total group should be due to the variance between the changed means.
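A minimal Python sketch of this step, assuming the treatment adds the increments 2, 0, and 1 shown in the Y1 column to the idealized scores of the experimental group (variable names are illustrative):

```python
from statistics import fmean, pvariance

control = [1, 2, 3]          # Allen, Becky, Cathy
baseline = [1, 2, 3]         # Debra, Edgar, Francis before the treatment
effects = [2, 0, 1]          # increments taken from the Y1 column

experimental = [b + e for b, e in zip(baseline, effects)]
combined = control + experimental

print("experimental scores:", experimental)
print("group variances:", round(pvariance(control), 2),
      round(pvariance(experimental), 2))
print("total mean and variance:", fmean(combined),
      round(pvariance(combined), 2))
```

The group variances stay at about .67 while the variance of the combined scores rises to about .92, so the increase can only come from the shift in group means.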

Variance Due to Changed Group Means (Column Means)

Let us illustrate this conjecture with the following table, which contains the means of the control and experimental groups together with their overall mean and variance.

Since the two groups have an equal number of cases, we may use the two group means to compute the overall mean and variance directly.

 

        M
Y0      2
Y1      3
M       2.5
s2      .25

 

The overall mean is 2.5 and the variance due to difference between group means is .25. The remaining variance due to unknown factors can be computed as .92 - .25 = .67. The results of this experiment are summarized in the following table.      

Source of Variance    Variance Components
Between Means         .25
Residual              .67
Total                 .92

Examine the Variance Components column. The variance due to group means (labeled as Between Means) is .25. The variance due to unknown factors (e.g., random errors) is .67. The total variance in the dependent variable Y is .92. 
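The partition can be reproduced with a short Python sketch. It assumes equal group sizes, as in this example, so that the variance of the two group means is the between means component; the cross-check against the average within-group variance is an added illustration, not a step from the text.

```python
from statistics import fmean, pvariance

y0 = [1, 2, 3]   # control group
y1 = [3, 2, 4]   # experimental group after the treatment

total = pvariance(y0 + y1)
between = pvariance([fmean(y0), fmean(y1)])   # valid because group sizes are equal
residual = total - between                    # what the group means leave unexplained

# Cross-check: the residual equals the average within-group variance.
within = fmean([pvariance(y0), pvariance(y1)])

print(round(between, 2), round(residual, 2), round(total, 2))
```

Because the two components are additive, the residual obtained by subtraction agrees with the variance found inside the groups.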

Next, standardize the above variance components. The standard variance components are obtained by dividing each variance component by the total, which equals .92 in this example. Thus, the between groups standard variance is .25/.92 = .27, the residual standard variance is .67/.92 = .73, and the total standard variance is .92/.92 = 1.

 

Source of Variance    Variance Components    Standard Variance Components
Between Means         .25                    .25 / .92 = .27
Residual              .67                    .67 / .92 = .73
Total                 .92                    .92 / .92 = 1.00

 

Using the standard components of variance, the summary table can be conceptualized into a form that is more informative and easier to interpret. Examine the Standard Variance Components column. Approximately 27% of the total variance in the dependent variable was explained by the experimental treatments. However, 73% of the variance was not explained.
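The standardization step amounts to one division per row. The sketch below uses the rounded values from the table; the dictionary layout is an illustrative assumption.

```python
# Variance components as reported (rounded) in the summary table.
components = {"Between Means": 0.25, "Residual": 0.67, "Total": 0.92}

total = components["Total"]
standard = {source: value / total for source, value in components.items()}

for source, value in standard.items():
    # Each standard component is a proportion of the total variance.
    print(f"{source:<14}{value:.2f}")
```

Read as proportions, the first two rows give the share of variance explained and left unexplained by the treatment.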


Regression on Categories

These variance components could also have been obtained by the regression on categories analysis, for the example, summarized as   

 

The lowercase sigma has two forms, σ and ς. Notice that  (also written as ) signifies standard variance. Examine the standard variances of Y, Y', and Y^. The experimental treatments explained 27 percent of the total variance of the criterion variable Y. The remaining 73 percent of the variance was unexplained.

 

Source of Variance    Standard Variance Components
Between Means         .25 / .92 = .27
Residual              .67 / .92 = .73
Total                 .92 / .92 = 1.00

 

In formal notation, the partitioning of variance table can be written as

 

Source of Variance

Variance Components

Standard Variance Components

Information

Residual

Total

 

Since

 

and

 

the above table can be also written as

 

Source of Variance

Standard Variance Components

Variance Components

 

Information

 

Residual

 

Total

 

These equivalencies are important for understanding the principles of analysis of variance and the mutual relationships between the principal models used for this task.

Single Classification Analysis of Variance

Single classification analysis of variance allows us to compare two or more independent groups. Are the population means different? To answer this question, we have to introduce the concept of the F ratio.

The F Ratio

We may express the F ratio in terms of the proportions of variance accounted for (η²; in the case of two groups, r²) and not accounted for (1 - η²). The F ratio is computed as

$$F = \frac{\eta^2 / (k - 1)}{(1 - \eta^2) / (n - k)}$$

where "k" is the number of groups and "n" is the total sample size.
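A minimal Python sketch of this computation, assuming the F ratio takes the usual form in which the proportion of variance accounted for is divided by k - 1 and the proportion not accounted for by n - k. The function name f_ratio is hypothetical, and the input value is the between means proportion from the reaction time example.

```python
def f_ratio(eta_squared, k, n):
    """F ratio from the proportion of variance accounted for.

    eta_squared: proportion of variance accounted for
    k: number of groups
    n: total sample size
    """
    explained = eta_squared / (k - 1)          # corrected by k - 1 degrees of freedom
    unexplained = (1 - eta_squared) / (n - k)  # corrected by n - k degrees of freedom
    return explained / unexplained

# Reaction time example: eta squared = .25 / .9167, two groups, six subjects.
eta_squared = 0.25 / (11 / 12)
print(round(f_ratio(eta_squared, k=2, n=6), 2))
```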

Special Case of Two Groups

Examine the above formula. A simple analysis of variance for two independent groups (k = 2) will yield an F ratio that is the same as the square of the t ratio for the same data.

$$F = \frac{r^2}{(1 - r^2) / (n - 2)} = t^2$$

The guaranteed fulfillment of the linearity assumption for the case of two groups, based on Euclid's postulate that a line is defined by two points, allows the use of the coefficient of determination and eta square interchangeably. 
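The equality of F and the squared t ratio can be checked numerically on the reaction time data. The sketch assumes the familiar pooled-variance t ratio for two independent groups; that choice and the variable names are assumptions added for illustration.

```python
from math import sqrt
from statistics import fmean, pvariance

control = [1, 2, 3]
experimental = [3, 2, 4]
n0, n1 = len(control), len(experimental)
n = n0 + n1

# Pooled-variance t ratio for two independent groups.
m0, m1 = fmean(control), fmean(experimental)
ss_within = (sum((y - m0) ** 2 for y in control)
             + sum((y - m1) ** 2 for y in experimental))
pooled_variance = ss_within / (n - 2)
t = (m1 - m0) / sqrt(pooled_variance * (1 / n0 + 1 / n1))

# F ratio from the proportion of variance accounted for (equal group sizes).
eta_squared = pvariance([m0, m1]) / pvariance(control + experimental)
f = (eta_squared / (2 - 1)) / ((1 - eta_squared) / (n - 2))

print(round(t ** 2, 4), round(f, 4))
```

Both routes arrive at the same value for these data, which is what the special case of two groups predicts.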

The F Distribution

The F distribution is the theoretical sampling distribution used to evaluate the obtained F value.

The Normal Distribution, the t Distribution, and the F Distribution

As the normal distribution is a special (limiting) case of the t distribution, the t distribution is, in turn, a special case of the F distribution. Like the t distribution, the F distribution is based on the gamma density function. The F distribution is related to the t distribution as

$$F(1, \nu) = t^2(\nu)$$

The t distribution is related to the standard normal distribution as

$$t(\infty) = z$$

A Ratio of Two Independent Variances

The original name of the F distribution was the inverted beta distribution. The inverted beta distribution was renamed by Snedecor to stress Fisher's contributions to the development of the computational techniques for the analysis of variance.

The original use of the F ratio was to test the assumption of homoscedasticity, i.e., the approximate equality of the variances of independent variables, each with its own degrees of freedom. The typical use of F is within the context of the analysis of variance. In traditional analysis of variance, the sums of squares are divided by their respective degrees of freedom, and the results are called mean squares. The degrees of freedom associated with the information term are k - 1. The degrees of freedom associated with the error term are n - k. The F ratio is then computed as the ratio of the mean squares corresponding to the information and the error terms.

Degrees of Freedom

Like the distribution of t, the distribution of F is a family of distributions that vary with degrees of freedom. But with the F-ratio we must take into account both the degrees of freedom associated with the numerator and the degrees of freedom associated with the denominator. 
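The role of the two sets of degrees of freedom can be sketched with the traditional mean squares computation. The function one_way_anova below is a hypothetical helper written for this illustration, not a library routine; it accepts any number of groups.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (df_between, df_within, ms_between, ms_within, F)."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total sample size
    grand_mean = fmean([y for g in groups for y in g])

    # Sums of squares (the extended variance components).
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - fmean(g)) ** 2 for g in groups for y in g)

    df_between, df_within = k - 1, n - k         # numerator and denominator df
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, df_within, ms_between, ms_within, ms_between / ms_within

result = one_way_anova([[1, 2, 3], [3, 2, 4]])
print(result)
```

For the reaction time data the numerator has 1 degree of freedom and the denominator has 4, and the ratio of the two mean squares agrees with the F ratio obtained from the standard variance components.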

The Overall F Test 

The F test is often called the overall or omnibus F test. A significant F value tells you only that the population means are not all equal. To determine which means are significantly different from each other, you may perform post hoc comparisons. 

An ANOVA Table in Standard Variance Form 

Results of the reaction time experiment can be reported in the following table

Source of Variance    df       Standard Variance Components
Information           2-1=1    .27
Residual              6-2=4    .73
Total                 5        1.00


1. Standard Variance Components

About 27% of the total variance in the dependent variable was explained by the experimental effect. About 73% of variance in the dependent variable was not explained.

2. Estimated Standard Variance Components

To make inferences based on the sample data, both information and residual standard variance components need to be corrected by their associated degrees of freedom. The estimated column standard variance is computed as .27 / 1 = .27. The estimated residual standard variance is computed as .73 / 4 = .18. 
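The correction by degrees of freedom can be traced with exact fractions, which avoids the rounding in .27 and .73. The sketch assumes the unrounded components from the example (1/4 between means, 2/3 residual, 11/12 total); the dictionary names are illustrative.

```python
from fractions import Fraction

# Unrounded variance components of the reaction time example.
total = Fraction(11, 12)
standard = {
    "Information": Fraction(1, 4) / total,   # about .27
    "Residual": Fraction(2, 3) / total,      # about .73
}
df = {"Information": 2 - 1, "Residual": 6 - 2}

# Divide each standard component by its degrees of freedom.
estimated = {source: standard[source] / df[source] for source in standard}

for source, value in estimated.items():
    print(source, value, round(float(value), 2))
```

The exact estimates round to the .27 and .18 reported in the text.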

 

Source of Variance    df       Standard Variance Components    Estimated Standard Variance Components (corrected by df)
Information           2-1=1    .27                             .27/1=.27
Residual              6-2=4    .73                             .73/4=.18
Total                 5        1.00

 

 

3. The F Ratio

The F ratio is computed as an information-error ratio corrected by the associated degrees of freedom. Thus, the F ratio equals .27/.18=1.5.

Source of Variance

 
 df

Standard Variance Components

Estimated Standard Variance  Components


F

Information

1