Measures of Variability

While the measures of central tendency convey information about the commonalities of measured properties, the measures of variability quantify the degree to which they differ. If the values of a variable are not all the same, they differ, and variability exists.

 

The measures of central tendency should be complemented by measures of variability for the same reason objective descriptions of events should contain accounts of both centripetal and centrifugal forces, of consenting and opposing opinions, of shared and conflicting views. The variability of data is measured by the statistic called variance, symbolized by the lowercase Greek letter sigma squared (σ²).

 

Within astronomy, the coefficient of variance was formulated by von Andrae, Helmert, and Jordan between 1872 and 1876 in terms of differences between values of a variable. Karl Pearson introduced this concept into statistics in a series of articles published in Philosophical Transactions and Biometrika between 1896 and 1906. Pearson also introduced the use of the lowercase Greek letter sigma squared (σ²) to signify the variance. Variance is the central concept of statistical theory.

 

Variance in terms of differences between values of a variable

 

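Consider all differences between the values of a variable X [1 2 3 4 5], where the entry in row i and column j is the i-th value minus the j-th value:

$$
\begin{bmatrix}
0 & -1 & -2 & -3 & -4 \\
1 & 0 & -1 & -2 & -3 \\
2 & 1 & 0 & -1 & -2 \\
3 & 2 & 1 & 0 & -1 \\
4 & 3 & 2 & 1 & 0
\end{bmatrix}
$$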

The above matrix of differences is called a skew-symmetric matrix, since to each positive element there corresponds a negative element of the same magnitude. If we are interested only in the distances between the values of the variable X and not in the direction of these distances, the matrix can be reduced to its lower triangle of distances

$$
\begin{bmatrix}
0 & & & & \\
1 & 0 & & & \\
2 & 1 & 0 & & \\
3 & 2 & 1 & 0 & \\
4 & 3 & 2 & 1 & 0
\end{bmatrix}
$$

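and the variance can be defined as an average of these squared distances:

$$
\begin{bmatrix}
0 & & & & \\
1 & 0 & & & \\
4 & 1 & 0 & & \\
9 & 4 & 1 & 0 & \\
16 & 9 & 4 & 1 & 0
\end{bmatrix}
$$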

For the example, the sum of the squared distances is 50. Averaged over the 25 cells of the full five-by-five matrix of differences, this gives 50/25. The variance of the variable X [1 2 3 4 5] therefore equals 2.
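The pairwise computation described above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and it assumes that each pair of values is counted once (the triangular form) while the sum is divided by n squared, as in the 50/25 of the example.

```python
# Variance as the average of squared pairwise distances (illustrative sketch).
X = [1, 2, 3, 4, 5]
n = len(X)

# Lower triangle of the squared-distance matrix: each pair (i, j) with j < i once.
squared_distances = [(X[i] - X[j]) ** 2 for i in range(n) for j in range(i)]

total = sum(squared_distances)       # sum of the triangular matrix
variance_pairwise = total / n ** 2   # divided by the n * n cells of the full matrix

print(total, variance_pairwise)  # 50 2.0
```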

 

Variance in terms of differences between values of a variable and the mean of the variable

 

Remember that the Pythagoreans defined the mean as a quantity that exceeds the smaller value by the same amount as the larger value exceeds the mean. Thus it seems plausible to assume that we would get the same variance of a variable if we averaged all squared differences between its values and their mean. For the example, the differences between the values of the variable X [1 2 3 4 5] and its mean [3] are

$$
\begin{bmatrix}
-2 & -2 & -2 & -2 & -2 \\
-1 & -1 & -1 & -1 & -1 \\
0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 1 \\
2 & 2 & 2 & 2 & 2
\end{bmatrix}
$$

Since the arithmetic mean is a constant and not a variable, the columns of the above matrix are all the same. The value of the variance should thus be the same if we consider only a single column of the above matrix

$$
\begin{bmatrix}
-2 \\ -1 \\ 0 \\ 1 \\ 2
\end{bmatrix}
$$

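and average its squared scores

$$
\begin{bmatrix}
4 \\ 1 \\ 0 \\ 1 \\ 4
\end{bmatrix},
\qquad
\frac{4 + 1 + 0 + 1 + 4}{5} = \frac{10}{5} = 2
$$

to obtain the same value of the variance (10 / 5 = 2) that we obtained by averaging all squared differences between the values of the variable X (50 / 25 = 2).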
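The argument that a single column suffices can be illustrated with a short Python sketch. The names below are illustrative only; the code builds the full matrix of deviations from the mean, averages its squared entries, and compares the result with the average over one column.

```python
# Averaging squared deviations over the full matrix versus a single column.
X = [1, 2, 3, 4, 5]
n = len(X)
mean = sum(X) / n  # 3.0

# Full n-by-n matrix: every column repeats the same deviations from the mean.
matrix = [[value - mean for _ in range(n)] for value in X]
average_over_matrix = sum(d ** 2 for row in matrix for d in row) / n ** 2

# A single column of that matrix.
column = [row[0] for row in matrix]
average_over_column = sum(d ** 2 for d in column) / n

print(average_over_matrix, average_over_column)  # 2.0 2.0
```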

Definition of Variance

The above table can be simplified if we define deviation scores (deviations of the obtained scores X from the arithmetic mean M) as

$$
x = X - M
$$

Note that the above formula is of a subtly different kind: it is a prototype of the transformation formulae. Whereas formulae for the computation of statistical indices, such as the arithmetic mean, yield a single number, the above formula defines a change in the values of a variable throughout its whole range. Using the above definition of the deviation scores, we can rewrite the above table as

$$
\begin{array}{c|rrr}
 & X & x = X - M & x^2 \\ \hline
 & 1 & -2 & 4 \\
 & 2 & -1 & 1 \\
 & 3 & 0 & 0 \\
 & 4 & 1 & 1 \\
 & 5 & 2 & 4 \\ \hline
\text{Sum} & 15 & 0 & 10
\end{array}
$$

Thus variance can be defined as the average of the squared deviation scores x = X - M,

$$
\sigma^2 = \frac{\sum x^2}{n}
$$

and its square root, called the standard deviation, as

$$
\sigma = \sqrt{\frac{\sum x^2}{n}}
$$

For the example, the variance equals 2 and the standard deviation equals 1.41.
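These two values can be cross-checked against Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by n. The snippet is a minimal sketch, not part of the original computation.

```python
import statistics

X = [1, 2, 3, 4, 5]

# pvariance and pstdev are the "population" forms: the divisor is n.
variance = statistics.pvariance(X)
standard_deviation = statistics.pstdev(X)

print(f"{variance:.2f} {standard_deviation:.2f}")  # 2.00 1.41
```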

 

Deviation Scores and Their Properties

In statistics, values of variables within the data set are called the obtained or raw scores. Subtraction of the mean transforms the obtained scores into deviation scores. This linear transformation preserves all properties of the original set of raw scores save the mean. Deviation scores sum to zero and thus their mean is always zero. The transformation to deviation scores changes obtained scores below the mean into negative numbers, scores above the mean into positive numbers, and anchors the mean of the distribution at the zero point.  

The concept of deviation scores from the arithmetic mean can be contrasted with scores defined in terms of distances from an absolute zero, as typified by the Kelvin scale of temperature. Deviation scores have definite merits in the social sciences, where it is typically difficult to find an absolute zero point for most measured properties. Is there such a thing as zero dominance, zero love, or zero hate? These are things to contemplate.

Transformation of the obtained scores into deviation scores is one of the essential procedures of data analysis. Typically, the arithmetic mean is computed first. Once the mean is known, it is subtracted (removed) from the data, and the next most important index, the variance, is computed.
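The properties of deviation scores listed in this section can be verified numerically. The following Python sketch uses illustrative names and checks that the deviation scores sum to zero, that the variance is unchanged by the transformation, and that adding the mean back recovers the obtained scores.

```python
import statistics

X = [1, 2, 3, 4, 5]            # obtained (raw) scores
M = sum(X) / len(X)            # arithmetic mean, 3.0

x = [value - M for value in X]             # deviation scores
restored = [value + M for value in x]      # back to obtained scores

print(x)       # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(x))  # 0.0

# Subtracting a constant shifts the mean to zero but leaves the variance intact.
same_variance = statistics.pvariance(X) == statistics.pvariance(x)
print(same_variance)  # True
```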

Variance Computed from Obtained Scores

In a preceding section, we introduced the definitional formula for variance as

$$
\sigma^2 = \frac{\sum x^2}{n}
$$

where

$$
x = X - M
$$

Substituting the right side of the above equation into the formula for the computation of variance results in

$$
\sigma^2 = \frac{\sum (X - M)^2}{n}
$$

This formula can be expanded by means of the familiar algebraic identity

$$
(a - b)^2 = a^2 - 2ab + b^2
$$

where X and M are used in lieu of a and b. Also, note that the summation signs are associated with the expanded terms as

$$
\sigma^2 = \frac{\sum X^2}{n} - 2M\,\frac{\sum X}{n} + \frac{\sum M^2}{n}
$$

The middle term on the right-hand side of the above equation contains the mean twice: once written as the sum of the obtained scores divided by n, and once as M. Substituting M for the sum of the obtained scores divided by n simplifies the above expression to

$$
\sigma^2 = \frac{\sum X^2}{n} - 2M^2 + \frac{\sum M^2}{n}
$$

The formula for the arithmetic mean, written in complete notation, is

$$
M = \frac{1}{n}\sum_{i=1}^{n} X_i
$$

The last term of the equation we are trying to simplify, written in complete notation as

$$
\frac{1}{n}\sum_{i=1}^{n} M^2
$$

differs from the formula for the arithmetic mean in one important respect. While the formula for the arithmetic mean states that the variable X should be summed and averaged over its whole range, the above formula states that the square of the mean, a constant value, should be summed and averaged over the whole range of the variable X. In terms of the current example, this means that the square of the mean, a constant number, should be summed five times, as 9 + 9 + 9 + 9 + 9. This operation can be simplified as 5(9). Thus, we can write the above expression as

$$
\frac{1}{n}\sum_{i=1}^{n} M^2 = \frac{nM^2}{n} = M^2
$$

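and the equation

$$
\sigma^2 = \frac{\sum X^2}{n} - 2M^2 + \frac{\sum M^2}{n}
$$

can be simplified as

$$
\sigma^2 = \frac{\sum X^2}{n} - 2M^2 + M^2
$$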

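Thus, the variance can be computed directly from the obtained scores as the mean of the squared scores minus the square of the mean,

$$
\sigma^2 = \frac{\sum X^2}{n} - M^2
$$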

which can also be written as

$$
\sigma^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2
$$

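For the example of the variable X [1 2 3 4 5], the computation of variance from the data obtained from five subjects is outlined as

$$
\begin{array}{c|rr}
 & X & X^2 \\ \hline
 & 1 & 1 \\
 & 2 & 4 \\
 & 3 & 9 \\
 & 4 & 16 \\
 & 5 & 25 \\ \hline
\text{Sum} & 15 & 55
\end{array}
\qquad
\frac{55}{5} - \left(\frac{15}{5}\right)^2 = 11 - 9 = 2
$$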

The variance calculated directly from the obtained scores of the variable X is 11 - 9, which equals 2, a value identical to that obtained by using the deviation score formula.
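The raw-score computation can be reproduced in Python as the mean of the squared scores minus the square of the mean. This is a minimal sketch with illustrative names; it also compares the result with the deviation score form.

```python
X = [1, 2, 3, 4, 5]
n = len(X)

mean_of_squares = sum(value ** 2 for value in X) / n   # 55 / 5 = 11.0
square_of_mean = (sum(X) / n) ** 2                     # 3.0 ** 2 = 9.0
variance_raw = mean_of_squares - square_of_mean

# Deviation score form, for comparison.
M = sum(X) / n
variance_dev = sum((value - M) ** 2 for value in X) / n

print(mean_of_squares, square_of_mean, variance_raw, variance_dev)
# 11.0 9.0 2.0 2.0
```

For data with a large mean and a small spread, the raw-score form subtracts two large, nearly equal numbers and can lose floating-point precision, so the deviation score form is usually the safer choice in software.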

True and Unbiased Variance

As you may observe by looking at the keyboard of a typical scientific hand calculator, there are two kinds of variance. The variance defined as

$$
\sigma^2 = \frac{\sum x^2}{n}
$$

is called the population, or true, variance and can be contrasted with the variance defined as

$$
s^2 = \frac{\sum x^2}{n - 1}
$$

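called the sample, or unbiased, variance estimate. The computations of both the true and the unbiased variance are illustrated below.

$$
\begin{array}{c|rrr}
 & X & x & x^2 \\ \hline
 & 1 & -2 & 4 \\
 & 2 & -1 & 1 \\
 & 3 & 0 & 0 \\
 & 4 & 1 & 1 \\
 & 5 & 2 & 4 \\ \hline
\text{Sum} & 15 & 0 & 10
\end{array}
\qquad
\frac{\sum x^2}{n} = \frac{10}{5} = 2.0,
\qquad
\frac{\sum x^2}{n - 1} = \frac{10}{4} = 2.5
$$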

For the example of the variable X [1 2 3 4 5], the variance was computed either as 2.0 or as 2.5, depending on whether the sum of squared deviation scores (10) was divided by n, here equal to 5, or by n-1, here equal to 4. In the former case, the obtained variance is the true variance (10/5 = 2.0). In the latter case, the variance (10/4 = 2.5) is 'unbiased'.
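The two divisors can be compared side by side in Python. In this sketch the names are illustrative; the standard library's `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n-1) are used only as a cross-check.

```python
import statistics

X = [1, 2, 3, 4, 5]
n = len(X)
M = sum(X) / n

sum_of_squares = sum((value - M) ** 2 for value in X)   # 10.0

true_variance = sum_of_squares / n            # divisor n
unbiased_variance = sum_of_squares / (n - 1)  # divisor n - 1

print(true_variance, unbiased_variance)  # 2.0 2.5

# Cross-check against the standard library.
assert true_variance == statistics.pvariance(X)
assert unbiased_variance == statistics.variance(X)
```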

Degrees of Freedom

The n-1 term in the denominator of the unbiased variance formula is referred to as the degrees of freedom, signified as df or ν (the Greek letter nu). The notion of degrees of freedom is related to the concept of the random normal variable. To illustrate this notion, let us consider the numbers 1, 2, 3, 4, 5, assigned to Allen, Beth, Cathy, Debra, and Edgar at the beginning of our discussion. No one ever actually asked these five subjects whether they liked poetry. In fact, these subjects are purely fictitious, and the numbers 1, 2, 3, 4, and 5 were assigned to them for computational convenience. Do not be misled by the orderly sequence of these numbers; in a recent lottery, the winning numbers were 3, 4, 5, 6, 7, 17, and 34. The point is that we were free to select these numbers at will and, in this instance, the number of degrees of freedom we had equaled n, the number of cases.

Now, imagine that this book had been written using only deviation scores. The authors of this hypothetical book could assign the numbers 1, 2, 3, 4 to Allen, Beth, Cathy, and Debra. So far, they would be free to assign to these fictitious subjects any numbers they wished. In Edgar's case, however, they would no longer be free: after selecting the first four numbers as 1, 2, 3, and 4, they would have to assign him the number -10, since deviation scores must sum to zero. Thus, the number of degrees of freedom associated with the deviation scores is n-1 and, for this example, equals 4.
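The constraint on the last deviation score can be made concrete with a small Python sketch. The helper name `last_deviation_score` is hypothetical and serves only to illustrate that, once n-1 deviation scores are chosen, the remaining one is fixed.

```python
def last_deviation_score(free_choices):
    """Return the one value that makes the deviation scores sum to zero."""
    return -sum(free_choices)

free_choices = [1, 2, 3, 4]                  # chosen at will for four subjects
forced = last_deviation_score(free_choices)  # no freedom left for the fifth

deviation_scores = free_choices + [forced]
degrees_of_freedom = len(deviation_scores) - 1

print(forced, sum(deviation_scores), degrees_of_freedom)  # -10 0 4
```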

Unbiased Variance Computed from Obtained Scores

In the previous sections, the arithmetic mean was defined as

$$
M = \frac{\sum X}{n}
$$

and the true variance as

$$
\sigma^2 = \frac{\sum X^2}{n} - M^2
$$

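Substituting the right side of the equation for the mean into the above formula results in

$$
\sigma^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n \cdot n}
$$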

The above formula can be changed into a formula expressing the unbiased variance by substituting n-1 for one of the two n's in the denominator, as

$$
s^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n(n - 1)}
$$

The above formula is the prototype of the “sum-of-squares, mean-squares” approach to statistics, which stresses that all that is needed are the sums and the sums of the squared values of the variable. For the example of the variable X [1 2 3 4 5], the squares of its values can be computed as

$$
\begin{array}{c|rr}
 & X & X^2 \\ \hline
 & 1 & 1 \\
 & 2 & 4 \\
 & 3 & 9 \\
 & 4 & 16 \\
 & 5 & 25 \\ \hline
\text{Sum} & 15 & 55
\end{array}
$$

For the example, the unbiased variance can be computed by the above formula as 5 times 55 (275), minus 15 squared (225), divided by 5 times 4 (20), i.e., (275 - 225)/20, which equals 2.5. However, this initial simplification of the computational operations obscures the statistical concepts and hinders understanding of the meaning of the statistical analysis of data.
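The sums-only computation can be sketched in Python as follows. The names are illustrative; the point is that only n, the sum of the scores, and the sum of the squared scores enter the calculation.

```python
X = [1, 2, 3, 4, 5]
n = len(X)

sum_x = sum(X)                               # 15
sum_x_squared = sum(value ** 2 for value in X)   # 55

numerator = n * sum_x_squared - sum_x ** 2   # 275 - 225 = 50

true_variance = numerator / (n * n)            # 50 / 25
unbiased_variance = numerator / (n * (n - 1))  # 50 / 20

print(numerator, true_variance, unbiased_variance)  # 50 2.0 2.5
```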

Translations between True and Unbiased Variance

What is the relationship between the true and the unbiased coefficients of variance? To answer this question, let us form the ratio of the unbiased variance to the true variance as

$$
\frac{s^2}{\sigma^2} = \frac{\sum x^2 / (n - 1)}{\sum x^2 / n}
$$

The above formula can be simplified, since the sum of squared deviation scores cancels, as

$$
\frac{s^2}{\sigma^2} = \frac{n}{n - 1}
$$

Thus, the translation from the true to the unbiased form can be accomplished as

$$
s^2 = \frac{n}{n - 1}\,\sigma^2
$$

and the translation of the unbiased variance into its true form as

$$
\sigma^2 = \frac{n - 1}{n}\,s^2
$$

For the current example, the unbiased variance (2.5) can be obtained from the true variance (2) as (5/4)(2) = 2.5, and the true variance can be obtained from the unbiased variance as (4/5)(2.5) = 2. The ratios of degrees of freedom to n and of n to degrees of freedom, which translate variance from the unbiased form to the true form and vice versa, are frequently encountered in statistical analyses.
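The two translations can be written as a pair of small Python functions. The function names are hypothetical and chosen only for this illustration.

```python
def true_to_unbiased(true_variance, n):
    """Multiply by n / (n - 1), the ratio of n to the degrees of freedom."""
    return true_variance * n / (n - 1)

def unbiased_to_true(unbiased_variance, n):
    """Multiply by (n - 1) / n, the ratio of the degrees of freedom to n."""
    return unbiased_variance * (n - 1) / n

print(true_to_unbiased(2, 5))    # 2.5
print(unbiased_to_true(2.5, 5))  # 2.0
```

Both functions require n greater than 1; with a single case the unbiased variance is undefined because the degrees of freedom equal zero.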

 

Summary

The variance formulae summarized here are of fundamental importance and will be encountered repeatedly in the course of our narrative. The formulae describing the transformation of variables from obtained scores into deviation scores, and back from deviation scores into obtained scores, are

Obtained Scores to Deviation Scores

$$
x = X - M
$$

Deviation Scores to Obtained Scores

$$
X = x + M
$$

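Formulae for the true variance expressed in the deviation and obtained scores are summarized as

True Variance, Deviation Scores

$$
\sigma^2 = \frac{\sum x^2}{n}
$$

True Variance, Obtained Scores

$$
\sigma^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n \cdot n}
$$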

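Formulae for the unbiased variance expressed in the deviation and obtained scores are summarized as

Unbiased Variance, Deviation Scores

$$
s^2 = \frac{\sum x^2}{n - 1}
$$

Unbiased Variance, Obtained Scores

$$
s^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n(n - 1)}
$$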

Inspection of the above formulae shows that the introduction of the degrees of freedom complicates the variance expressions. Parsimonious presentations of statistical theory use the true variance throughout, translating the true variance into an unbiased form only when necessary. The translation formulae are

True Variance to Unbiased Variance

$$
s^2 = \frac{n}{n - 1}\,\sigma^2
$$

Unbiased Variance to True Variance

$$
\sigma^2 = \frac{n - 1}{n}\,s^2
$$