Cruise Scientific       Visual Statistics Studio

Antecedents of the Concept of Variance
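The variance of a variable is the average of its squared deviations from the mean: subtract the mean M from each of the n values of the variable, square the differences, and average the squares. For a variable X,

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - M)^2$$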

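For instance, the variance of the variable X = [0 1 2 3], whose mean is M = 1.5,

$$\sigma^2 = \frac{(-1.5)^2 + (-0.5)^2 + 0.5^2 + 1.5^2}{4} = \frac{5}{4}$$

equals 1.25.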
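The same computation can be sketched in a few lines of Python. This is an illustration only, not part of the original article: the function name `variance_from_mean` is a hypothetical helper, and the divisor is assumed to be n (not n - 1), which is what yields 1.25 for [0 1 2 3].

```python
def variance_from_mean(values):
    """Average of the squared deviations from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    return sum(d * d for d in deviations) / n


X = [0, 1, 2, 3]
print(variance_from_mean(X))  # 1.25
```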

Matrices of differences

Using matrix algebra, variance can also be obtained from all possible differences between the elements of a variable. The major difference of a variable X and its transpose, in which every element of X is paired with every element of X and their difference is taken, is a skew-symmetric matrix of order n. Its elements are all possible differences between the values of the variable X.

 

Skew-symmetric matrices are redundant, as the negative values can be inferred from the corresponding positive values on the opposite side of the diagonal. Removing this redundancy, a skew-asymmetric matrix can be defined as

 

Variance can be obtained from the sum of the squared elements of a skew-asymmetric matrix divided by the square of its order.

 

For the example, $(1^2 + 2^2 + 1^2 + 3^2 + 2^2 + 1^2) / 4^2 = 20 / 16 = 1.25$.
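The difference-matrix route can be sketched as follows. The helper names are hypothetical. The sign convention (row value minus column value) is an assumption, since the article's matrices are not reproduced in this text. Only one triangle of the skew-symmetric matrix is kept, because the other triangle holds the same differences with the opposite sign.

```python
def difference_matrix(values):
    """All pairwise differences d[i][j] = values[i] - values[j] (skew-symmetric)."""
    return [[a - b for b in values] for a in values]


def variance_from_differences(values):
    """Sum of squared differences in one triangle, divided by the squared order."""
    n = len(values)
    d = difference_matrix(values)
    # Keep the triangle below the diagonal (i > j); the triangle above it
    # repeats the same differences with the sign reversed.
    triangle = [d[i][j] for i in range(n) for j in range(i)]
    return sum(x * x for x in triangle) / (n * n)


X = [0, 1, 2, 3]
for row in difference_matrix(X):
    print(row)
print(variance_from_differences(X))  # 1.25
```

For values that are not sorted in ascending order, the kept triangle may contain negative differences. Squaring makes the sign irrelevant, so the result is unchanged.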

Differences between data elements and their mean

Before the computer era, computing a skew-asymmetric matrix of differences was an arduous task. To make it easier, let us consider what happens if instead of computing a major difference of a variable from its transpose we compute the major difference of a variable from its mean.

 

In the above matrix, each X - M difference is repeated n times in its row. This repetition is redundant and can be removed, so we can write

 

The result of the above operation is a matrix of deviation scores x, defined as

 

For the example

 

Since the sum of the deviation scores is always zero, a meaningful index can be obtained by averaging their squares instead: for the example, 5 / 4, which equals 1.25, the same value as obtained above. Squaring the scores is congruent with the least-squares criterion; these two concepts form the basis of the general linear model of statistics.
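The agreement between the two computations is not a coincidence of the example. A short derivation, added here as a sketch, shows why. The index notation $X_i$ and $x_i = X_i - M$ is an assumption of this sketch; the article itself works with matrices.

```latex
% Deviation scores: x_i = X_i - M, so that \sum_i x_i = 0 and X_i - X_j = x_i - x_j.
\begin{align*}
\sum_{i=1}^{n}\sum_{j=1}^{n} (X_i - X_j)^2
  &= \sum_{i=1}^{n}\sum_{j=1}^{n} \left( x_i^2 - 2 x_i x_j + x_j^2 \right) \\
  &= 2n \sum_{i=1}^{n} x_i^2 - 2 \Bigl( \sum_{i=1}^{n} x_i \Bigr)^2
   = 2n \sum_{i=1}^{n} x_i^2 .
\end{align*}
% Each pair (i, j) with i \neq j is counted twice in the double sum, hence
\begin{equation*}
\frac{1}{n^2} \sum_{i > j} (X_i - X_j)^2
  = \frac{1}{n^2} \cdot n \sum_{i=1}^{n} x_i^2
  = \frac{1}{n} \sum_{i=1}^{n} (X_i - M)^2 = \sigma^2 .
\end{equation*}
```

For X = [0 1 2 3], the one-triangle sum of squared differences is 20 and the sum of squared deviation scores is 5, so 20 / 16 and 5 / 4 both give 1.25.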

Retrospect

The concept of least squares was developed by Laplace (1749-1827) in his work explaining the differences in the motions of Jupiter and Saturn. The concept of variance in terms of all possible differences between values of a variable was introduced by von Andrae (1872) and Helmert (1876) in articles published in Astronomische Nachrichten. The convention of using the Greek lowercase character σ for the standard deviation was introduced by Karl Pearson in a series of articles published in Philosophical Transactions and Biometrika between 1896 and 1906.


______________________________

Notes

Variance and Information

Transformation of a special case of a variable X (0 1 2 3 … n) into its adjacent binary matrix underlying a perfect Guttman scale, for our example of the variable X as

 

and subtraction of the transpose of this matrix from itself

 

results in the same skew-symmetric matrix as the major difference of the variable X and its transpose. Triangulation of the above matrix from its skew-symmetric to its skew-asymmetric form,

 

provides information about the number of 0-1 changes (bits) contained within the data.

 

Logical substratum of the Guttman scales

Analyzing the plenum of all possible responses for three binary items p, q, and r by the logical function

 

  

 

and rectifying the outcome, returns a data matrix adjacent to the Guttman scale [0 1 2 3]

 

 

as obtained in the previous section.

 

 

Subscript

 

Kendall’s $u^2$ Coefficient

Using all possible differences between values of a variable as a foundation of statistical theory was contemplated by Kendall (1943, p. 47) who defined a coefficient, u, as

 

For the discontinuous infinite case, the above equation can be written as

 

and for the finite case as

 

where the summed term in the above equation is a vector of all possible differences between elements of variable x. Pointing out that the value of the u coefficient depends on the spread of the variate-values among themselves and not on the deviations from some central value, Kendall (1943, p. 47) shows that $u^2 = 2\sigma^2$, concludes that the initial defining formula is nothing but twice the variance, and abandons the idea. One can only wonder which direction statistics might have taken had Kendall realized that matrices of differences between all values of a variable are not just another way to compute variance, but are also adjacency matrices of ordered graphs, reflecting not only the information content of a variable but also the hierarchical relationships between its elements.
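The "twice the variance" result is easy to check numerically. Kendall's defining formulas are not reproduced in this text, so the sketch below assumes that the finite-case coefficient is the mean of the squared differences over all n × n ordered pairs of values, including each value paired with itself. The function names are hypothetical.

```python
def mean_squared_difference(values):
    """Mean of (a - b)**2 over all n*n ordered pairs of values."""
    n = len(values)
    return sum((a - b) ** 2 for a in values for b in values) / (n * n)


def variance(values):
    """Average of the squared deviations from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


X = [0, 1, 2, 3]
print(mean_squared_difference(X))  # 2.5
print(2 * variance(X))             # 2.5
```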

References

Andrae, von (1872). Über die Bestimmung des wahrscheinlichen Fehlers durch die gegebenen Differenzen vom gleich genauen Beobachtungen einer Unbekannten. Astronomische Nachrichten, vol. 84.

Helmert, F.R. (1876). Die Berechnung des wahrscheinlichen Beobachtungsfehlers aus den ersten Potenzen der Differenzen gleichgenauer directer Beobachtungen. Astronomische Nachrichten, vol. 88.

Kendall, M. (1943). The Advanced Theory of Statistics. In Stuart, A., & Ord, J.K. (1987). Kendall’s Advanced Theory of Statistics, 5th Ed. London: Griffin.

Krus, D.J. (2006). Variance and the differences between values of a variable. Journal of Visual Statistics. VisualStatistics.net (August 3, 2006).

Krus, D.J., & Ceurvorst, R.W. (1979). Dominance, information, and hierarchical scaling of variance space. Applied Psychological Measurement, 3, 515-527.

Laplace, P.S. (1799-1825). Mécanique Céleste. Vol. 1-5. Paris.

Press, W.H., Teukolsky, S.A., Vetterling, W.T., & Flannery, B.P. (1992). Numerical Recipes, 2nd Ed. Cambridge: Cambridge University Press.

Shannon, C.E., & Weaver, W. (1949). The mathematical theory of communication. Urbana: University of Illinois Press.