Using matrix algebra, variance can be ascertained by computing all possible differences between the elements of a variable. The major difference of a variable X and its transpose is a skew-symmetric matrix whose elements are all possible differences between the values of X.
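A minimal Python sketch of the major difference follows. It assumes the running example is X = [0 1 2 3], which is consistent with the differences and variance quoted below. It also assumes that element (i, j) holds X[i] - X[j]. The function names are hypothetical, and the opposite sign convention would simply give the transposed matrix.

```python
def major_difference(x):
    """Matrix of all pairwise differences: element (i, j) is x[i] - x[j].

    The sign convention (row value minus column value) is an assumption;
    the opposite convention yields the transposed matrix.
    """
    return [[xi - xj for xj in x] for xi in x]


def is_skew_symmetric(d):
    """True when d[i][j] == -d[j][i] for every pair (i, j), diagonal included."""
    n = len(d)
    return all(d[i][j] == -d[j][i] for i in range(n) for j in range(n))


X = [0, 1, 2, 3]  # assumed running example
D = major_difference(X)
print(D)                     # [[0, -1, -2, -3], [1, 0, -1, -2], [2, 1, 0, -1], [3, 2, 1, 0]]
print(is_skew_symmetric(D))  # True
```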
Skew-symmetric matrices are redundant, as each negative value can be inferred from its positive counterpart mirrored across the main diagonal. Removing this redundancy by keeping only the elements on one side of the diagonal yields a skew-asymmetric matrix.
Variance can be obtained from the sum of the squared elements of a skew-asymmetric matrix divided by the square of its order.
For the example, (1² + 2² + 1² + 3² + 2² + 1²) / 4² = 20 / 16 = 1.25.
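The redundancy removal and the variance rule can be sketched the same way. This hypothetical helper keeps the elements below the main diagonal (X[i] - X[j] for j < i), again assuming X = [0 1 2 3]. Read row by row, the retained elements come out in the order 1, 2, 1, 3, 2, 1 used in the sum above.

```python
def skew_asymmetric(x):
    """Keep only the differences below the main diagonal, row by row.

    Assumption: row i holds x[i] - x[j] for j < i (row 0 is therefore empty).
    For an ascending variable this leaves only the positive differences.
    """
    return [[x[i] - x[j] for j in range(i)] for i in range(len(x))]


def variance_from_differences(x):
    """Sum of the squared skew-asymmetric elements divided by n squared."""
    n = len(x)
    total = sum(d * d for row in skew_asymmetric(x) for d in row)
    return total / n ** 2


X = [0, 1, 2, 3]  # assumed running example
T = skew_asymmetric(X)
print(T)                             # [[], [1], [2, 1], [3, 2, 1]]
print(variance_from_differences(X))  # 1.25
```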
Before the computer era, computing a skew-asymmetric matrix of differences was an arduous task. To make it easier, let us consider what happens if, instead of computing the major difference of a variable from its transpose, we compute the major difference of the variable from its mean M.
In the resulting matrix, the difference X - M is repeated n times in every row. This repetition is redundant and can be removed. The result is a matrix of deviation scores x, defined as x = X - M.
For the example, x = [-1.5 -0.5 0.5 1.5].
Since the deviation scores always sum to zero, a meaningful index is obtained by averaging their squares; for the example, 5 / 4 = 1.25, the same value as obtained above. Squaring the scores is congruent with the least-squares criterion, and together these two concepts form the basis of the general linear model of statistics.
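A short sketch comparing the two routes to the same index follows. It assumes X = [0 1 2 3] and the divisor n used in the text (5 / 4). The function names are illustrative only.

```python
def deviation_scores(x):
    """Deviation scores x - M, where M is the arithmetic mean of x."""
    mean = sum(x) / len(x)
    return [xi - mean for xi in x]


def variance_from_deviations(x):
    """Average of the squared deviation scores (divisor n, as in the text)."""
    dev = deviation_scores(x)
    return sum(d * d for d in dev) / len(dev)


def variance_from_pairs(x):
    """Same index via all pairs i > j: sum of squared differences / n**2."""
    n = len(x)
    total = sum((x[i] - x[j]) ** 2 for i in range(n) for j in range(i))
    return total / n ** 2


X = [0, 1, 2, 3]  # assumed running example
dev = deviation_scores(X)
print(dev)                          # [-1.5, -0.5, 0.5, 1.5]
print(sum(dev))                     # 0.0
print(variance_from_deviations(X))  # 1.25
print(variance_from_pairs(X))       # 1.25
```

For any list of numbers the two functions agree up to floating-point rounding, which is the equivalence the paragraph relies on.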
The concept of least squares was developed by Laplace (1749-1827) in his work explaining the differences in the motions of Jupiter and Saturn. The concept of variance in terms of all possible differences between values of a variable was introduced by von Andrae (1872) and Helmert (1876) in a series of articles in Astronomische Nachrichten. The convention of using the lowercase Greek letter σ for the standard deviation was established by Karl Pearson in a series of articles published in Philosophical Transactions and Biometrika between 1896 and 1906.
______________________________
Notes
Transforming a special case of a variable X (0 1 2 3 … n) into its adjacent binary matrix, which underlies a perfect Guttman scale, and subtracting the transpose of this matrix from itself results in the same skew-symmetric matrix as the major difference of the transpose of the variable X from itself. Triangulating this matrix from its skew-symmetric to its skew-asymmetric form provides information about the number of 0-1 changes (bits) contained within the data.
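Because the binary matrix is not shown, the following sketch assumes one plausible layout. It uses one row per value of X, with row i made of X[i] ones followed by zeros across max(X) items. For X = [0 1 2 3], this gives the response patterns 000, 100, 110, 111. Both helper names are hypothetical. For each pair of rows, the sketch counts the items on which the two response patterns differ. For a perfect Guttman scale, these 0-1 changes reproduce the skew-asymmetric matrix of differences.

```python
def guttman_matrix(x, items=None):
    """Hypothetical binary matrix of a perfect Guttman scale.

    Row i has x[i] leading ones followed by zeros, so its row sum is x[i].
    """
    k = max(x) if items is None else items
    return [[1 if item < xi else 0 for item in range(k)] for xi in x]


def bit_changes(b):
    """For each pair i > j, count the items where patterns i and j differ."""
    return [[sum(p != q for p, q in zip(b[i], b[j])) for j in range(i)]
            for i in range(len(b))]


X = [0, 1, 2, 3]  # assumed running example
B = guttman_matrix(X)
print(B)                 # [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
C = bit_changes(B)
print(C)                 # [[], [1], [2, 1], [3, 2, 1]]
print(sum(map(sum, C)))  # 10
```

Under these assumptions the counts match the skew-asymmetric matrix element by element, and in this sketch they total 10 changes.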
Analyzing the plenum of all possible responses to three binary items p, q, and r with a logical function and rectifying the outcome returns a data matrix adjacent to the Guttman scale [0 1 2 3], as obtained in the previous section.
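The logical function itself is not reproduced here, so this sketch substitutes a hypothetical one. It keeps the response patterns (p, q, r) in which r implies q and q implies p. It approximates the "rectifying" step by sorting the surviving patterns by total score. Both choices are assumptions. Under them, the eight patterns of the plenum reduce to the four rows of a perfect Guttman scale with scores [0 1 2 3].

```python
from itertools import product


def guttman_consistent(p, q, r):
    """Hypothetical stand-in for the logical function: r implies q, q implies p."""
    return (not q or p) and (not r or q)


plenum = list(product([0, 1], repeat=3))  # all 8 response patterns (p, q, r)
kept = [pattern for pattern in plenum if guttman_consistent(*pattern)]
kept.sort(key=sum)  # assumed "rectifying" step: order patterns by total score
print(kept)                                # [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
print([sum(pattern) for pattern in kept])  # [0, 1, 2, 3]
```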
Subscript
Kendall’s u² Coefficient
Using all possible differences between values of a variable as a foundation of statistical theory was contemplated by Kendall (1943, p. 47), who defined a coefficient, u², as
For the discontinuous infinite case, the above equation can be written as
and for the finite case as
where the summed term in the above equation is a vector of all possible differences between elements of the variable x. Pointing out that the value of the u² coefficient depends on the spread of the variate-values among themselves and not on the deviations from some central value, Kendall (1943, p. 47) shows that u² = 2σ², concludes that the initial defining formula is nothing but twice the variance, and abandons the idea. One can only wonder in which direction statistics might have developed had Kendall realized that matrices of differences between all values of a variable are not just another way to compute variance, but are also adjacency matrices of ordered graphs, reflecting not only the information content of a variable but also the hierarchical relationships between its elements.
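The finite-case equation is not reproduced in this copy. Assume that the finite-case u² averages the squared differences over all n² ordered pairs of values. Under that normalization, u² = 2σ² holds exactly when σ² uses the divisor n, as in the example above, and the identity follows in a few steps. The notation follows the main text: X_i are the values, M is their mean, and x_i = X_i - M are the deviation scores, which sum to zero.

```latex
\begin{aligned}
\sum_{i=1}^{n}\sum_{j=1}^{n}(X_i - X_j)^2
  &= \sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - x_j)^2 \\
  &= n\sum_{i=1}^{n}x_i^2 + n\sum_{j=1}^{n}x_j^2
     - 2\Bigl(\sum_{i=1}^{n}x_i\Bigr)\Bigl(\sum_{j=1}^{n}x_j\Bigr) \\
  &= 2n\sum_{i=1}^{n}x_i^2 ,
\end{aligned}
\qquad
u^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}(X_i - X_j)^2
    = 2\cdot\frac{1}{n}\sum_{i=1}^{n}x_i^2 = 2\sigma^2 .
```

The diagonal terms are zero, and every off-diagonal pair appears twice. The skew-asymmetric triangle therefore carries half of the sum, and dividing it by n² gives σ² itself, which is the rule used at the start of this section. For the example, the full matrix gives 40 / 16 = 2.5 = 2 × 1.25.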
Andrae, von (1872). Über die Bestimmung des wahrscheinlichen Fehlers durch die gegebenen Differenzen vom gleich genauen Beobachtungen einer Unbekannten. Astronomische Nachrichten, vol. 84.
Helmert, F.R. (1876). Die Berechnung des wahrscheinlichen Beobachtungsfehlers aus den ersten Potenzen der Differenzen gleichgenauer directer Beobachtungen. Astronomische Nachrichten, vol. 88.
Kendall, M. (1943). The Advanced Theory of Statistics. In Stuart, A., & Ord, J.K. (1987). Kendall’s Advanced Theory of Statistics, 5th Ed. London: Griffin.
Krus, D.J., & Ceurvorst, R.W. (1979). Dominance, information, and hierarchical scaling of variance space. Applied Psychological Measurement, 3, 515-527.
Laplace, P.S. (1799-1825). Mécanique Céleste. Vol. 1-5. Paris.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., & Flannery, B.P. (1992). Numerical Recipes, 2nd Ed. Cambridge, MA: Cambridge University Press.
Shannon, C.E., & Weaver, W. (1949). The mathematical theory of communication. Urbana: University of Illinois Press.