Cruise Scientific   ¨   Visual Statistics Studio   ¨   Foundations of Visual Statistics

   Based on Krus, D.J., & Ceurvorst, R.W. (1979). Dominance, information, and hierarchical scaling of variance space. Applied Psychological Measurement, 3, 515-527.

Dominance, Information, and Hierarchical Scaling of Variance Space

David J. Krus and Robert W. Ceurvorst
Arizona State University

 

A novel conceptualization of matrix subtraction can be used to compute variance from all possible differences between data elements. Also discussed are the linkage of variance to information and the visualization of hierarchical structures of data elements.

 

The concept of variance, defined in terms of all possible differences between values of a variable, was introduced by von Andrae (1872) and Helmert (1876) in a series of articles in Astronomische Nachrichten. In the middle of the last century, the use of all possible differences between values of a variable as a foundation of statistical theory was contemplated by Kendall (1943, p. 47), who defined a coefficient, called here $u^2$, as

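(1)    $u^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - y)^2 \, dF(x)\, dF(y)$

For the discontinuous infinite case, the above equation can be written as

(2)    $u^2 = \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} (x_i - x_j)^2 f(x_i) f(x_j)$

and for the finite case as

(3)    $u^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^2$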

where the summed term in the above equation collects all possible differences between elements of the variable x. Pointing out that the value of the $u^2$ coefficient is “dependent on the spread of the variate-values among themselves and not on the deviations from some central value” (p. 47), Kendall shows that $u^2 = 2\sigma^2$, concludes that the initial defining formula is “nothing but twice the variance” (p. 47), and abandons the idea. One can only wonder which direction statistics might have taken had Kendall realized that matrices of differences between all values of a variable reflect not only the information content of the variable, but also the hierarchical relationships between its elements.
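Kendall's observation that the coefficient is twice the variance can be checked numerically for a finite set of scores. The sketch below assumes that the differences are taken over all n × n ordered pairs, self-pairs included, and that the variance is the one with divisor n; the function names are illustrative only.

```python
def mean_squared_difference(x):
    """Mean of (x_i - x_j)^2 over all n*n ordered pairs, self-pairs included."""
    n = len(x)
    return sum((a - b) ** 2 for a in x for b in x) / n**2


def population_variance(x):
    """Variance with divisor n, computed from deviations about the mean."""
    n = len(x)
    mean = sum(x) / n
    return sum((a - mean) ** 2 for a in x) / n


scores = [1, 2, 3, 4, 5]
u2 = mean_squared_difference(scores)
var = population_variance(scores)
print(u2, var)  # 4.0 2.0
```

For these five scores the coefficient is 4 and the variance is 2, in line with the "twice the variance" remark.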

 

The idea that the analysis of variance can be linked with the mathematical theory of information appeared shortly after Shannon and Weaver (1949) founded the discipline (Miller & Madow, 1954; Garner & McGill, 1956). However, the initial interest in this relationship waned, as expressing information in terms of base-two logarithms made this index incompatible with the mainstream methods of data analysis.

 

In a similar vein, initial interest in the matrix algebra rendering of the analysis of variance designs, following publication of Horst’s (1963, p.271) Matrix Algebra for Social Scientists, subsided with the realization that Horst’s expression of variance in matrix algebra terms – as

 

(4)    $\sigma^2 = \frac{\mathbf{x}'(\mathbf{I} - \mathbf{1}\mathbf{1}'/n)\,\mathbf{x}}{n}$

lacks a theoretically interesting interpretation of the $\mathbf{I} - \mathbf{1}\mathbf{1}'/n$ term.
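Computationally, the I - 11'/n term does nothing more than subtract the mean from every element of the vector it multiplies. The sketch below assumes the usual quadratic form x'(I - 11'/n)x / n as the matrix expression of the variance; the helper names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def centering_matrix(n):
    """I - 11'/n: the identity minus a matrix whose entries all equal 1/n."""
    return [[Fraction(1 if i == j else 0) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def variance_quadratic_form(x):
    """Evaluate x'(I - 11'/n)x / n."""
    n = len(x)
    c = centering_matrix(n)
    # (I - 11'/n)x is the vector of deviations from the mean.
    deviations = [sum(c[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * deviations[i] for i in range(n)) / n


print(variance_quadratic_form([1, 2, 3, 4, 5]))  # prints 2
```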

 

My interest in these old issues was aroused by a subtle inconsistency in the conceptualization of basic matrix algebra operations: textbooks on matrix algebra routinely describe major and minor vector products, but do not suggest analogous operations for major and minor sums and differences of summands, minuends, and subtrahends. These operations are easy to imagine, and they are not discussed because most of their potential applications can be accomplished equally well by multiplication with unit vectors. However, on close scrutiny, matrix algebra operations of addition and subtraction (of vectors, not elements of vectors; of matrices, not elements of matrices) can be used for concise expression of several key algorithms of statistical theory and the theory of probability. This paper is a rewrite (2006) of my paper published with Robert Ceurvorst in 1979, where these issues were discussed in an early form.

VARIANCE AND DIFFERENCES CONTAINED IN THE DATA

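Consider a vector x of n test scores. A major difference matrix $\mathbf{D}$ is defined as

(5)    $\mathbf{D} = \mathbf{x}\mathbf{1}' - \mathbf{1}\mathbf{x}', \qquad d_{ij} = x_i - x_j$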


Since the elements of $\mathbf{D}$ are symmetric about the zero-filled principal diagonal, but with opposing signs, squaring each element $d_{ij}$ would render this skew-symmetric matrix symmetric and thus 50% redundant. To eliminate this redundancy (i.e., in effect, to use each pairwise difference only once), all negative elements in $\mathbf{D}$ are set equal to zero. If the elements of x are arranged in ascending or descending order, this will result in a triangular matrix $\mathbf{D}^{+}$, i.e.,

(6)    $d^{+}_{ij} = \begin{cases} d_{ij}, & d_{ij} > 0 \\ 0, & \text{otherwise} \end{cases}$


If a matrix $\mathbf{S}$ is defined, where $s_{ij} = (d^{+}_{ij})^{2}$,

(7)    $\mathbf{S} = \bigl[(d^{+}_{ij})^{2}\bigr]$

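the maximum likelihood (true) variance of x can be written as

(8)    $\sigma^2 = \frac{\mathbf{1}'\mathbf{S}\mathbf{1}}{n^2}$

where 1 is a column vector of ones and 1' is its transpose. Using summation notation, Equation 8 is equivalent to

(9)    $\sigma^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} s_{ij} = \frac{1}{n^2}\sum_{i<j} (x_i - x_j)^2$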

A formal proof that the variance, as defined in Equation 9, equals the more common variance formula

(10)    $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$

was provided by Kendall (1943). This proof can be conceptualized as shown in the next section.
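The construction described in this section (all pairwise differences, negative elements set to zero, remaining elements squared, grand total divided by n squared) can be sketched in a few lines. The sketch assumes the convention that the element in row i and column j is x_i - x_j; the function names are illustrative and not part of the original paper.

```python
def major_difference(x):
    """Matrix of all pairwise differences, d_ij = x_i - x_j."""
    return [[xi - xj for xj in x] for xi in x]


def positive_part(d):
    """Set negative elements to zero so each pairwise difference is used once."""
    return [[v if v > 0 else 0 for v in row] for row in d]


def squared(m):
    """Square every element of a matrix."""
    return [[v * v for v in row] for row in m]


def variance_from_differences(x):
    """Sum of the squared positive differences divided by n squared."""
    n = len(x)
    s = squared(positive_part(major_difference(x)))
    return sum(sum(row) for row in s) / n**2


print(variance_from_differences([1, 2, 3, 4, 5]))  # 2.0
print(variance_from_differences([4, 1, 5, 2, 3]))  # 2.0
```

The second call shows that sorting affects only the triangular appearance of the matrix, not the result: zeroing the negative elements keeps exactly one member of each pair in either case.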

 

 

DIFFERENCES BETWEEN DATA ELEMENTS AND THEIR MEAN

 

 

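Consider, for instance, a vector x' = [1 2 3 4 5] with mean $\bar{x}$ equal to 3 and true variance $\sigma^2$ equal to 2. The variance is typically computed as shown below.

 

(11)    $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5} = \frac{10}{5} = 2$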

 

The matrix D can be computed for this instance as

 

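(12)    $\mathbf{D} = \begin{bmatrix} 0 & -1 & -2 & -3 & -4 \\ 1 & 0 & -1 & -2 & -3 \\ 2 & 1 & 0 & -1 & -2 \\ 3 & 2 & 1 & 0 & -1 \\ 4 & 3 & 2 & 1 & 0 \end{bmatrix}$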

 

The above matrix can be triangularized

 

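(13)    $\mathbf{D}^{+} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 & 0 \\ 3 & 2 & 1 & 0 & 0 \\ 4 & 3 & 2 & 1 & 0 \end{bmatrix}$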

and its corresponding matrix S computed as

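(14)    $\mathbf{S} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 4 & 1 & 0 & 0 & 0 \\ 9 & 4 & 1 & 0 & 0 \\ 16 & 9 & 4 & 1 & 0 \end{bmatrix}$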

The variance of x' = [1 2 3 4 5] can be computed for this instance by using Eq. 8 as 50/25 = 2.

The matrix D contains information about all differences between the elements of x. It seems plausible to assume that this information can also be obtained from a matrix of all possible differences between the elements of x and its mean. Thus matrix M can be constructed as

(15)                 

Its corresponding matrix  can be obtained by squaring its elements,

(16)                        

illustrating why the variance of the vector x can be computed either as $\mathbf{1}'\mathbf{S}\mathbf{1}/n^2$ (for the example, 50/25 = 2) or as $\sum (x_i - \bar{x})^2 / n$ (for the example, 10/5 = 2). These historical antecedents of the conceptualization of variance help in understanding its true meaning.
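The agreement of the two computations is not specific to this example. A short expansion around the mean, sketched here in summation notation with $\bar{x}$ denoting the mean of x, shows why; it is a standard identity and not necessarily the form of Kendall's proof.

```latex
\begin{aligned}
\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i-x_j)^2
  &= \sum_{i=1}^{n}\sum_{j=1}^{n}\bigl[(x_i-\bar{x})-(x_j-\bar{x})\bigr]^2 \\
  &= n\sum_{i=1}^{n}(x_i-\bar{x})^2 + n\sum_{j=1}^{n}(x_j-\bar{x})^2
     - 2\Bigl[\sum_{i=1}^{n}(x_i-\bar{x})\Bigr]\Bigl[\sum_{j=1}^{n}(x_j-\bar{x})\Bigr] \\
  &= 2n\sum_{i=1}^{n}(x_i-\bar{x})^2 ,
\end{aligned}
```

since the deviations from the mean sum to zero. Each pair of distinct elements appears twice in the double sum, so the sum over pairs counted once equals $n\sum (x_i-\bar{x})^2$, and dividing by $n^2$ leaves $\sum (x_i-\bar{x})^2/n$. For the example, 50 = 5 × 10.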

VARIANCE AND INFORMATION

 

Initially, the above conceptualization of variance may appear obscure; however, it offers a possibility to link variance to measures of information not by defining information through Shannon’s equation $H = \log_2 m$, where m is the number of equiprobable alternatives, as done by Garner & McGill (1956), but by defining information in terms of 1-0 changes. This preserves the basic definition of bits in information theory in a way that is congruent with the way information is conceptualized within statistical theory. The key relationship between the above skew-symmetric matrix and the theory of information can be found within Guttman’s (1946) theory of implicational scales, as elaborated by Krus (1977). Let us express variable x by using the binary units of information theory as a matrix of implicative relationships iX, for the current example

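(17)    ${}_{i}\mathbf{X} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}$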

The row sums of the binary matrix iX are the values of the variable x' = [1 2 3 4 5]. The binary matrix iX can also be used to define the variance of the variable x, since

(18)                                                                 

 

and, for the example,

(19)                                                    

The matrix in Eq. 19 is identical to the triangular matrix $\mathbf{D}^{+}$ in Eq. 13, suggesting a relationship between information, defined in terms of the 1-0 bits of information theory

 

(20)                                                              

and variance, as used within statistics and data analysis

(21)                                                               
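The step from a binary matrix to the triangular matrix of positive differences can be sketched computationally. The sketch below rests on two assumptions that go beyond the text: that iX is a Guttman-type matrix whose i-th row contains x_i ones followed by zeros (which requires non-negative integer scores), and that the link to Eq. 13 is obtained by counting, for each pair of rows, the columns in which the first row has a 1 and the second row has a 0. The function names are hypothetical.

```python
def implication_matrix(x):
    """Binary matrix whose i-th row holds x_i ones followed by zeros."""
    width = max(x)
    return [[1 if k < xi else 0 for k in range(width)] for xi in x]


def dominance_counts(b):
    """Entry (i, j): number of columns where row i has a 1 and row j has a 0."""
    return [[sum(p * (1 - q) for p, q in zip(ri, rj)) for rj in b] for ri in b]


binary = implication_matrix([1, 2, 3, 4, 5])
row_sums = [sum(row) for row in binary]
counts = dominance_counts(binary)
variance = sum(v * v for row in counts for v in row) / len(binary) ** 2

print(row_sums)   # [1, 2, 3, 4, 5]
print(counts[4])  # [4, 3, 2, 1, 0]
print(variance)   # 2.0
```

Under these assumptions each count equals the positive difference between the two row sums, so the 1-0 entries alone reproduce the triangular matrix and, through it, the variance.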

HIERARCHICAL STRUCTURE OF DATA VECTORS

The directional differences (or dominance relations) among the row marginal referents of the vector x, for the example

                                                                              

also imply the hierarchical structure of this data vector, corresponding to the matrix $\mathbf{D}^{+}$ (Eq. 13), if conceptualized as an adjacency matrix of an ordered graph (Fig. 1).


Fig. 1. Dendrogram constructed from the skew-symmetric matrix D, triangularized into its positive form $\mathbf{D}^{+}$, and conceptualized as an adjacency matrix of an ordered graph.
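One way to read the triangular matrix as an ordered graph is sketched below. It assumes that every positive difference x_i - x_j defines a directed edge from element i to element j, and that the hierarchy is drawn from the edges that are not already implied by transitivity. This reading and the function names are illustrative assumptions, not the procedure used to produce Fig. 1.

```python
def dominance_edges(x):
    """Directed edges (i, j) wherever x_i - x_j is positive (0-based indices)."""
    return {(i, j)
            for i, xi in enumerate(x)
            for j, xj in enumerate(x)
            if xi - xj > 0}


def covering_edges(edges):
    """Drop every edge i -> j for which some k gives i -> k and k -> j."""
    nodes = {v for edge in edges for v in edge}
    return {(i, j) for (i, j) in edges
            if not any((i, k) in edges and (k, j) in edges for k in nodes)}


chain = sorted(covering_edges(dominance_edges([1, 2, 3, 4, 5])))
print(chain)  # [(1, 0), (2, 1), (3, 2), (4, 3)]
```

For the example vector the remaining edges form a single chain from the largest element down to the smallest, which is the simplest possible hierarchy.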

 

References

Andrae, von (1872). Über die Bestimmung des wahrscheinlichen Fehlers durch die gegebenen Differenzen vom gleich genauen Beobachtungen einer Unbekannten. Astronomische Nachrichten, vol. 84.

 

Garner, W. R. & McGill, W. J. (1956) The relation between information and variance analysis. Psychometrika, 21, 219-228.

 

Guttman, L. (1946) An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144-163.

 

Helmert, F.R. (1876). Die Berechnung des wahrscheinlichen Beobachtungsfehlers aus den ersten Potenzen der Differenzen gleichgenauer directer Beobachtungen. Astronomische Nachrichten, vol. 88.

 

Horst, P. (1963) Matrix algebra for social scientists. New York: Holt, Rinehart, and Winston.

 

Kendall, M. (1943) The Advanced Theory of Statistics. In Stuart, A., & Ord, J.K. (1987) Kendall’s Advanced Theory of Statistics, 5th Ed. London: Griffin.

 

Krus, D. J. (1977) Order analysis: An inferential model of dimensional analysis and scaling. Educational and Psychological Measurement, 37, 587-601.

 

Krus, D.J., & Bart, W.M. (1974) An ordering-theoretic method of multidimensional scaling of items. Educational and Psychological Measurement, 34, 525-535.

 

Krus, D.J., & Wilkinson, S.M. (1986) Matrix differencing as a concise expression of test variance. Educational and Psychological Measurement, 46, 179-183.

 

Miller, G.A., & Madow, W.G. (1954) On the maximum likelihood estimate of the Shannon-Wiener measure of information. Air Force Cambridge Research Center: Technical Report, 54-75, August 1954.

 

Shannon, C. E., & Weaver, W. (1949) The mathematical theory of communication. Urbana: University of Illinois Press.

 

