Cruise Scientific        Visual Statistics Studio

Information Model of Variance

If the numbers within a data matrix are not all the same, then they vary. This variability carries information about the phenomena that the data describe. Helmert, Jordan, and von Andrae developed the concept of variance as distances between the values of a variable in a series of articles in Astronomische Nachrichten.

The first article in this series was von Andrae's Ueber die Bestimmung des wahrscheinlichen Fehlers durch die gegebenen Differenzen vom gleich genauen Beobachtungen einer Unbekannten, published in 1872. Helmert, in his 1876 article in Astronomische Nachrichten, Die Berechnung des wahrscheinlichen Beobachtungsfehlers aus den ersten Potenzen der Differenzen gleichgenauer directer Beobachtungen, elaborated von Andrae's findings into the present-day formula for computing variance.

Units of Information

In 1949, Shannon and Weaver, in their Mathematical Theory of Communication, introduced a measure of information that uses binary digits (bits) as its unit. In 1956, Garner and McGill, in The relation between information and variance analyses, described variance in Shannon's units of information by means of logarithmic transformations. In 1979, Krus and Ceurvorst, in Dominance, information, and hierarchical scaling of variance space, described the relationship between information and variance without recourse to logarithmic transformations of the data. The description of variance in this chapter is based on that paper, using Helmert's original formulation of variance interpreted in terms of contemporary information theory.

Variance of Binary Variables

Information theory defines the basic unit of information, the bit (a contraction of binary digit), as a change from 0 to 1 or from 1 to 0. As discussed earlier, the variance of a binary variable can be computed by the 'pq' formula, where p is the frequency of ones divided by n and q is the frequency of zeroes divided by n. Let us consider a binary variable X [0 0 1 1 1] and compute its variance by the formula

$$\sigma^2 = pq$$

as (3/5)(2/5), which equals 6/25.
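The 'pq' computation above can be sketched in a few lines of Python. This is only an illustration: the function name `binary_variance` is an assumption introduced here, and exact fractions are used so the result can be compared directly with 6/25.

```python
from fractions import Fraction

def binary_variance(values):
    """Variance of a 0/1 variable by the pq formula (divides by n, not n - 1)."""
    n = len(values)
    ones = sum(values)
    p = Fraction(ones, n)      # frequency of ones divided by n
    q = Fraction(n - ones, n)  # frequency of zeroes divided by n
    return p * q

x = [0, 0, 1, 1, 1]
print(binary_variance(x))  # 6/25
```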

Construct Relational Space

To construct the information model of variance for this variable, set up its 5(5) relational space and write the values of the variable X along its left and top margins.

All Possible Pair-wise Differences

Next, starting in the first row, compute the differences between the first value of the variable X and the values of the variable X in the top row.

 

In the following step, compute the differences between the second value of the variable X and the values of the variable X in the top row.

 

Continue to compute the differences between the values of the variable X for its remaining row values.

 

The above matrix is skew symmetric. This means that its elements, symmetric about the principal diagonal, have the same magnitude but opposite signs.
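The construction of the full table of differences, and the check for skew symmetry, can be sketched as follows. The sign convention (top-margin value minus left-margin value) is an assumption made here; the opposite convention simply swaps the positive and negative triangles. The function names are illustrative.

```python
def difference_matrix(values):
    """All pairwise differences; entry [i][j] is values[j] - values[i].

    Assumed convention: top-margin (column) value minus left-margin (row) value.
    """
    return [[col - row for col in values] for row in values]

def is_skew_symmetric(matrix):
    """True when every element equals the negative of its mirror image."""
    n = len(matrix)
    return all(matrix[i][j] == -matrix[j][i] for i in range(n) for j in range(n))

x = [0, 0, 1, 1, 1]
d = difference_matrix(x)
for row in d:
    print(row)
print(is_skew_symmetric(d))  # True
```

With this convention and the values sorted in ascending order, the positive entries fall above the diagonal and the negative entries below it.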

Redundant Part

The diagonal of zeroes divides the matrix into two triangular parts, and the values in either part can be predicted from the values in the other. Thus, one of the parts is redundant and can be deleted. To make things simpler, we will delete the part filled with negative numbers.

Compute the Variance

To compute the variance, count the number of bits in the above matrix and divide this number by the size of the relational space. For our example, the number of bits of information is 6, and the relational space is 5(5). The variance of the variable X equals 6/25 = .24, a value that can be readily verified by using any other formula for the computation of variance.
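A minimal sketch of this bit count follows. It assumes that a 'bit' is a pair of values that differ, which matches counting the ones left in the retained triangle when the values are sorted; the function name is hypothetical.

```python
from fractions import Fraction

def information_variance_binary(values):
    """Count differing pairs (bits) in one triangle and divide by n * n."""
    n = len(values)
    bits = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)   # one triangle only; the other is redundant
        if values[i] != values[j]  # a change between 0 and 1 is one bit
    )
    return Fraction(bits, n * n)

x = [0, 0, 1, 1, 1]
print(information_variance_binary(x))  # 6/25
```

Counting differing pairs rather than positive entries keeps the result the same when the values are not sorted.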

Variance of Continuous Variables

For continuous variables, such as X [1 2 3 4 5], the information model of variance conceptualizes the variance as the average of the squared differences between the elements of a variable. This model can be constructed as follows.

Construct the Relational Space

First, prepare a 5(5) working space and write the values of the variable X along its left and top margins.

 

All Possible Pair-wise Differences

Next, starting in the first row, compute the differences between 1 and the values of the variable X in the top row.

 

In the following step, compute the differences between the second value of the variable X, listed in the second row as 2, and the values of the variable X in the top row.

 

Continue to compute the differences between the values of the variable X for its remaining row values.

Redundant Part

The above matrix is skew symmetric. This means that its elements, symmetric about the principal diagonal, have the same magnitude but opposite signs. As this matrix is fully determined by its positive elements, its negative elements can be deleted.

Compute the Variance

The variance of the variable X [1 2 3 4 5] equals the sum of squares of the elements of the above matrix (50), divided by the matrix's relational space, 5(5). For this example, the variance of the variable X equals 50/25 = 2.
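The computation for continuous variables can be sketched in the same way. The function name `information_variance` is an assumption introduced for illustration; the result is compared with `statistics.pvariance` from the Python standard library, which divides by n.

```python
from statistics import pvariance

def information_variance(values):
    """Sum of squared pairwise differences in one triangle, divided by n * n."""
    n = len(values)
    total = sum(
        (values[j] - values[i]) ** 2
        for i in range(n)
        for j in range(i + 1, n)  # one triangle; the mirror image is redundant
    )
    return total / (n * n)

x = [1, 2, 3, 4, 5]
print(information_variance(x))                  # 2.0
print(information_variance(x) == pvariance(x))  # True
```

Because the differences are squared, the sign convention of the difference matrix does not affect the result.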

 

Variance in Pictures

The information model of variance involves two distinct algorithms. The first algorithm describes the operations leading to the construction of a table of differences among data elements. The second algorithm specifies that these differences have to be squared, summed, and divided by their corresponding relational space to obtain the variance.
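The two algorithms can be condensed into a single expression. The notation is an assumption introduced here: x_i denotes the i-th value of the variable, n the number of values, and \bar{x} their mean. The second display sketches why halving the full double sum (keeping one triangle) and dividing by n^2 agrees with the mean squared deviation.

```latex
\sigma^2 = \frac{1}{n^2} \sum_{i<j} (x_i - x_j)^2

% The full table counts every pair twice:
\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2
  = 2n \sum_{i=1}^{n} x_i^2 - 2 \Bigl( \sum_{i=1}^{n} x_i \Bigr)^2
  = 2n \sum_{i=1}^{n} (x_i - \bar{x})^2

% Keeping one triangle halves the sum, so
\frac{1}{n^2} \sum_{i<j} (x_i - x_j)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
```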

All Possible Pair-wise Differences

In this section, we will discuss the first algorithm in detail. The computation of all possible differences between the elements of a variable results in a matrix of differences between the subjects (entities, row marginal referents) of that variable. These differences can also be interpreted as distances. Thus, this matrix of differences is similar to a table of distances between cities.

Redundant Side

The above matrix of distances is symmetric, but could also have been reported as skew symmetric, since the distances between any two cities, but not the direction of the flight, are the same. However, it is not customary to report distances between cities in the rigorous skew symmetric form, and thus the skew symmetric matrix is usually truncated (triangularized).

Dendrogram

The matrix of all differences between the elements of a variable can also be conceptualized as the adjacency matrix of a certain type of ordered graph, a dendrogram. The row marginal referents of the variable then define the nodes of the graph, and the computed differences define the distances between them. For the variable X [1 2 3 4 5], defined in one of the preceding chapters as the responses of Allen, Becky, Cathy, Debra, and Edgar to the question 'I like poetry,' a dendrogram can be constructed from its matrix of differences, reflecting the degree to which our five friends like poetry. Edgar indicated that he likes poetry very much. He is separated from Allen by four units; the 'distance' between Debra and Becky is 2 units. In the 'poetic hierarchy' of the dendrogram, Edgar dominates Debra, who dominates Cathy, who, in turn, dominates both Becky and Allen. The construction of a dendrogram for the distances between cities would have been more complicated, since the cities listed in the table are not located on a straight line. How to design dendrograms for this nonlinear case is described elsewhere.
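The distances and the dominance order read from the dendrogram can be sketched in code. This is a simplified stand-in for the diagram, not a dendrogram-drawing routine; the function names are assumptions, and the scores 1 to 5 are assigned to Allen through Edgar in the order listed in the text.

```python
names = ["Allen", "Becky", "Cathy", "Debra", "Edgar"]
scores = [1, 2, 3, 4, 5]  # responses to 'I like poetry'

def distance_table(names, scores):
    """Absolute difference between every pair of respondents."""
    return {
        (a, b): abs(score_a - score_b)
        for a, score_a in zip(names, scores)
        for b, score_b in zip(names, scores)
    }

def dominance_order(names, scores):
    """Respondents ordered from the highest score to the lowest."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked]

distances = distance_table(names, scores)
print(distances[("Edgar", "Allen")])   # 4
print(distances[("Debra", "Becky")])   # 2
print(dominance_order(names, scores))  # ['Edgar', 'Debra', 'Cathy', 'Becky', 'Allen']
```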

Advantages

In sum, the information model of variance reflects not only the information content of the variable, but also the hierarchical relationships between its elements.

 

The Mechanical Model of Variance

The above conceptualization of variance can be contrasted with the one adopted by most statistical textbooks, which interpret variance by visualizing the squared deviation scores of the variance formula

$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{n}$$

as squares. This conceptualization can be illustrated with the variable X [1 2 3 6 8]. Its corresponding deviation scores x [-3 -2 -1 2 4], squared, are interpreted literally as squares around the mean, as shown below.
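The arithmetic of the mechanical model for X [1 2 3 6 8] can be sketched as follows. The function name is hypothetical, and the division by n (rather than n - 1) is an assumption chosen to match the other examples in this chapter.

```python
def mechanical_variance(values):
    """Deviation scores, their squares (the 'areas'), and their mean."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]  # signed distances from the mean
    squares = [d * d for d in deviations]    # area of the square on each deviation
    return deviations, squares, sum(squares) / n

deviations, squares, variance = mechanical_variance([1, 2, 3, 6, 8])
print(deviations)  # [-3.0, -2.0, -1.0, 2.0, 4.0]
print(squares)     # [9.0, 4.0, 1.0, 4.0, 16.0]
print(variance)    # 6.8
```

The five areas sum to 34, and dividing by n = 5 gives 6.8.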

Dead Poets Society Meets Again

During one of the meetings of the Dead Poets Society in Café Appolinaire, Debra recalls a scene in the movie their club is named after, in which the English teacher, John Keating (Robin Williams), discusses J. Evans Prichard's textbook on poetry. Prichard's textbook is titled Understanding Poetry, and in its introduction Prichard proposes that poetry can be measured by a poem's perfection and importance. The perfection and significance scales can capture the quality of a poem's meter, rhyme, and figure of speech by the renown of its author.

By plotting a poem's perfection on the horizontal axis and its importance on the vertical axis, the poem's greatness can be measured as the area delimited by its coordinates. Keating sketches the importance and perfection of Byron's poems and Shakespeare's sonnets as shown below.

Debra, who is taking a course in statistics at the nearby community college, notes that Prichard's conceptualization of a poem's greatness bears similarity to the mechanical model of variance. The notion that the true meaning of variance rests in squaring the deviation scores to obtain their corresponding areas appears implausible, although it is geometrically correct.

The idea that the total area of a poem yields a measure of the poet's greatness also appears implausible to Keating, who instructs his students to tear out that chapter. John Keating also professes that poetry cannot be measured: 'Excrement. That's what I think of J. Evans Prichard; be gone J. Evans Prichard... this is a battle, a war, and casualties could be your hearts and souls. Armies of academicians going on measuring poetry... No matter what anybody tells you, words and ideas can change the world. We don't read and write poetry because it's cute. We read and write poetry because we are members of the human race; and the human race is filled with passion. Poetry, beauty, romance, love, those are what we stay alive for.'

Consider an alternative scenario, where poetic images are used to measure the intangible aspects of a college, perhaps the same college where the story of Keating's English class takes place. Rating scales containing such poetic images as 'a treasured book,' 'a probing searchlight,' 'a lighthouse,' 'a compass needle,' 'a detailed map' can be used to capture the guidance aspects of a college. Images such as 'a secure fortress,' 'a protective armor,' 'a just judge' can measure the security dimension. Images such as 'a dam in a river,' 'an uncomfortable bed,' 'a tedious sermon,' 'an entailing net,' 'a hampering burden' can be used to portray the restrictive aspects of a college. Visions of 'vicious bully,' 'whipping post,' and 'scolding mother' may describe college as a potential source of threat.

The four pillars of Keating's alma mater, 'Tradition, Honor, Discipline, Excellence,' can be measured just as one can measure the four pillars of its counterculture: 'Travesty, Horrors, Decadence, Excrements.' The tradition of passing on the light of knowledge by lighting a chain of candles, or its absence, is one of the intangible aspects of a college that can be measured.

The main themes in Professor Keating's class are Life and Death. Could death be rated by using scales such as 'a shadowed doorway, a chilling frost, a dreamless space, misty abyss, leafless tree, an infinite ocean'? Perhaps one could measure love by metaphors such as 'a sorcerer's spell,' 'a tongue of flames,' 'a stairway to paradise,' 'covenant,' 'churning sea,' 'a dainty box of sweets.' There are indeed wars going on, wars for your hearts and souls. You may decide to join the exciting and romantic campus of Professor Keating or another of the competing worlds. The world that science offers is transcending, exacting, and rational, combining Cartesian skepticism with positive methodologies and the dreams of Leibniz with computer technology. It can be filled with a passion for truth and a zeal for discovery. However, if you decide to join, remember that this world must be tempered with humanism so that it does not turn from a dream into a nightmare.

Prospectus

Trepanation of the skull and implantation of a microelectrode into a brain cell reveals an impulse (1: on) or its absence (0: off). The beacons of space probes send to Earth a stream of on-off signals. Translated into ones and zeroes and enhanced, these signals allow a reconstruction of the surface features of Jupiter's moons or Saturn's rings. Even the inspection of a printed page with a magnifying glass reveals a pattern of black dots on a white background, which transfers information via the retina to the occipital area of the brain. Data-analytic methods extract and convey information stored as variability between the elements of data matrices. In this respect, data analysis is a scheme for extracting meaning, a way of telling a story.