
Based on Krus, D.J. (1977) Order analysis: an inferential model of dimensional analysis and scaling. Educational and Psychological Measurement, 37, 587-601.



Order analysis: an inferential model of dimensional analysis and scaling
David J. Krus
Arizona State University

 

Abstract.- Description of an algorithm that simplifies complex dendrograms into structures that are logically consistent and easier to interpret.

 

The dimensionality of a set of data can be derived from asymmetric, transitive, and connected relations between its elements, i.e., from those relations which define an order. In Bertrand Russell’s words (Russell, 1919, p. 20; 1903, p. 219):

 

“Dimensions, in geometry, are a development of order.
All orders depend upon transitive asymmetrical relations.”

 

An order relation provides for an inference of dimensionality of a particular set of data and is at the same time characteristic of the family of implicative functions. Consider the possibility of an order structure reflecting a simple logical (and ultimately cognitive) structure. Among the basic logical structures is that based on implicative functions. Implication (from Lat. implicatic, interwoven) is the relation that holds between two propositions, or classes of propositions, in virtue of which one is logically deducible from the other. The truth values of the implication relationship between two propositions p and q can be defined as

 

(1)

 

 

 

or, using the 0,1 notation as

 

(2)
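The truth values of implication in the 0,1 notation can be regenerated with a few lines of Python. This is a minimal sketch: the names `implies` and `table` are illustrative and do not come from the original article, and the row order chosen here is an assumption, since the original layout of (1) and (2) is not reproduced in this text.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# Rows of (p, q, p -> q) in 0/1 notation, listed from (1, 1) down to (0, 0).
table = [(int(p), int(q), int(implies(p, q)))
         for p, q in product([True, False], repeat=2)]

for row in table:
    print(*row)
```

The only row with a 0 in the last position is p = 1, q = 0, which is the defining property of implication.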

 

 

 

Subclasses of these chains of implications, connected by conjunctions

 

(3)

 

 

are various syllogisms, as, e.g., Russell’s peripatetic syllogism

 

All humans are mortal
Socrates is a human
therefore Socrates is mortal

 

or the classical Aristotelian syllogism

 

If all humans are mortal
and all Greeks are humans
then all Greeks are mortal.

 

However, here we are concerned only with simple chains of implications, such as if a implies b and b implies c then a implies c, etc. This implicative chaining is at the core of syllogistic reasoning as well as of the generation of a dimension. Consider a function of propositional calculus

 

(4)

 

 

 

catenating implicative relationships between arguments p, q, and r, as

 

(5)

 

 

 

Rectifying the p, q, and r truth values on the left side of Template (5) by using the truth values of conjunction in column 5 results in an implicatic (Guttman) scale as shown in Template (6).

 

(6)
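The rectification step can be illustrated in code. The sketch below assumes that the function in (4) is the conjunction of the two implications, (p implies q) and (q implies r); this reading is inferred from the surrounding text, and the names `implies`, `chain`, and `retained` are hypothetical. Keeping only the rows of the truth table for which the conjunction is true leaves a triangular pattern of the kind found in a Guttman scale.

```python
from itertools import product


def implies(p: int, q: int) -> int:
    """Material implication in 0/1 notation."""
    return int((not p) or q)


def chain(p: int, q: int, r: int) -> int:
    """Assumed form of the catenated function: (p -> q) and (q -> r)."""
    return implies(p, q) & implies(q, r)


# Keep only the truth-value rows for which the conjunction is true,
# ordered by the number of 1s in each row.
retained = sorted(
    (row for row in product((0, 1), repeat=3) if chain(*row)),
    key=sum,
)

for row in retained:
    print(*row)
```

Four of the eight possible rows survive, and in each of them the values never decrease from p to q to r.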

 

 

 

The purpose of this article is to show how data matrices can be partitioned so as to approximate implicatic scales and to demonstrate some of the key principles of data analysis within the framework of logical analysis of data. To begin our discussion, let us outline the algorithms for computation of the skew symmetric and skew asymmetric matrices.

 

Skew Symmetric and Skew Asymmetric Matrices

 

The order-independent subtraction of a data matrix X and its transpose X’ results in a skew symmetric matrix S indexing its row

 

(7)

 

 

 

and column

 

(8)

 

 

 

variance components. The order-dependent subtraction of a data matrix X and its transpose X’ results in a skew asymmetric matrix H indexing the row

 

(9)

 

 

 

and column

 

(10)

 

 

 

hierarchical structure of its marginal referents. Here the notational convention for subtraction within the context of matrix algebra is that subtraction of matrix elements is denoted by the minus sign in parentheses (-), order-independent subtraction of matrices by the minus sign, and order-dependent subtraction of matrices by the minus sign in sharp brackets ⟨-⟩.

 

The skew symmetric matrices are incipient to computation of variance as used throughout the general linear model of data analysis where the variance is defined in terms of positive and negative differences between elements of data matrices. Each difference occurs once with each sign, so the positive and negative differences have the same magnitudes. For instance, consider a data matrix X

 

(11)

 

 

 

An order-independent skew symmetric matrix S can be computed by Equation (8) as

 

(12)       

 

Here, the differences between data elements, say, 3 and 5, 2 and 4, and 7 and 6 are computed as -2 + (-2) + 1 and recorded as -3, say, below the diagonal of the skew symmetric variance matrix. Comparing the same elements in reverse order as 5 and 3, 4 and 2, and 6 and 7 is computed as 2 + 2 + (-1) and recorded as +3 above the diagonal of the skew symmetric matrix. As the negative elements of the skew symmetric matrices are redundant, these matrices are frequently triangulated into a skew-positive form, where

 

(13)
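The arithmetic of the order-independent subtraction can be checked with a short sketch. Only the two columns quoted in the text, (3, 2, 7) and (5, 4, 6), are used here; the full data matrix in (11) is not reproduced in this text, and the function name `skew_symmetric` is illustrative rather than taken from the original.

```python
def skew_symmetric(columns):
    """Order-independent subtraction between columns.

    Element [j][k] is the sum over rows of (x_k - x_j), so elements
    above and below the diagonal have equal magnitude and opposite sign.
    """
    n = len(columns)
    return [
        [sum(b - a for a, b in zip(columns[j], columns[k])) for k in range(n)]
        for j in range(n)
    ]


# The two columns quoted in the text (paired elements 3 and 5, 2 and 4, 7 and 6).
col_1 = [3, 2, 7]
col_2 = [5, 4, 6]

S = skew_symmetric([col_1, col_2])
for row in S:
    print(row)
```

For these two columns the element above the diagonal is 2 + 2 + (-1) = +3 and the element below the diagonal is -2 + (-2) + 1 = -3, as described in the text.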

 

 

 

The skew asymmetric matrices reflect the hierarchical relationships between the marginal referents (attributes and entities, variables and subjects) of data matrices. An order-dependent skew asymmetric matrix H for the discussed instance of the data matrix X can be computed by Equation (10) as

 

(14)

 

 

 

The idea behind the calculation of the order-dependent skew asymmetric matrices is to count the differences for each direction separately and record these differences, for each direction, in the corresponding supra and infra-diagonal elements of the resulting matrix. For the “greater than” direction, the differences between data elements 3 and 5, 2 and 4, and 7 and 6 are 0 + 0 + 1. For the reverse direction, these differences are 2 + 2 + 0. Recording 4 above the diagonal and 1 below the diagonal, the magnitude of the elements in the skew-positive symmetric matrices can be recovered, for the example as 4 - 1, equal to 3.
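The directional counting described above can be sketched as follows. As before, only the two columns quoted in the text are used, and the function name `skew_asymmetric` is an illustrative assumption. Each direction is accumulated separately by keeping only the positive differences, and subtracting the transpose recovers the skew symmetric values.

```python
def skew_asymmetric(columns):
    """Order-dependent subtraction between columns.

    Element [j][k] sums only the positive differences (x_k - x_j) over rows,
    so each direction of comparison is recorded separately.
    """
    n = len(columns)
    return [
        [sum(max(b - a, 0) for a, b in zip(columns[j], columns[k]))
         for k in range(n)]
        for j in range(n)
    ]


# The two columns quoted in the text (paired elements 3 and 5, 2 and 4, 7 and 6).
H = skew_asymmetric([[3, 2, 7], [5, 4, 6]])

# Subtracting the transpose recovers the order-independent values.
S = [[H[j][k] - H[k][j] for k in range(2)] for j in range(2)]

print(H)  # directional sums: 4 above the diagonal, 1 below
print(S)  # +3 above the diagonal, -3 below
```

The value 4 - 1 = 3 above the diagonal of `S` is the magnitude recovered in the text.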

 

Relationship between the Skew Symmetric and Skew Asymmetric Matrices

 

Consider another example of a skew asymmetric matrix H

 

(15)

 

 

 

This matrix can be conceptualized as the adjacency matrix of an ordered graph, with nodes defined by those elements whose symmetric counterparts on the other side of the diagonal, either above or below it, are equal to zero. For the example, these elements in the above matrix were marked with an arrow, as

 

(16)

 

 

 

Connecting nodes marked with an arrow results in the ordered graph shown in Figure 1.

 

Figure 1. Initial hierarchical structure

 

The hierarchical structure in Figure 1 is relatively complex. It can be simplified by computing a matrix Z (cf. Krus & Kennedy, 1977) where

 

(17)

 

 

 

and its associated matrix P where

 

(18)

 

 

 

Within this probabilistic model of order-dependent skew asymmetric matrices conceptualized as adjacency matrices of ordered graphs, a node of a dendrogram is defined if its corresponding p value reaches or exceeds a pre-specified level alpha, bounded by the p values of .50 and .999. By changing the alpha value, a series of transformation matrices T, each the adjacency matrix of an ordered graph, can be computed where

 

(19)
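The thresholding rule stated above can be sketched in code. The sketch assumes that T is a binary matrix with a 1 wherever the corresponding p value reaches or exceeds alpha, and that the diagonal is ignored; both are assumptions, since Equation (19) is not reproduced in this text. The matrix `P_demo` contains invented values for illustration only, not the values of the article's example.

```python
def transformation_matrix(P, alpha):
    """Mark the off-diagonal elements of P that reach or exceed alpha.

    Assumption: T is binary and the diagonal is ignored.
    """
    if not 0.50 <= alpha <= 0.999:
        raise ValueError("alpha is bounded by .50 and .999")
    return [
        [1 if j != k and p >= alpha else 0 for k, p in enumerate(row)]
        for j, row in enumerate(P)
    ]


# Hypothetical probabilities, invented for illustration only.
P_demo = [
    [0.50, 0.95, 0.99],
    [0.05, 0.50, 0.70],
    [0.01, 0.30, 0.50],
]

T_strict = transformation_matrix(P_demo, 0.90)
T_lenient = transformation_matrix(P_demo, 0.50)
print(T_strict)
print(T_lenient)
```

With these invented values, lowering alpha from .90 to .50 admits one more relation, so the graph described by the lenient matrix is a single chain through all three elements.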

 

 

 

For the example, the Z transformation matrix can be computed as

 

(20)

 

 

 

Its corresponding matrix of transitional probabilities P is

 

(21)

 

 

 

By changing the alpha level, the initial hierarchical structure reproduced in Figure 2

 

Figure 2. Hierarchical structure before transition

 

begins to approximate the linear orders at the alpha level equal to .80 (Figure 3)

 

Figure 3. Hierarchical structure at p > .80 level

 

and changes into a simple linear graph at the alpha level equal to .50 (Figure 4)

 

Figure 4. Hierarchical structure at p < .50 level

 

When the alpha level reaches the .50 level, the order-dependent skew asymmetric matrices H change into the skew-positive matrices, i.e., into the order-independent skew symmetric matrices S, triangulated into a positive form. These preliminary observations are important for understanding the algorithm for the order analysis, as outlined in the sections to follow.

 

Extraction of I-Scales

Order analysis decomposes the dendrograms associated with data matrices, which are often complex and difficult to interpret, into their constituent I-Scales, which are logically consistent and, if the data are not illogical (random), easy to interpret. For instance, consider a binary n by k matrix X, shown in (22), where n designates the number of rows, subjects, or, in general, entities, and k the number of columns, variables, items, or, in general, attributes.

 

(22)

 

 

 

The order-dependent skew asymmetric matrix H for attributes of the data matrix X in (22) can be computed by Equation (10) as

 

(23)

 

 

 

which can be conceptualized as the adjacency matrix of the dendrogram in (24)

 

(24)

 

 

 

If more than one implicatic scale is contained by the data, order analysis of the matrix X results in a supermatrix X*

 

(25)

 

 

where the order-dependent skew asymmetric matrices H* corresponding to the submatrices of X* decompose the logically complex dendrogram into its component I-scales as shown, for the example, in Table 1.

 

Table 1. Schematic representation of extraction of I-scales from logically complex scales.

[Table panels: dendrogram describing the logically complex scale; I-Scale W1; I-Scale W2.]
 

 

 

The alpha parameter

 

Associated with the data matrix X

 

(26)

 

 

 

is also the order-dependent skew asymmetric matrix H for entities of the data matrix X, which can be constructed by using Equation (9) and translated to its associated Z and P matrices by Equations (17) and (18), as

 

(27)       

 

The alpha level is determined as the average of the elements of the matrix P greater than or equal to .50. Thus a  matrix can be computed where

 

(28)
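The averaging rule for alpha can be sketched as follows. The sketch assumes that only off-diagonal elements of P enter the average; this detail, the function name `alpha_level`, and the values in `P_demo` are all assumptions made for illustration and do not reproduce the article's example.

```python
def alpha_level(P):
    """Average of the off-diagonal elements of P that are >= .50.

    Assumption: diagonal elements are excluded from the average.
    """
    kept = [
        p
        for j, row in enumerate(P)
        for k, p in enumerate(row)
        if j != k and p >= 0.50
    ]
    if not kept:
        raise ValueError("no off-diagonal element of P reaches .50")
    return sum(kept) / len(kept)


# Hypothetical probabilities, invented for illustration only.
P_demo = [
    [0.50, 0.95, 0.99],
    [0.05, 0.50, 0.70],
    [0.01, 0.30, 0.50],
]

alpha = alpha_level(P_demo)
print(round(alpha, 2))
```

For these invented values the retained elements are .95, .99, and .70, whose average is .88.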

 

 

 

For the instance of the discussed matrix X, the  equals


(29)       

 

with alpha equal to

 

(30)

 

 

 

For the instance of the  matrix, alpha equals .83. At this level, the  matrix can be annotated with downward arrows indicating the location of the nodes of the associated dendrogram

 

(31)       

 

which is shown below.

 

(32)

 

 

 

Let us also construct a dendrogram at the alpha level greater than or equal to .50

 

(33)

 

 

 

which reconfirms the presence of the two main branches in the data analyzed.

 

Wrapping the caepalic structures

 

By wrapping branches of dendrograms associated with the  matrices for the data matrix entities into caepalic (from L. caepa, onion) structures (using Read’s (1972, pp. 15-182) coding algorithm), we can isolate subsets of subjects who hold coherent opinions about the measured attributes. For the instance of the data analyzed, the dendrogram in (32) was used to extract the component implicatic scales for attributes of the binary data matrix X in (22) by wrapping nodes d, a, i, b, f, and h as

 

(34)

  

 

 

and nodes e, c, and g  as

 

(35)

 

 

 

Alternately, the second caepalic component could have been wrapped as

 

(36)

 

 

 

since the response patterns of subjects d (1 1 1 1 1) and h (0 0 0 0 0) do not alter the variance of this component.

 

Computer algorithms for wrapping caepalic components frequently look for successions of longest paths within a dendrogram by searching for nodes with the highest probability of having a path to their closest neighbor. In graph-theoretic terms, this is analogous to the “routing problem,” i.e., to the problem of finding the path connecting the maximum number of nodes of the graph to be traversed in one run (cf. Bellman, 1968).
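One simple way to find a path through the maximum number of nodes of an acyclic ordered graph is a memoized depth-first search, sketched below. This is a swapped-in, simplified technique, not the probability-driven search described above or the coding algorithm cited from Read (1972); the function name `longest_path` and the small graph `demo_graph` are hypothetical.

```python
from functools import lru_cache


def longest_path(edges):
    """Return one longest path in an acyclic ordered graph.

    `edges` maps each node to the list of nodes it points to.
    """
    @lru_cache(maxsize=None)
    def best_from(node):
        # Longest path starting at `node`, built from its successors' results.
        tails = [best_from(nxt) for nxt in edges.get(node, ())]
        return (node,) + max(tails, key=len, default=())

    nodes = set(edges) | {n for successors in edges.values() for n in successors}
    return list(max((best_from(n) for n in sorted(nodes)), key=len))


# Hypothetical ordered graph, invented for illustration only.
demo_graph = {
    "n1": ["n2", "n3"],
    "n2": ["n4"],
    "n3": [],
    "n4": ["n5"],
}

path = longest_path(demo_graph)
print(path)
```

In this invented graph the branch through n3 is a dead end, so the search returns the four-node path that runs from n1 to n5. Once a path has been extracted, its nodes could be removed and the search repeated to obtain the next component, which is one possible reading of "successions of longest paths."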

 

Composition of a supermatrix from the caepalic components

 

The extracted caepalic components can be used to determine the subsets of entities, defining a set of m  submatrices of the supermatrix X*

 

(37)

 

 

 

with its associated H* submatrices

 

(38)

 

 

 

defining their corresponding sets of dendrograms and I-Scales. For the example, the supermatrix X*

 

(39)

 

 

and its associated supermatrix  H*

 

(40)

 

 

 

reflect the structure of the first caepalic component  

 

(41)

 

 

 

and the second caepalic component  

 

(42)

 

 

 

Rearranging the attributes of the first submatrix of X* according to the caepalic component  

 

(43)

 

 

 

and attributes of the second submatrix of X* according to the caepalic component  

 

(44)

 

 

 

one may observe the characteristic arrangement of the response patterns of the I-Scales and their associated scales  and  isolated from the data analyzed.

 

Discussion

 

The algorithm for the probabilistic model of order analysis was developed from the deterministic model, described elsewhere (Krus and Bart, 1974). As compared to scaling methods providing for algebraic and geometric representations of isolated structures, order analysis suggests connecting links between the algebraic, geometric and logical structures of the analyzed data and allows for interpretation of resulting scales in logically ordered sequences of statements.

 

REFERENCES

 

Bellman, R. (1968) Control theory. In D. M. Messick (Ed.) Mathematical thinking in behavioral sciences. San Francisco: Freeman, 74-82.

Guttman, L. (1944) A basis for scaling qualitative data. American Sociological Review, 9, 139-150.

Krus, D. J., & Bart, W. M. (1974) An ordering theoretic method of multidimensional scaling of items. Educational and Psychological Measurement, 34, 525-535.

Krus, D. J. & Kennedy, P. H. (1977) Normal scaling of dominance matrices: The domain-referenced model. Educational and Psychological Measurement, 37, 189-193.

Krus, D. J., Weidman, J. C., & Bland, P. C. (1975) SSIE: Semi-projective scales of institutional evaluation. Research in Higher Education, 3, 131-138.

Read, R. C. (1972) The coding of various kinds of unlabeled trees. In R. C. Read (Ed.) Graph theory and computing. New York: Academic Press.

Russell, B. (1971) Principles of mathematics. New York: Simon and Schuster. (Originally published, 1903.)

Russell, B. (1971) Introduction to mathematical philosophy. New York: Simon and Schuster. (Originally published, 1919.)