Cruise Scientific        Visual Statistics Studio       Table of Contents

Preliminary Considerations

Visual statistics encompasses traditional statistics and modern computer-assisted data analysis. It is a modern continuation of classical epistemology, providing foundations for a general theory of scientific inquiry. It begins with the description of quantities, their relationships, and their structure.

Data Matrices

A data matrix is a rectangular arrangement of numbers symbolizing properties of the phenomena under scrutiny. These numbers are located at the intersections of the data matrix's rows and columns.

Columns of data matrices are called variables or, more generally, attributes. Rows of data matrices are typically subjects, but can also be other entities.
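The row-and-column arrangement described above can be sketched in a few lines of Python. This is only an illustration: the numbers are made up, and the helper names `element` and `column` are hypothetical, not part of any statistical package.

```python
# Hypothetical data matrix: 4 subjects (rows) by 3 variables (columns).
# The ratings are invented for illustration only.
data = [
    [4, 2, 5],
    [3, 3, 1],
    [5, 1, 4],
    [2, 4, 2],
]

n_rows = len(data)       # number of subjects
n_cols = len(data[0])    # number of variables


def element(matrix, row, col):
    """Return the number at the intersection of a row and a column."""
    return matrix[row][col]


def column(matrix, col):
    """Collect one variable (column) across all subjects (rows)."""
    return [row[col] for row in matrix]


second_subject = data[1]          # one row: all variables for one subject
first_variable = column(data, 0)  # one column: one variable for all subjects
```

Storing the matrix as a list of rows keeps each subject's record together; a column is then obtained by picking the same position out of every row.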

Variability and Information

To carry information about the phenomena they describe, the numbers in a data matrix must not all be the same, yet they must not be random either. The variability of the numbers in a data matrix can be expressed as a quantity called the variance. The goal of data analysis is to analyze the variance contained in the data and thus make the data meaningful.
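As a minimal sketch of the idea that identical numbers carry no variability, the code below computes the variance of two made-up columns. It assumes the sample variance (division by n - 1); the source does not say which variant it has in mind, and the function name `sample_variance` is chosen here for illustration.

```python
from statistics import mean


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed variant)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# Hypothetical columns of a data matrix (invented ratings).
constant_column = [3, 3, 3, 3, 3]   # every subject gave the same answer
varied_column = [1, 2, 3, 4, 5]     # answers differ across subjects

var_constant = sample_variance(constant_column)
var_varied = sample_variance(varied_column)
print(var_constant, var_varied)
```

The constant column has zero variance, so there is nothing in it for an analysis to explain; the varied column has positive variance.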

A matrix of data from a real study is usually much larger than the matrix shown above. Such a matrix must be entered into a computer and analyzed. The events that take place after the data matrix is submitted to a computer for data analysis are the subject of this book.

Marginal Referents of Data Matrices

Within the social sciences, an element of a data matrix is often created when a subject encounters a statement: a test item, a question on a survey, or an inquiry about a personality characteristic. These queries are channeled through the subject's cognitive systems, which in turn direct the psychomotor reactions of recording responses to the statements.

The row marginal referents of data matrices carry information about the identities of the subjects generating the responses. The column marginal referents contain information about the meaning of the statements. Thus, an element of a data matrix typically carries a quantified reaction of a subject to a statement.

Data matrices are bordered by row and column marginal referents. From the standpoint of formal data analysis, the exact character of these marginal referents is immaterial. Some authors prefer to call the marginal referents of data matrices attributes and entities to stress this point. By associating attributes and entities with numbers, we can generate a variety of designs. The possibilities are unlimited; statistical data analysis is not restricted by discipline, or by the character of measured entities or attributes.
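One way to picture marginal referents is as labels kept alongside the numbers. The sketch below is an assumption-laden illustration: the labels, the ratings, and the helper `lookup` are all hypothetical.

```python
# Row marginal referents: who produced the responses (hypothetical labels).
row_labels = ["subject_1", "subject_2", "subject_3"]

# Column marginal referents: what the statements mean (hypothetical labels).
col_labels = ["likes_poetry", "likes_gothic_novels"]

# The data matrix itself: invented ratings, one row per subject.
values = [
    [4, 5],
    [2, 1],
    [3, 3],
]


def lookup(row_label, col_label):
    """Return the quantified reaction of one subject to one statement."""
    i = row_labels.index(row_label)
    j = col_labels.index(col_label)
    return values[i][j]


reaction = lookup("subject_2", "likes_poetry")
```

Any computation on `values` works the same way if the labels are replaced, which mirrors the point that the exact character of the referents is immaterial to formal analysis.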

There is a dictum that 'anything which exists, exists in some amount, and therefore can be measured.' Statistical data analysis attempts to translate measurements into indices that can be interpreted and structures that can be described. Following data analysis, a story can be told and a paper can be written. The findings are closer to reality and more believable than if they were based solely on the author's personal experiences, introspections, or beliefs.

Prototypes of Bivariate Data Matrices

Continuous-Continuous Data Matrices

A bivariate data matrix with a continuous-continuous arrangement of data vectors can be obtained, e.g., by asking a group of five subjects whether they like poetry (variable X) and whether they like Gothic novels (variable Y), using a five-point rating scale. We might be interested in whether the answers to these two test items are related. The answer to this question can be provided by the coefficient of correlation.
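A minimal sketch of the coefficient of correlation for such a matrix follows, assuming the Pearson product-moment coefficient is meant. The ratings for the five subjects are invented, and `pearson_r` is a name chosen here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical five-point ratings from five subjects.
poetry = [1, 2, 3, 4, 5]          # variable X
gothic_novels = [2, 1, 4, 3, 5]   # variable Y

r = pearson_r(poetry, gothic_novels)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear relationship between the two items, and a value near 0 indicates little or none.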

Binary-Continuous Data Matrices

A bivariate data matrix with a binary-continuous arrangement of data vectors can be obtained, e.g., when one is interested in whether there is a relationship between the gender of the subjects (coded as 0 or 1) and the enjoyment of poetry. The answer to this question can be provided by the point biserial coefficient of correlation.
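The point biserial coefficient can be sketched with its usual textbook formula, the difference between the two group means divided by the standard deviation of all scores (computed with n in the denominator) and multiplied by the square root of the product of the group proportions. The data and the function name `point_biserial` are hypothetical.

```python
import math


def point_biserial(binary, scores):
    """Point biserial correlation between a 0/1 variable and a continuous one."""
    n = len(scores)
    group1 = [s for b, s in zip(binary, scores) if b == 1]
    group0 = [s for b, s in zip(binary, scores) if b == 0]
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    overall = sum(scores) / n
    # Standard deviation of all scores, dividing by n (not n - 1).
    sd = math.sqrt(sum((s - overall) ** 2 for s in scores) / n)
    p = len(group1) / n   # proportion coded 1
    q = len(group0) / n   # proportion coded 0
    return (mean1 - mean0) / sd * math.sqrt(p * q)


# Hypothetical data: gender coded 0 or 1, poetry enjoyment on a five-point scale.
gender = [0, 0, 1, 1, 1]
poetry = [2, 3, 4, 5, 5]

r_pb = point_biserial(gender, poetry)
print(round(r_pb, 3))
```

This formula gives the same number as the Pearson coefficient applied directly to the 0/1 codes and the ratings, so the point biserial is a special case rather than a different kind of index.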

Continuous-Binary Data Matrices

A bivariate data matrix with a continuous-binary arrangement of data vectors is typical of discriminant analysis, which is used when one is interested in whether some issue divides subjects into different groups.
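As a rough sketch only: with a single continuous variable and two groups, a linear discriminant rule that assumes equal group variances and equal prior probabilities reduces to a cutoff halfway between the two group means. The code below illustrates that special case with invented data; it is not a full discriminant analysis, and the names `midpoint_rule` and `classify` are hypothetical.

```python
from statistics import mean


def midpoint_rule(scores, groups):
    """Return (cutoff, mean of group 0, mean of group 1) for a two-group split."""
    g0 = [s for s, g in zip(scores, groups) if g == 0]
    g1 = [s for s, g in zip(scores, groups) if g == 1]
    m0, m1 = mean(g0), mean(g1)
    return (m0 + m1) / 2, m0, m1


def classify(score, cutoff, m0, m1):
    """Assign a score to the group whose mean lies on the same side of the cutoff."""
    if m1 >= m0:
        return 1 if score > cutoff else 0
    return 1 if score < cutoff else 0


# Hypothetical data: ratings on some issue and the group each subject belongs to.
scores = [1, 2, 2, 4, 5, 5]
groups = [0, 0, 0, 1, 1, 1]

cutoff, m0, m1 = midpoint_rule(scores, groups)
predicted = [classify(s, cutoff, m0, m1) for s in scores]
```

If the predicted groups agree well with the actual groups, the issue can be said to divide the subjects; heavy overlap between the groups would show up as many misclassifications.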

Binary-Binary Data Matrices

A bivariate binary data matrix could have been obtained, e.g., from two groups of subjects answering the question of whether or not they liked poetry, with alternatives provided in the 'yes - no' item response format. A correlation method of choice for problems formulated in this fashion is the phi coefficient of correlation.
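The phi coefficient can be sketched from the four cell counts of the 2 x 2 table formed by the two binary variables. The data below are invented, the 'yes' answers are assumed to be coded 1 and 'no' answers 0, and `phi_coefficient` is a name chosen for illustration.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two 0/1 variables, computed from 2 x 2 cell counts."""
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    # Marginal totals of the 2 x 2 table.
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    return (n11 * n00 - n10 * n01) / math.sqrt(row1 * row0 * col1 * col0)


# Hypothetical data: group membership (0 or 1) and answer (1 = yes, 0 = no).
group = [0, 0, 0, 1, 1, 1]
likes_poetry = [0, 0, 1, 1, 1, 0]

phi = phi_coefficient(group, likes_poetry)
print(round(phi, 3))
```

The function divides by the product of the marginal totals, so it is undefined when every subject falls in the same group or gives the same answer; in that case there is no variability to correlate.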