Cruise Scientific        Visual Statistics Studio

Preface

The majority of ideas discussed here are for readers who want to know the why of statistics, not only the how. The goal is to present the general linear model of statistical analysis in a concise but complete outline. The general linear model is a powerful tool to assist scientific discovery, to foster objective knowledge, and to aid critical thinking. It is mathematically elegant, well integrated, and, to some people, beautiful. The brevity of the present text comes from striving for clarity of presentation. Much more could have been said, but not without sacrificing the visibility of the guiding principles and the sharp contours of the narrative's structure.

Underlying the narrative are two dominant themes: variance, conceptualized as an information measure, and data analysis, described as a set of formal methods for extracting meaning from data matrices. Throughout this text, the smallest possible examples are used to illustrate statistical techniques that, in practice, would be applied to much larger data sets. Thus, the computational tedium that typically accompanies the analysis of large data sets does not detract from learning the relevant principles. The importance of structural analysis and adequate interpretation of the information content is emphasized, as is the importance of integrating quantitative methodology with sound theoretical conceptualization. Special care has been taken to introduce relatively difficult material in a series of logically interlocking steps, complemented by concise descriptions of the computational algorithms that make up the general linear model.

Epistemological principles implicit in most statistical inquiry are discussed at several strategic points in the text. The point stressed is that statistics can be well conceptualized as a relatively new branch of epistemology. Accordingly, statistical significance is not given the pivotal role it enjoys in many other texts on this subject. Instead, the accent is on the structural properties of statistical solutions and their visual representations.

The conceptual differences between this text and the orthodoxy are subtle but substantial. Traditional notions are repeatedly challenged, chief among them the idea that the key question of statistical analysis is whether the data contain statistically significant differences and that, if they do, the analysis has reached its goal. The viewpoint stressed is that mere detection of nonrandom components or differences (at the .05 or the .01 level of statistical significance) is no longer the sole and sufficient goal of statistical analysis. The nonrandom components should be not only recognized but also extracted and enhanced. Their magnitudes should be ascertained and their structures should be described. Let me propose an analogy that might make these claims clearer.

Remember the rather murky pictures of the surface of Venus from the early years of space exploration? Compare those with the computer-enhanced pictures of Venus's surface obtained later. A transmission from Venus contains not only a 'statistically significant' amount of information, but also noise. One might follow the approach of Fisher's school and propose a null hypothesis that there are no stones on the surface of Venus, reject it at the .0001 level of significance, and still see very little surface detail. It was not the testing of null hypotheses, but the enhancement of information components by filtering out noise, that led to crystal-clear transmission of images from that planet, shrouded in clouds.

Another characteristic of this book is the avoidance of the 'sums of squares' and 'mean square' terminology and the associated computational techniques typical of Fisher's conceptualization of statistics. Textbooks that adopt Fisher's approach from the very beginning, including the use of degrees of freedom in lieu of n even in contexts that do not require this construct, exact heavy penalties from students by forcing them to embrace a convoluted notational system and abstruse concepts and algorithms. As William Press (at the Harvard-Smithsonian Center for Astrophysics), Saul Teukolsky (at the Department of Physics, Cornell University), William Vetterling (at the Polaroid Corporation), and Brian Flannery (at the EXXON Research and Engineering Company) commented, 'if the difference between n and n-1 ever matters to you, then you are probably up to no good anyway - e.g., trying to substantiate a questionable hypothesis with marginal data.'
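To make the n versus n-1 remark concrete, here is a minimal sketch in Python. It is an illustration added for this point, not a computation taken from the text: the data values and the function names (variance_n, variance_n_minus_1, relative_gap) are assumptions chosen for the example. It computes the variance of a tiny data set with both divisors and shows that the relative difference between the two results is exactly 1/(n-1), which shrinks quickly as n grows.

```python
# Illustrative sketch: variance with divisor n versus divisor n - 1.
# The data set and all function names are hypothetical, chosen for the example.

def variance_n(xs):
    """Mean squared deviation from the mean, dividing by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def variance_n_minus_1(xs):
    """Same sum of squared deviations, dividing by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def relative_gap(n):
    """(v_{n-1} - v_n) / v_n, which simplifies to 1 / (n - 1)."""
    return 1.0 / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32

v_n = variance_n(data)            # 32 / 8
v_n1 = variance_n_minus_1(data)   # 32 / 7

if __name__ == "__main__":
    print("divisor n:    ", v_n)
    print("divisor n - 1:", v_n1)
    for n in (8, 100, 10000):
        print(n, "observations, relative gap:", relative_gap(n))
```

With only eight observations the two results differ by about 14 percent; with a hundred observations the gap is about 1 percent, and it keeps shrinking from there. This is the sense in which the choice of divisor rarely decides anything when the data are adequate.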

This textbook departs significantly from standard introductory statistics texts on several other points, stressing the graphical rendering of data structures that makes statistics 'visible' and intuitively plausible. The present text is not only a textbook, but also a polemic against Fisher's tradition in statistics, an attempt to reaffirm Pearson's legacy, and a programmatic statement of some key aspects of modern, computer-assisted statistical theory and practice. However, the main motivation behind this re-conceptualization of statistics is to make statistics more open to attempts to humanize it.