Cruise Scientific Visual Statistics Studio
Preface
The majority of ideas discussed here are for readers who want to know the why of
statistics, not only the how. The goal is to present the general linear model of
statistical analysis in a concise but complete outline. The general linear
model is a powerful tool to assist scientific discovery, to foster objective
knowledge, and to aid critical thinking. It is mathematically elegant, well
integrated, and, to some people, beautiful. The brevity of the present text
reflects an effort to keep the presentation clear. Much more could have been said,
but not without sacrificing the visibility of the guiding principles and the
sharpness of contours of the structure of the narrative.
Two dominant themes underlie the narrative: variance, conceptualized as an
information measure, and data analysis, described as a set of formal methods
for extracting meaning from data matrices. Throughout this text, the
smallest possible examples are used to illustrate the statistical techniques
that, in practice, would be applied to much larger data sets. Thus, the
computational tedium that typically accompanies analysis of large data sets does
not detract from learning the relevant principles involved. The importance of
structural analysis and adequate interpretation of the information content is
emphasized, as is the importance of integrating quantitative methodology with
sound theoretical conceptualization. Special care has been taken to introduce
relatively difficult material in a series of logically interlocking steps,
complemented by concise descriptions of the computational algorithms that make
up the general linear model.
Epistemological principles implicit in most statistical inquiry are
discussed at several strategic points in the text. The point stressed is that
statistics can be well conceptualized as a relatively new branch of
epistemology. Accordingly, statistical significance is not given the pivotal
role it enjoys in many other texts on this subject. Instead, the accent is on
the structural properties of statistical solutions and their visual
representations.
The conceptual differences between this text and the orthodoxy are subtle but
substantial. The text repeatedly challenges traditional assumptions, such as
the idea that the key question of statistical analysis is whether the data
contain statistically significant differences, and that once such differences
are found the analysis has reached its goal. The viewpoint stressed is that mere
detection of nonrandom components or differences (at the .05 or the .01 level of
statistical significance) is no longer the sole and sufficient goal of
statistical analysis. The nonrandom components should be not only recognized but
also extracted and enhanced. Their magnitudes should be ascertained and their
structures should be described. Let me propose an analogy that might make the
above suppositions clearer.
Remember the rather murky pictures of the surface of Venus from the early years
of space exploration? Compare those with the computer-enhanced pictures of
Venus's surface obtained later. A transmission of information from Venus
contains not only a 'statistically significant' amount of information, but also
noise. One might follow the approach of Fisher's school and propose a null
hypothesis that there are no stones on the surface of Venus, reject it at the
.0001 level of significance, and still see very little surface detail. It was
not the testing of null hypotheses, but the enhancement of information
components by filtering out noise, that led to crystal-clear transmission of
images from that planet, shrouded in clouds.
Another characteristic of this book is the avoidance of the 'sums of squares'
and 'mean square' terminology and the associated computational techniques,
typical of Fisher's conceptualization of statistics. Textbooks that adopt
Fisher's approach from the very beginning, including the use of degrees of
freedom in lieu of n even in contexts that do not require this construct, exact
heavy penalties from students by forcing them to embrace a convoluted notational
system and abstruse concepts and algorithms. As William Press (at the
Harvard-Smithsonian Center for Astrophysics), Saul Teukolsky (at the Department
of Physics, Cornell University), William Vetterling (at the Polaroid
Corporation), and Brian Flannery (at the EXXON Research and Engineering
Company) commented, 'if the difference between n and n-1 ever matters to you,
then you are probably up to no good anyway - e.g., trying to substantiate a
questionable hypothesis with marginal data.'
This textbook departs significantly from standard introductory statistics texts
on several other points, stressing the graphical rendering of data structures
that make statistics 'visible' and intuitively plausible. The present text is
not only a textbook, but also a polemic against the Fisher tradition in
statistics, an attempt to reaffirm Pearson's legacy, and a programmatic
statement of some key aspects of modern, computer-assisted statistical
theory and practice. However, the main motivation behind this
re-conceptualization of statistics is to make statistics more open to attempts
to humanize it.