In the preceding chapters, we often used as an example a variable
X equal to [1 2 3 4 5]. This example was used primarily for its brevity and
simplicity. When analyzing real data, you are more likely to
encounter variables with repeating values, such as [1 2 2 2 3 3 3 3 3 4 4 4
5]. The frequencies of repeated values can be plotted as a histogram,
approximating a binomial distribution, discussed in detail in
appendix C. In turn, the binomial distribution is idealized by the normal
distribution.
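As a minimal sketch of the idea above, the frequencies of the example variable can be counted and drawn as a crude text histogram. The helper name `text_histogram` is an illustrative assumption, not something defined elsewhere in this book.

```python
from collections import Counter

# Example data taken from the text: a variable with repeating values.
x = [1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5]

# Count how often each distinct value occurs.
frequencies = Counter(x)


def text_histogram(freq):
    """Return one line per value, with one '*' per occurrence (hypothetical helper)."""
    return [f"{value}: {'*' * count}" for value, count in sorted(freq.items())]


for line in text_histogram(frequencies):
    print(line)
```

The printed bars rise toward the middle value and fall off symmetrically toward both ends, which is the shape the text describes.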
Most distributions of test scores show remarkable similarities. When the
scores are plotted as a histogram, there are typically relatively few low and high scores;
most scores cluster around the mean, with score frequencies decreasing
toward both ends of the distribution. As the number of scores increases, the
histogram begins to look like a bell or the hump of a camel. The idealized
shape of this bell distribution was described by Gauss as the 'curve of errors', later called the 'normal distribution.'
Gauss served for many years as the director of the Goettingen
astronomical observatory. His attention was drawn to a report that the
director of the Greenwich observatory had
fired his assistant for reporting observations of star transits that differed
from his own readings. Accurate readings of these transits were critical for
the determination of sidereal time, the time based upon the axial and orbital
rotation of the earth with reference to the background of the stars. In Gauss'
time, the exact determination of sidereal time was of crucial importance
for maritime navigation. An error of a few seconds would translate to an error
of several nautical miles when determining the longitude of a ship's position.
Gauss suspected that the different readings of sidereal transits were caused
by individual differences in the reaction time of the observers. In this sense,
they are akin to the distribution of test scores. Accordingly, Gauss conceptualized
the normal distribution as a curve of errors, which Karl Pearson later renamed
the 'normal curve'.
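The navigational claim above can be checked with simple arithmetic. The sketch below assumes a spherical earth that turns 360 degrees in 24 hours, and that one arcminute of longitude at the equator corresponds to roughly one nautical mile; the function name and the latitude correction are illustrative assumptions, not part of the historical account.

```python
import math

# Assumption: the earth turns 360 degrees in 24 hours, which is
# 15 arcseconds of longitude per second of time.
ARCSEC_PER_TIME_SECOND = 360.0 * 3600.0 / (24.0 * 3600.0)


def longitude_error_nmi(clock_error_s, latitude_deg=0.0):
    """Approximate east-west position error in nautical miles (hypothetical helper).

    Assumes one arcminute of longitude equals one nautical mile at the
    equator and shrinks with the cosine of the latitude.
    """
    arcminutes = clock_error_s * ARCSEC_PER_TIME_SECOND / 60.0
    return arcminutes * math.cos(math.radians(latitude_deg))


for seconds in (4.0, 12.0, 20.0):
    print(f"{seconds:4.0f} s timing error -> "
          f"{longitude_error_nmi(seconds):.1f} nautical miles at the equator")
```

Under these assumptions, every four seconds of timing error corresponds to about one nautical mile at the equator, so an error of a few seconds does amount to several nautical miles.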
Writing about the normal distribution, Galton asserted that 'if the Greeks had known it, they would have
deified it. It reigns with serenity and in complete self-effacement amidst the
wildest confusion. The more huge the mob and the greater the apparent anarchy,
the more perfect is its sway. It is the supreme Law of Unreason. Whenever a
large sample of chaotic elements are taken in hand and marshaled in the order
of their magnitude, an unsuspected and most beautiful form of regularity proves
to be latent all along.'
Analytical formulation of the Normal Distribution
In 1809, Gauss published the analytical formula of the normal
distribution in his Theoria motus
corporum coelestium. For the standard scores z, the formula reads

$$f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^{2}/2}$$

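In the above formula both pi and e are constants. Pi equals about
3.14 and e equals about 2.718. The above formula can also be written as

$$f(z) = (2\pi)^{-1/2} \exp\left(-\frac{z^{2}}{2}\right)$$

and plotted as

[Figure: plot of the standard normal curve, f(z) against z]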

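As a rough numerical illustration, the height of the curve can be evaluated for a few standard scores. This is a minimal sketch assuming the usual form of the standard normal density, exp(-z²/2) divided by the square root of 2·pi; the function name `normal_density` is an illustrative choice.

```python
import math


def normal_density(z):
    """Height (ordinate) of the standard normal curve at standard score z."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}  height = {normal_density(z):.4f}")
```

The heights are largest at z equal to zero and identical for positive and negative scores of the same size.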
The normal distribution has its maximum height (ordinate) at z
equal to zero and is symmetrical about that ordinate. The curve changes
curvature (between convex and concave) at the points on the abscissa where z
equals plus one and minus one.
In its mathematical idealization, this curve stretches from negative
infinity to positive infinity and covers a unit area. Integrating the curve
allows us to associate every z score with the area from minus infinity up to
the specified z score, as shown in the table below.
Z | Area
-∞ | .000
-3.00 | .001
-2.33 | .01
-2.00 | .02
-1.64 | .05
-1.28 | .10
-1.00 | .16
-.84 | .20
-.52 | .30
-.25 | .40
.00 | .50
.25 | .60
.52 | .70
.84 | .80
1.00 | .84
1.28 | .90
1.64 | .95
2.00 | .98
2.33 | .99
3.00 | .999
+∞ | 1.00
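The tabled areas can be reproduced approximately with the error function from Python's standard library. This is a sketch under the assumption that the area up to z equals 0.5 · (1 + erf(z / √2)), a standard identity for the standard normal distribution; the name `cumulative_area` is illustrative.

```python
import math


def cumulative_area(z):
    """Area under the standard normal curve from minus infinity up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A few of the z scores listed in the table above.
for z in (-3.00, -2.00, -1.00, 0.00, 1.00, 1.64, 2.33, 3.00):
    print(f"z = {z:+.2f}  area = {cumulative_area(z):.3f}")
```

Rounding the computed areas to two decimals recovers the tabled values, and the areas for z and minus z always add up to one.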
Another view of the normal distribution is in terms of its central
areas. About 68% of the total area covered by the normal distribution is located
between the z-scores of plus and minus one.

Approximately 50% of the area under the standard normal
distribution lies between the z-scores of plus and minus .67. Close to 95% of the
total area of the normal distribution is between z-scores of plus and minus
two. Some of these central areas are shown in the table below.
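The central areas quoted above can be checked numerically. The sketch below assumes the standard identity that the area between minus k and plus k under the standard normal curve equals erf(k / √2); the function name `central_area` is an illustrative choice.

```python
import math


def central_area(k):
    """Area under the standard normal curve between z = -k and z = +k."""
    return math.erf(k / math.sqrt(2.0))


for k in (0.67, 1.0, 2.0):
    print(f"between -{k:.2f} and +{k:.2f}: {100.0 * central_area(k):.1f}% of the area")
```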
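The normal distribution is described by the function

$$f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^{2}/2}$$

The area under the normal distribution, equal to unity, can be
integrated as

$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} \, e^{-z^{2}/2} \, dz = 1$$
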
About 68% of the total area covered by the normal distribution is
located between the z-scores of plus and minus one. Approximately 50% of
the area under the standard normal distribution lies between the z-scores of plus and minus .67. Close to 95% of the
total area of the normal distribution is between z-scores of plus and minus
two.
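The unit area can also be verified by direct numerical integration, without a closed-form expression. The sketch below uses the trapezoidal rule as one possible technique and approximates the infinite range by the interval from -8 to +8, an assumption justified by how small the curve is beyond that range; the function names are illustrative.

```python
import math


def normal_density(z):
    """Height of the standard normal curve at z (assumed standard form)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def area_between(a, b, steps=20000):
    """Trapezoidal-rule estimate of the area under the curve from a to b."""
    width = (b - a) / steps
    total = 0.5 * (normal_density(a) + normal_density(b))
    for i in range(1, steps):
        total += normal_density(a + i * width)
    return total * width


print(f"total area (from -8 to +8): {area_between(-8.0, 8.0):.6f}")
print(f"area between -1 and +1:     {area_between(-1.0, 1.0):.4f}")
print(f"area between -2 and +2:     {area_between(-2.0, 2.0):.4f}")
```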