Properties of the Normal Distribution

In the preceding chapters, we often used as an example a variable X equal to [1 2 3 4 5]. This example was chosen primarily for its brevity and simplicity. In the analysis of real data you are more likely to encounter variables with repeating values, such as [1 2 2 2 3 3 3 3 3 4 4 4 5]. The frequencies of the repeated values can be plotted as a histogram,

[Figure: histogram of the value frequencies]

approximating a binomial distribution, discussed in detail in appendix C. In turn, the binomial distribution is idealized by the normal distribution.
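The frequencies behind such a histogram can be tallied in a few lines. The following is only a minimal sketch in Python (a language chosen here for illustration, not one used in the text); the names x and frequencies are assumptions of the sketch.

```python
from collections import Counter

# Example variable with repeated values, taken from the text.
x = [1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5]

# Count how often each distinct value occurs.
frequencies = Counter(x)

# Print a simple text histogram: one '*' per occurrence of a value.
for value in sorted(frequencies):
    print(value, "*" * frequencies[value])
```

The counts rise toward the middle value and fall off symmetrically on both sides, which is the pattern the binomial and normal distributions idealize.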

Most distributions of test scores show remarkable similarities. When the scores are plotted as a histogram, there are typically relatively few low and high scores; most scores cluster around the mean, with score frequencies decreasing toward both ends of the distribution. As the number of scores increases, the histogram begins to look like a bell or the hump of a camel. The idealized shape of this bell-shaped distribution was described by Gauss as the 'curve of errors', later called the 'normal distribution.'

Gauss served for many years as the director of the Goettingen astronomical observatory. His attention was drawn to a report that the director of the Greenwich observatory had fired his assistant for reporting observations of star transitions that differed from his own readings. Accurate readings of these transitions were critical for the determination of sidereal time, the time based upon the axial and orbital rotation of the earth with reference to the background of the stars. The exact determination of sidereal time was, in Gauss's time, of crucial importance for maritime navigation. An error of a few seconds would translate to an error of several nautical miles when determining the longitude of a ship's position. Gauss suspected that the different readings of sidereal transitions were caused by individual differences in the reaction time of the observers. In this sense, they are akin to the distribution of test scores. Gauss conceptualized the normal distribution as a curve of errors. The 'curve of errors' was renamed the 'normal curve' by Karl Pearson.

Writing about the normal distribution, Galton asserted that 'if the Greeks had known it, they would have deified it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The more huge the mob and the greater the apparent anarchy, the more perfect is its sway. It is the supreme Law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to be latent all along.'

Analytical formulation of the Normal Distribution

In 1809, Gauss published the analytical formula of the normal distribution in his Theoria motus corporum coelestium. For the standard scores z, the formula reads

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$$

In the above formula both pi and e are constants: pi equals about 3.14 and e equals about 2.718. The above formula can also be written as

$$y = (2\pi)^{-1/2} \exp\left(-\frac{z^{2}}{2}\right)$$

and plotted as

[Figure: the standard normal curve, with the ordinate plotted against z]

The normal distribution has its maximum height (ordinate) at z equal to zero and is symmetrical about that ordinate. The curve changes its curvature, between convex and concave, at the points on the abscissa where z equals plus one and minus one. In its mathematical idealization, the curve stretches from negative infinity to positive infinity and covers a unit area. Integrating the curve allows us to associate every z score with the area from minus infinity up to that z score, as shown in the table below.
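The properties just listed can be checked numerically. The sketch below assumes the standard normal density for z scores, that is, 1/sqrt(2 pi) times e raised to the power of minus z squared over two. The helper names normal_density, second_difference and trapezoid_area are hypothetical, and the finite range from -8 to 8 stands in for the infinite one.

```python
import math

def normal_density(z):
    """Height (ordinate) of the standard normal curve at z."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def second_difference(f, z, h=1e-3):
    """Central-difference estimate of the second derivative of f at z."""
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / (h * h)

def trapezoid_area(f, lo, hi, steps=20000):
    """Trapezoidal estimate of the area under f between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width

# Maximum height, reached at z = 0 (about 0.3989).
peak = normal_density(0.0)

# The sign of the curvature flips when z passes through one.
curvature_inside = second_difference(normal_density, 0.9)
curvature_outside = second_difference(normal_density, 1.1)

# Total area; the tails beyond |z| = 8 are negligibly small.
area = trapezoid_area(normal_density, -8.0, 8.0)

print(peak, curvature_inside, curvature_outside, area)
```

The curvature estimate is negative just inside z = 1 and positive just outside it, which places the change of curvature at z = 1 (and, by symmetry, at z = -1).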

 

 

    Z       Area

   -∞       .000
  -3.00     .001
  -2.33     .01
  -2.00     .02
  -1.64     .05
  -1.28     .10
  -1.00     .16
   -.84     .20
   -.52     .30
   -.25     .40
    .00     .50
    .25     .60
    .52     .70
    .84     .80
   1.00     .84
   1.28     .90
   1.64     .95
   2.00     .98
   2.33     .99
   3.00     .999
   +∞      1.00
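The tabulated areas can be reproduced without numerical integration, because the area up to z can be expressed through the error function: area = 0.5 (1 + erf(z / sqrt(2))). The sketch below uses Python's math.erf; the function name area_below is a hypothetical helper, not something from the text.

```python
import math

def area_below(z):
    """Area under the standard normal curve from minus infinity up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Recompute a few rows of the table of z scores and areas.
for z in (-3.00, -1.64, -1.00, 0.00, 1.00, 1.64, 3.00):
    print(f"{z:6.2f}  {area_below(z):.3f}")
```

Rounded to the precision used in the table, these values should agree with the corresponding rows.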

 

Another view of the normal distribution is in terms of its central areas. About 68% of the total area covered by the normal distribution is located between the z-scores of plus and minus one.

Approximately 50% of the area under the standard normal distribution is between the z-scores of plus and minus .67. Close to 95% of the total area of the normal distribution is between the z-scores of plus and minus two. Some of these central areas are shown in the table below.
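The central areas quoted above can be recomputed in the same spirit. The area between minus z and plus z equals erf(z / sqrt(2)); the function name central_area in this sketch is an assumption made for illustration.

```python
import math

def central_area(z):
    """Area under the standard normal curve between -z and +z."""
    return math.erf(z / math.sqrt(2.0))

# Central areas for the z scores mentioned in the text.
for z in (0.67, 1.00, 2.00):
    print(f"+/-{z:.2f}  {central_area(z):.3f}")
```

The three results come out close to 50%, 68% and 95% of the total area, respectively.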

 

 

Summary

The normal distribution is described by the function

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$$

The area under the normal distribution, equal to unity, can be integrated as

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz = 1$$

About 68% of the total area covered by the normal distribution is located between the z-scores of plus and minus one. Approximately 50% of the area under the standard normal distribution is between the z-scores of plus and minus .67. Close to 95% of the total area of the normal distribution is between the z-scores of plus and minus two.