Normalization

The topics discussed in this chapter address the fact that virtually no empirical distribution is normal; most empirical distributions only approximate normality. In his 1893 letter to Nature, Karl Pearson suggested that the moments about the mean could be used to measure the deviations of empirical distributions from the normal distribution, and in 1922 McCall proposed a method for converting empirical distributions that approximate normality into distributions close to normal.

Deviations of empirical distributions from the normal distribution can be described in several ways. Comparing the values of the mean and the median is one of them. If the mean and the median do not coincide, the distribution is skewed: the mean is pulled toward the long tail of the distribution more than the median is. The greater the difference between the mean and the median, the greater the skew. The question that naturally arises is 'how much' the mean and the median must differ for a distribution to be markedly skewed. The moments about the mean provide quantitative answers to this and related questions.
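A minimal Python sketch of this comparison, using the standard `statistics` module and the variable X [1 2 3 6 7] that is analyzed later in this chapter. The mean lies above the median, which points to a positive skew; the comparison gives the direction of the skew but not a standardized measure of its size.

```python
from statistics import mean, median

x = [1, 2, 3, 6, 7]  # variable X analyzed later in the chapter

m = mean(x)      # 3.8
md = median(x)   # 3 (the middle score)

# The mean is pulled toward the long tail more than the median,
# so mean > median suggests positive skew and mean < median negative skew.
if m > md:
    direction = "positive"
elif m < md:
    direction = "negative"
else:
    direction = "none"

print(m, md, direction)  # 3.8 3 positive
```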

The term moment is used in mechanics as a measure of the turning effect of a force about a point of rotation; the strength of this effect depends on the distance from that point. Conceptually, moments about the mean are best understood by considering a distribution of deviation scores, e.g., the deviation scores of variable X [1 2 3 4 10] shown below. Because the mean of X is 4, the deviation scores are -3, -2, -1, 0, and 6.

[Figure: deviation scores of X [1 2 3 4 10] about the arithmetic mean]

The deviation scores are loci of forces with strength determined by their distances from the arithmetic mean. The arithmetic mean is the fulcrum, the center of gravity, located at the point where the forces balance.
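The balance described above can be checked numerically. This short Python sketch computes the deviation scores of X [1 2 3 4 10] and shows that they sum to zero, which is exactly what makes the mean the balance point.

```python
x = [1, 2, 3, 4, 10]                     # variable X from the text
mean_x = sum(x) / len(x)                 # 4.0, the fulcrum
deviations = [xi - mean_x for xi in x]   # signed distances from the mean

# Three small deviations below the mean (-3, -2, -1) balance
# one large deviation above it (+6).
print(deviations)       # [-3.0, -2.0, -1.0, 0.0, 6.0]
print(sum(deviations))  # 0.0
```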

The First Moment as a Measure of Central Tendency

Computationally, the moments about the mean are best described by using standard scores. For example, standard scores for a variable X [1 2 3 6 7] can be computed as

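$$z_i = \frac{X_i - \bar{X}}{\sigma}, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}}$$

For X [1 2 3 6 7], the mean is 3.8 and σ ≈ 2.32, so the standard scores are -1.21, -0.78, -0.35, 0.95, and 1.38.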

Describing the moments about the mean, Karl Pearson used the Greek letter μ (mu), subscripted 1, 2, 3, and 4, to signify the first four moments about the mean. Within this context, central tendency measured by the arithmetic mean is called the first moment and is signified by the Greek letter mu subscripted by 1, as

$$\mu_1 = \frac{\sum_{i=1}^{N} z_i}{N}$$

The first moment about the mean is always equal to zero, because standard scores are defined as scores with a mean of zero and a standard deviation of one.
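The standard scores and the first moment can be sketched in Python as follows. One assumption is worth stating: the sketch divides by N when computing the standard deviation (`statistics.pstdev`), because that choice reproduces the values reported later in this chapter (.24 for the third moment and 1.40 for the fourth); dividing by N - 1 would give slightly different numbers.

```python
from statistics import fmean, pstdev

def standard_scores(x):
    """Convert raw scores to z scores, using the SD with N in the denominator."""
    m = fmean(x)
    sd = pstdev(x)
    return [(xi - m) / sd for xi in x]

x = [1, 2, 3, 6, 7]
z = standard_scores(x)
mu1 = sum(z) / len(z)   # first moment: the mean of the standard scores

print([round(v, 2) for v in z])  # [-1.21, -0.78, -0.35, 0.95, 1.38]
print(abs(mu1) < 1e-12)          # True: zero apart from floating-point rounding
```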

The Second Moment as a Measure of Variability

The second moment represents the variance of the distribution. To compute the second moment about the mean, the standard scores have to be squared, as

$$z_i^2 = \left(\frac{X_i - \bar{X}}{\sigma}\right)^2$$

The variance of the standard scores is called the second moment and is signified by the Greek letter mu subscripted by 2, as

$$\mu_2 = \frac{\sum_{i=1}^{N} z_i^2}{N}$$

The second moment about the mean is always equal to one.
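Continuing the sketch (same assumption of an N-denominator standard deviation), squaring the standard scores and averaging the squares gives the second moment. Because the z scores are scaled by that same standard deviation, the result is 1 for any variable that is not constant, as the two examples below illustrate.

```python
from statistics import fmean, pstdev

def second_moment(x):
    """Mean of the squared standard scores (SD computed with N in the denominator)."""
    m, sd = fmean(x), pstdev(x)
    z = [(xi - m) / sd for xi in x]
    return sum(v ** 2 for v in z) / len(z)

for data in ([1, 2, 3, 6, 7], [1, 2, 3, 4, 10]):
    print(data, round(second_moment(data), 6))  # 1.0 in both cases
```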

The Third Moment as a Measure of Skewness

The third moment about the mean is called skewness. Skewness refers to departures of a distribution from symmetry. In a negatively skewed distribution, the tail points toward the low scores; in a positively skewed distribution, the tail points toward the high scores. Skewness is computed as the third moment about the mean, as

$$\mu_3 = \frac{\sum_{i=1}^{N} z_i^3}{N}$$

If a distribution is symmetrical, the third moment equals zero. For asymmetrical distributions, it can be either positive or negative: if the third moment is positive, the distribution is positively skewed; if it is negative, the distribution is negatively skewed. Consider the following example.

X        z      z^3
1     -1.21   -1.77
2     -0.78   -0.47
3     -0.35   -0.04
6      0.95    0.86
7      1.38    2.64
Sum    0.00    1.22

For this example, the third moment is positive (1.22/5 = .24), indicating that the distribution is positively skewed: its tail points to the right, and the scores tend to cluster toward the lower end of the scale.
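The computation of the third moment can be sketched the same way (again assuming the N-denominator standard deviation). Reflecting the scores to their negatives turns the long right tail into a long left tail, and the sign of the third moment flips accordingly; a symmetrical set of scores gives a value of essentially zero.

```python
from statistics import fmean, pstdev

def third_moment(x):
    """Skewness as the mean of the cubed standard scores."""
    m, sd = fmean(x), pstdev(x)
    z = [(xi - m) / sd for xi in x]
    return sum(v ** 3 for v in z) / len(z)

print(round(third_moment([1, 2, 3, 6, 7]), 2))       # 0.24: positively skewed
print(round(third_moment([-1, -2, -3, -6, -7]), 2))  # -0.24: negatively skewed
print(round(third_moment([1, 2, 3, 4, 5]), 2))       # approximately 0: symmetrical
```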

Most statistics programs that analyze achievement test results report the skewness of the test scores. If the coefficient of skewness is negative, the test is said to have a 'low ceiling', that is, it contains too few difficult items. Distributions with a positive coefficient of skewness are usually generated by tests with a 'high floor', that is, tests containing too few easy items.

Negative skewness is sometimes desirable, especially in classes taught with the goal of having students achieve mastery of specified criteria. Changing the test items can alter the skewness. For example, positive skewness can be reduced by clarifying the phrasing of test questions, adding easier items, or removing misleading distracters. Negative skewness can be corrected by adding more relevant items or functional distracters.

The Fourth Moment as a Measure of Kurtosis

Karl Pearson coined the term kurtosis in 1906. In a classic article in Biometrika he wrote: 'Given two frequency distributions which have the same variability as measured by the standard deviation, they may be relatively more or less flat-topped than the normal curve. If more flat-topped, I term them platykurtic, if less flat-topped, leptokurtic, and if equally flat-topped, mesokurtic.' More descriptively, platykurtic curves tend to be broad and flat, leptokurtic curves appear tall and narrow, and mesokurtic curves are bell-shaped like the normal curve. Let us consider the previous example again, this time computing all four moments. Kurtosis, which reflects the extent to which the density of the empirical distribution differs from the density of the normal curve, is computed from the fourth moment about the mean, as

$$\mu_4 = \frac{\sum_{i=1}^{N} z_i^4}{N} - 3$$

The value of the fourth moment about the mean depends on the shape of the distribution. If the distribution is flatter than the normal curve (platykurtic), the value of the fourth moment is less than zero; if the distribution is more peaked (leptokurtic), its value is greater than zero; and as its value approaches zero, the distribution's shape approximates that of the normal distribution, which is mesokurtic. The 3 in the above formula is subtracted because, for the normal distribution, the mean of the fourth powers of the standard scores equals 3; subtracting it places the boundary between the platykurtic and leptokurtic categories at zero instead of three.
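Why 3 in particular: for a normal distribution, the expected value of the fourth power of a standard score is 3. A short derivation by integration by parts shows this; here φ(z) denotes the standard normal density, notation introduced for this derivation only, and the step uses φ'(z) = -zφ(z).

```latex
\int_{-\infty}^{\infty} z^{4}\,\varphi(z)\,dz
  = \int_{-\infty}^{\infty} z^{3}\,\bigl(z\,\varphi(z)\bigr)\,dz
  = \Bigl[-z^{3}\varphi(z)\Bigr]_{-\infty}^{\infty}
    + 3\int_{-\infty}^{\infty} z^{2}\,\varphi(z)\,dz
  = 0 + 3 \cdot 1 = 3
```

The remaining integral is the variance of the standard normal distribution, which is 1, so a normal curve yields a fourth moment of 3 - 3 = 0 once the 3 is subtracted.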

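X        z      z^2     z^3     z^4
1     -1.21    1.46   -1.77    2.14
2     -0.78    0.60   -0.47    0.37
3     -0.35    0.12   -0.04    0.01
6      0.95    0.90    0.86    0.82
7      1.38    1.91    2.64    3.65
Sum    0.00    5.00    1.22    6.98
Sum/N  0.00    1.00    0.24    1.40
(Sums are computed from unrounded values.)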

For the current example, the fourth moment about the mean was computed as 1.40 - 3, which equals -1.60. Because this value is less than zero, the distribution is platykurtic.
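A Python sketch of the kurtosis computation, under the same N-denominator assumption. The `shape` helper and its `tol` band for "about zero" are hypothetical conveniences for illustration; the text itself gives no numeric cutoff for mesokurtic.

```python
from statistics import fmean, pstdev

def kurtosis(x):
    """Mean of the fourth powers of the standard scores, minus 3."""
    m, sd = fmean(x), pstdev(x)
    z = [(xi - m) / sd for xi in x]
    return sum(v ** 4 for v in z) / len(z) - 3  # 3 = value for a normal curve

def shape(k, tol=0.05):
    """Hypothetical labeling rule; tol is an arbitrary band around zero."""
    if k < -tol:
        return "platykurtic"
    if k > tol:
        return "leptokurtic"
    return "mesokurtic"

k = kurtosis([1, 2, 3, 6, 7])
print(round(k, 2), shape(k))  # -1.6 platykurtic
```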

Area Transformations

Skewness and kurtosis describe departures from normality in the distributions of variables. Several transformations can change the distribution of a variable into one closer to the normal distribution. These transformations vary in how well they accomplish this goal; one of the most effective is the area transformation.

McCall proposed area transformations of test scores in 1922. The resulting scores are frequently called area-transformed T scores, although other standardized scores, such as IQ scores or stens (standard tens), can also be area transformed. In a personal letter to the author, William McCall described the development of the T score as follows: 'while I was a student of Thorndike, I was led to believe that every trait measured has to be normal. I got suspicious and asked my brother who never heard of the normal curve to make a mark on the ground for the intelligence of every man for miles down the road. The result -- a normal curve!

A few days later to entertain my six year old niece, we cut down a good-sized bush and measured the length of the leaves on it. A normal curve, of course. Not at all! A strong tri-modal curve. Being too early to repeat the experiment, I have tried to get some student interested enough to repeat the study to no avail. I'll probably go grieving to the grave without learning whether trimodality is characteristic of leaves on bushes! As you can see, the T score is not wholly free from difficulties. Moreover, the T unit proved to be too technical for U.S. students of education and even more so to Chinese teachers during my days in China, so I found it necessary to invent a simpler unit, called the G score...'

Let us demonstrate the area transformation of scores on variable X [1 2 3 6 7], shown as black squares. Each score (1.0) is split into two components (.50 + .50), shown as black diamonds. The theoretical reason for this split is that each score of variable X is only a point estimate of the score's interval. The first interval stretches from minus infinity to 1.5, the second lies between 1.5 and 2.5, the third between 2.5 and 3.5, the fourth between 3.5 and 5.5, the fifth between 5.5 and 6.5, and the sixth stretches from 6.5 to plus infinity.

[Figure: scores of X [1 2 3 6 7] (black squares) and their .50 + .50 components (black diamonds)]

The numerical algorithm for the area transformation consists of several steps, shown as numerical lines in the diagram below. In the first line, the scores were split into .50 + .50 parts. In the second line, the split scores were reassembled within each score's interval. In the third line, the scores within each interval were cumulated. In the fourth line, the cumulated scores were converted into proportions by dividing the cumulative frequencies by the total number of cases, in this example by 5.0.

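[Diagram: the four numerical lines of the area transformation for X [1 2 3 6 7], with the corresponding z scores at the bottom]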

The resulting proportions in the fourth numerical line were then converted to their corresponding z scores, shown at the bottom of the above diagram. For this conversion, we use a table of the standard normal distribution (a z table).
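The first steps of the area transformation can be sketched in Python. Two assumptions are made: the cumulative proportion for a score is computed with the common midpoint convention, (number of cases below the score + half the cases at the score) / N, which is how the .50 + .50 split is interpreted here; and `statistics.NormalDist().inv_cdf` stands in for the printed z table. Under these assumptions the example gives proportions of .10, .30, .50, .70, and .90. The final cumulative value of 1.00 lies above the highest score and is deleted anyway, so the sketch does not produce it.

```python
from collections import Counter
from statistics import NormalDist

def area_z_scores(x):
    """Map each distinct score to (cumulative proportion, z) via the midpoint convention."""
    n = len(x)
    counts = Counter(x)
    below = 0          # number of cases with lower scores
    result = {}
    for score in sorted(counts):
        f = counts[score]
        p = (below + 0.5 * f) / n                      # half of the cases at the score count
        result[score] = (p, NormalDist().inv_cdf(p))   # normal quantile replaces the z table
        below += f
    return result

for score, (p, z) in area_z_scores([1, 2, 3, 6, 7]).items():
    print(score, round(p, 2), round(z, 2))
# 1 0.1 -1.28
# 2 0.3 -0.52
# 3 0.5 0.0
# 6 0.7 0.52
# 7 0.9 1.28
```

Because every proportion lies strictly between 0 and 1 under this convention, `inv_cdf` never receives the 0 or 1 values that would correspond to infinite z scores.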

Note that proportions equal to 1.00 have to be deleted, since they correspond to infinitely large z scores. As a last step, the standard scores were transformed into T scores, this time using the linear transformation T = 50 + 10z, as shown in the following table.

[Table: z scores and the corresponding T scores]

In the following diagram, the top distribution is the initial distribution of the obtained scores; the bottom distribution is the distribution of the area-transformed T scores. Notice that normalizing the distribution placed the scores of the new distribution symmetrically around the mean. The skewness of the original variable X, as computed in the previous sections, was .24. The skewness of the area-transformed distribution is zero. Area transformations also alter kurtosis, but they do not change platykurtic or leptokurtic distributions into mesokurtic distributions.
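The claims in this paragraph can be checked with a short Python sketch. It assumes the midpoint cumulative proportions (.10 through .90) for X [1 2 3 6 7], converts them to T = 50 + 10z with `statistics.NormalDist`, and then computes skewness and kurtosis (fourth moment minus 3) with the N-denominator standard deviation. The skewness of the T scores comes out as zero, while the kurtosis, about -1.11, is still negative, so the distribution stays platykurtic.

```python
from statistics import NormalDist, fmean, pstdev

def skew_and_kurtosis(x):
    """Third moment and (fourth moment - 3) of the standard scores of x."""
    m, sd = fmean(x), pstdev(x)
    z = [(xi - m) / sd for xi in x]
    n = len(z)
    return sum(v ** 3 for v in z) / n, sum(v ** 4 for v in z) / n - 3

raw = [1, 2, 3, 6, 7]
props = [0.1, 0.3, 0.5, 0.7, 0.9]                  # assumed midpoint proportions
t_scores = [50 + 10 * NormalDist().inv_cdf(p) for p in props]

skew_raw, kurt_raw = skew_and_kurtosis(raw)
skew_t, kurt_t = skew_and_kurtosis(t_scores)

print([round(t, 1) for t in t_scores])          # [37.2, 44.8, 50.0, 55.2, 62.8]
print(round(skew_raw, 2), round(kurt_raw, 2))   # 0.24 -1.6
print(abs(skew_t) < 1e-9, round(kurt_t, 2))     # True -1.11
```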

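[Figure: top, the distribution of the original scores of X; bottom, the distribution of the area-transformed T scores]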

Area transformations normalize data more effectively than other methods used for the same purpose, e.g., the square root transformation or the often-used arc sine transformation.

Summary

The moments about the mean computed from standard scores, i.e., the formulae for the mean, variance, skewness, and kurtosis, are summarized as

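$$\mu_1 = \frac{\sum_{i=1}^{N} z_i}{N} = 0 \qquad \text{(mean)}$$
$$\mu_2 = \frac{\sum_{i=1}^{N} z_i^2}{N} = 1 \qquad \text{(variance)}$$
$$\mu_3 = \frac{\sum_{i=1}^{N} z_i^3}{N} \qquad \text{(skewness)}$$
$$\mu_4 = \frac{\sum_{i=1}^{N} z_i^4}{N} - 3 \qquad \text{(kurtosis)}$$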

The mean of standard scores always equals zero; thus, the first moment is always zero. The variance of standard scores always equals one; thus, the second moment is always one. When the third moment is positive, the distribution is positively skewed; as the third moment approaches zero, the distribution becomes more symmetrical; and when it is negative, the distribution is negatively skewed. When the fourth moment is less than zero, the distribution is platykurtic; when it is approximately zero, the distribution is mesokurtic; and when it is greater than zero, the distribution is leptokurtic.
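Putting the rules of this summary together, a compact Python sketch computes all four moments from standard scores (N-denominator standard deviation assumed, 3 subtracted from the fourth moment) and attaches the verbal labels. The `tol` band used to decide what counts as "about zero" is a hypothetical parameter; the text does not specify one.

```python
from statistics import fmean, pstdev

def four_moments(x):
    """Return [mu1, mu2, mu3, mu4] of the standard scores of x (mu4 has 3 subtracted)."""
    m, sd = fmean(x), pstdev(x)
    z = [(xi - m) / sd for xi in x]
    n = len(z)
    mu = [sum(v ** k for v in z) / n for k in (1, 2, 3, 4)]
    mu[3] -= 3
    return mu

def describe(x, tol=0.05):
    """Label skewness and kurtosis using the rules in the summary; tol is hypothetical."""
    _, _, mu3, mu4 = four_moments(x)
    if abs(mu3) <= tol:
        skew = "symmetrical"
    else:
        skew = "positively skewed" if mu3 > 0 else "negatively skewed"
    if abs(mu4) <= tol:
        kurt = "mesokurtic"
    else:
        kurt = "leptokurtic" if mu4 > 0 else "platykurtic"
    return skew, kurt

print(describe([1, 2, 3, 6, 7]))   # ('positively skewed', 'platykurtic')
print(describe([1, 2, 3, 4, 10]))  # ('positively skewed', 'platykurtic')
```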