Cruise Scientific        Visual Statistics Studio       Table of Contents

Normalization

The topics discussed in this chapter address the fact that virtually no empirical distribution is exactly normal; most only approximate normality. In his 1893 letter to Nature, Karl Pearson suggested that the moments about the mean could be used to measure the deviations of empirical distributions from the normal distribution. In 1922, McCall proposed a method for converting empirical distributions that approximate normality into distributions that are close to normal.

Moments About the Mean

Deviations of empirical distributions from the normal distribution can be described in several ways. Comparing the values of the mean and the median is one of them. If the mean and median do not coincide, the distribution is skewed. In a skewed distribution, the mean is pulled toward the skewed side more than the median. The greater the difference between the mean and the median, the greater the skew. The question that naturally arises in this context is how much the mean and median must differ for a distribution to be markedly skewed. The moments about the mean provide quantitative answers to this and related questions.

The term moment is used in mechanics as a measure of the force of rotation. The strength of this force depends upon the distance from the point of rotation. Conceptually, moments about the mean are best understood by considering a distribution of deviation scores, for example the deviation scores of the variable X [1 2 3 4 10].

The deviation scores are loci of forces with strength determined by their distances from the arithmetic mean. The arithmetic mean is the fulcrum, the center of gravity, located at the point where the forces balance.
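The balancing idea can be checked numerically. The following is a minimal sketch, not part of the original chapter, that computes the deviation scores for X [1 2 3 4 10] and confirms that they cancel around the mean; the variable names are illustrative only.

```python
# Deviation scores for the variable X [1 2 3 4 10].
scores = [1, 2, 3, 4, 10]

mean = sum(scores) / len(scores)          # the fulcrum: 4.0
deviations = [x - mean for x in scores]   # distances from the fulcrum

# Negative deviations (-3, -2, -1) are offset by the single large
# positive deviation (+6), so the sum is zero.
balance = sum(deviations)

print(deviations)  # [-3.0, -2.0, -1.0, 0.0, 6.0]
print(balance)     # 0.0
```

The single score of 10 sits far from the fulcrum and therefore counterbalances three scores on the other side.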

The First Moment as a Measure of Central Tendency

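Computationally, the moments about the mean are best described by using standard scores. For example, standard scores for a variable X [1 2 3 6 7] can be computed as

\[ z = \frac{X - \bar{X}}{\sigma} \]

where \bar{X} is the mean of X and \sigma is its standard deviation, computed with N in the denominator.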

Describing the moments about the mean, Karl Pearson used the Greek letter μ (mu), subscripted as 1, 2, 3, and 4, to signify the first four moments about the mean. The mean of the distribution of the above set of standard scores is called the first moment and is signified by the Greek letter mu subscripted by 1, as

\[ \mu_1 = \frac{\sum z}{N} = 0 \]

The first moment about the mean is always equal to zero, as the standard scores are defined as scores with a mean of zero and a standard deviation equal to one.
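As a rough illustration, the standard scores and the first moment for X [1 2 3 6 7] can be sketched as follows. The function name is hypothetical, and the sketch assumes the standard deviation is computed with N (not N - 1) in the denominator, which is the choice that reproduces the values quoted in this chapter.

```python
from math import sqrt

def standard_scores(scores):
    """Convert raw scores to standard (z) scores."""
    n = len(scores)
    mean = sum(scores) / n
    # Assumption: standard deviation with N in the denominator.
    sd = sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]

z = standard_scores([1, 2, 3, 6, 7])
first_moment = sum(z) / len(z)

print([round(v, 2) for v in z])  # [-1.21, -0.78, -0.35, 0.95, 1.38]
# first_moment is zero up to floating-point rounding error.
```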

The Second Moment as a Measure of Variability

The second moment represents the variance of the distribution. To compute the second moment about the mean, the standard scores are squared and then averaged.

The variance of the standard scores is called the second moment, and is signified by the Greek letter mu subscripted by 2, as

\[ \mu_2 = \frac{\sum z^2}{N} = 1 \]

The second moment about the mean is always equal to one.
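The reason the second moment always equals one can be sketched in a short derivation. It assumes that the standard scores are z = (X - \bar{X}) / \sigma and that \sigma^2 is the variance computed with N in the denominator; the symbols \bar{X} and \sigma are notation introduced here for illustration.

```latex
\begin{aligned}
\mu_2 &= \frac{1}{N}\sum z^2
       = \frac{1}{N}\sum \frac{(X - \bar{X})^2}{\sigma^2} \\
      &= \frac{1}{\sigma^2}\cdot\frac{\sum (X - \bar{X})^2}{N}
       = \frac{\sigma^2}{\sigma^2} = 1
\end{aligned}
```

Dividing every deviation by the standard deviation rescales the variance to exactly one, whatever the original units of X.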

The Third Moment as a Measure of Skewness

The third moment about the mean is called skewness. Skewness refers to departures of a distribution from symmetry. In a negatively skewed distribution the tail of a distribution points toward the low scores.

Distributions with a tail pointing toward high values of a variable are positively skewed.

Skewness is computed as the third moment about the mean

\[ \mu_3 = \frac{\sum z^3}{N} \]

If the distribution of standard scores is symmetrical, the third moment equals zero. For nonsymmetrical distributions, it can be either positive or negative. If the third moment is positive, the distribution is positively skewed; if it is negative, the distribution is negatively skewed. Consider again the example of the variable X [1 2 3 6 7].

For this example, the third moment is positive (the sum of the cubed standard scores, 1.22, divided by N = 5 gives .24), indicating that our distribution is skewed toward the right. The long tail points toward the high end of the distribution (e.g., 6 and 7). In other words, most values bunch up at the low end of the scale (e.g., 1, 2, and 3).
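The arithmetic behind the value .24 can be reproduced with a short sketch. It assumes, as the quoted figures imply, that the standard deviation uses N in the denominator; the variable names are illustrative.

```python
from math import sqrt

scores = [1, 2, 3, 6, 7]
n = len(scores)
mean = sum(scores) / n
# Assumption: standard deviation with N in the denominator.
sd = sqrt(sum((x - mean) ** 2 for x in scores) / n)
z = [(x - mean) / sd for x in scores]

sum_z_cubed = sum(v ** 3 for v in z)   # sum of cubed standard scores
skewness = sum_z_cubed / n             # third moment about the mean

print(round(sum_z_cubed, 2), round(skewness, 2))  # 1.22 0.24
```

The two large positive standard scores (for 6 and 7) dominate once they are cubed, which is what makes the third moment positive.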

Do Test Scores Have Ceilings and Floors?

Most statistics programs analyzing results of achievement tests report the skewness of the test scores. If the skewness is negative (most students score higher on the test), the test is said to have a 'low ceiling', that is, it contains too few difficult items.

Adding more relevant items or functional distractors can rectify negative skewness. However, negative skewness is sometimes desirable, especially in classes taught with the goal of having students achieve mastery of certain criteria.

Distributions with a positive coefficient of skewness (most students score lower on the test) are usually generated by tests with a 'high floor', that is, tests containing too many difficult items.

Changing the test items may alter the skewness. Thus, positive skewness can be reduced by clarifying the phrasing of test questions, adding easier items, or by removing misleading distractors. 

The Fourth Moment as a Measure of Kurtosis

Karl Pearson coined the term kurtosis in 1905. In a classic article in Biometrika he wrote: 'Given two frequency distributions which have the same variability as measured by the standard deviation, they may be relatively more or less flat-topped than the normal curve. If more flat-topped, I term them platykurtic, if less flat-topped, leptokurtic, and if equally flat-topped, mesokurtic.' More descriptively, platykurtic curves tend to be elongated and flat, leptokurtic curves appear taller and narrower, and mesokurtic curves tend to be bell-shaped like the normal curve. A platykurtic curve (blue) and a leptokurtic curve (red) are graphed below.

Let us consider the previous example again, this time computing all four moments. The kurtosis, reflecting the extent to which the density of the empirical distribution differs from the probability densities of the normal curve, is computed as the fourth moment about the mean

\[ \mu_4 = \frac{\sum z^4}{N} - 3 \]

The value of the fourth moment about the mean depends in part on the shape of the scrutinized distribution. If the distribution is flat (platykurtic), the value of the fourth moment about the mean is smaller than zero; if the distribution is peaked (leptokurtic), its value is greater than zero. As its value approaches zero, the distribution's shape begins to approximate a normal distribution, which is mesokurtic. The 3 in the above formula is subtracted in order to make the boundary between the platykurtic, mesokurtic, and leptokurtic categories zero instead of three.

 For the current example, the value for the fourth moment about the mean was computed as 1.40 – 3, which equals -1.60. The value of the fourth moment is less than zero; the distribution is platykurtic. 
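The value -1.60 can be reproduced with the sketch below. The helper name `standardized_moment` is hypothetical, and the standard deviation is again assumed to use N in the denominator.

```python
from math import sqrt

def standardized_moment(scores, k):
    """Mean of the k-th powers of the standard scores."""
    n = len(scores)
    mean = sum(scores) / n
    # Assumption: standard deviation with N in the denominator.
    sd = sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return sum(((x - mean) / sd) ** k for x in scores) / n

scores = [1, 2, 3, 6, 7]
raw_fourth = standardized_moment(scores, 4)  # before subtracting 3
kurtosis = raw_fourth - 3                    # fourth moment as defined above

print(round(raw_fourth, 2), round(kurtosis, 2))  # 1.4 -1.6
```

Calling the same helper with k = 1, 2, or 3 gives the first three moments, so one function covers the whole family.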

Area Transformations

Skewness and kurtosis describe departures from normality in the distributions of variables. There are several transformations for changing the distributions of variables into distributions closer to the normal distribution. These transformations vary in how well they accomplish this goal. One of the most efficient transformations in this respect is the area transformation.

McCall and Area Transformations

McCall proposed area transformations of test scores in 1922. The resulting test scores are frequently called area transformed T scores, although other standardized scores, such as IQ scores or Stens, can also be area transformed. In a personal letter to the author, William McCall described the development of the T-scores as follows: 'while I was a student of Thorndike, I was led to believe that every trait measured has to be normal. I got suspicious and asked my brother who never heard of the normal curve to make a mark on the ground for the intelligence of every man for miles down the road. The result -- a normal curve!

A few days later to entertain my six year old niece, we cut down a good-sized bush and measured the length of the leaves on it. A normal curve, of course. Not at all!  A strong trimodal curve. Being too early to repeat the experiment, I have tried to get some student interested enough to repeat the study to no avail. I'll probably go grieving to the grave without learning whether trimodality is characteristic of leaves on bushes! As you can see, the T score is not wholly free from difficulties. Moreover, the T unit proved to be too technical for U.S. students of education and even more so to Chinese teachers during my days in China, so I found it necessary to invent a simpler unit, called the G score...'

Split Point Scores into Adjacent Intervals

Let us demonstrate the area transformation of scores on the variable X [1 2 3 6 7], shown as black squares. The frequency of each score (1.0) is split into two components (.50 + .50), shown as black diamonds. The theoretical reason for this split is that each score of the variable X is only a point estimate of that score's interval. The first interval stretches from minus infinity to 1.5, the second interval is located between 1.5 and 2.5, the third interval is 2.5 - 3.5, the fourth 3.5 - 5.5, the fifth 5.5 - 6.5, and the sixth interval stretches from 6.5 to plus infinity.

 The numerical algorithm for the area transformation consists of several steps: In the first numerical line of the diagram below, the scores were split into .50 - .50 parts. In the second numerical line the split scores were reassembled within each score's interval.

Cumulate the Scores and Convert them into Proportions

In the third numerical line, the scores within each interval were cumulated. In the fourth numerical line, the cumulated scores were converted into proportions by dividing the cumulative frequencies by the total n of cases, in this example by 5.0.
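The four numerical lines described above can be sketched in code. This is one reading of the procedure, since the diagram itself is not reproduced here; it assumes every score occurs once (frequency 1.0), and all variable names are illustrative.

```python
from itertools import accumulate

scores = [1, 2, 3, 6, 7]   # each score has a frequency of 1.0
n = len(scores)

# Line 1: split each frequency of 1.0 into two halves.
halves = [0.5, 0.5] * n

# Line 2: reassemble the halves. The first and last halves stand alone
# (the open-ended intervals); neighboring inner halves are paired.
inner = [halves[i] + halves[i + 1] for i in range(1, 2 * n - 1, 2)]
reassembled = [halves[0]] + inner + [halves[-1]]

# Line 3: cumulate the reassembled frequencies.
cumulated = list(accumulate(reassembled))

# Line 4: convert to proportions by dividing by the number of cases.
proportions = [c / n for c in cumulated]

print(reassembled)   # [0.5, 1.0, 1.0, 1.0, 1.0, 0.5]
print(cumulated)     # [0.5, 1.5, 2.5, 3.5, 4.5, 5.0]
```

Under this reading the proportions come out as .10, .30, .50, .70, .90, and 1.00, where the final 1.00 belongs to the open-ended upper interval.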

Convert Proportions to Corresponding z Scores

The resulting proportions in the fourth numerical line were then converted to their corresponding z-scores, shown at the bottom of the diagram. Note that proportions equal to 1.00 have to be deleted, since they correspond to infinitely large values.

For this conversion, we have to use the table of the z scores and their corresponding areas. 

Area     Z

.000     -∞
.001     -3.00
.01      -2.33
.02      -2.00
.05      -1.65
.10      -1.28
.16      -1.00
.20      -.84
.30      -.52
.40      -.25
.50      .00
.60      .25
.70      .52
.80      .84
.84      1.00
.90      1.28
.95      1.65
.98      2.00
.99      2.33
.999     3.00
1.00     +∞
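Instead of reading a printed table, the same conversion can be approximated with the inverse of the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and assumes the proportions .10, .30, .50, .70, .90, and 1.00 for the example variable X [1 2 3 6 7].

```python
from statistics import NormalDist

proportions = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0]

# A proportion of 1.00 corresponds to an infinitely large z score
# (inv_cdf raises StatisticsError for it), so it is dropped first.
usable = [p for p in proportions if p < 1.0]

standard_normal = NormalDist(mu=0.0, sigma=1.0)
area_z = [standard_normal.inv_cdf(p) for p in usable]

print([round(z, 2) for z in area_z])  # [-1.28, -0.52, 0.0, 0.52, 1.28]
```

The rounded results agree with the table entries for the areas .10, .30, .50, .70, and .90.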

Area Transformation vs. Linear Transformation

Area transformed standard scores were read from a statistical table, associating standard scores with their corresponding areas under the normal distribution.

By contrast, the linearly transformed standard scores were obtained from the deviation scores by dividing them by the standard deviation.

 

Also, note that the skewness of the original variable X (or of the linearly transformed standard scores Zx) is .24, whereas the skewness of the area transformed distribution is zero. The area transformation also alters kurtosis, but does not transform platykurtic or leptokurtic distributions into mesokurtic distributions.
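The change in skewness can be checked with a brief sketch. It takes the area transformed standard scores for the example as the table values -1.28, -.52, .00, .52, and 1.28, and it assumes the standard deviation is computed with N in the denominator; the function name is illustrative.

```python
from math import sqrt

def skewness(values):
    """Third moment about the mean of the standardized values."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: standard deviation with N in the denominator.
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

x = [1, 2, 3, 6, 7]                          # original scores
area_z = [-1.28, -0.52, 0.00, 0.52, 1.28]    # read from the z table

print(round(skewness(x), 2))  # 0.24
# skewness(area_z) is zero up to floating-point rounding error,
# because the area transformed scores are symmetric around zero.
```

A linear transformation leaves the standard scores, and therefore the skewness, unchanged, so only the area transformation removes it.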

Area Transformed T Scores 

As a last step, the area transformed standard scores were converted into T scores, this time using a linear transformation, as shown in the following table.

T = 10 * areaZ + 50
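Applied to the example, the formula can be sketched as follows, taking the area transformed standard scores as the table values -1.28, -.52, .00, .52, and 1.28; the variable names are illustrative.

```python
# Area transformed standard scores for X [1 2 3 6 7], read from the z table.
area_z = [-1.28, -0.52, 0.00, 0.52, 1.28]

# T = 10 * areaZ + 50: a mean of 50 and a standard deviation unit of 10.
t_scores = [round(10 * z + 50, 1) for z in area_z]

print(t_scores)  # [37.2, 44.8, 50.0, 55.2, 62.8]
```

The middle score of the example, 3, lands exactly on the T score mean of 50.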

Initial Distribution and Area Transformed T Scores

In the following diagram, the top distribution is the initial distribution of the obtained scores; the bottom distribution is the distribution of area transformed T scores. Notice that normalizing the distribution made the scores of the new distribution more evenly spaced. The skewness of the original variable X, as computed in the previous sections, was .24. The skewness of the area transformed distribution is zero. Area transformations also alter kurtosis, but do not change platykurtic or leptokurtic distributions into mesokurtic distributions.

 

Issues

The advantage of using the McCall area transformations is substantial. Prior to the area transformation, however, the skewness of the scores to be transformed should be tested for statistical significance. The concept of statistical significance is discussed later, but suffice it to note that for skewness to be significant, its value must be greater than .53 for distributions with N of about 50, greater than .39 for N of about 100, and greater than .23 for N of about 300. Exact critical values for the coefficients of skewness and kurtosis can be found in Egon S. Pearson's tables, published in Biometrika, 1930, 22, 239-249.

If skewness is not statistically significant, the departure of the distribution from normality can be considered due to random factors, and the distribution can be normalized. In general, area transformations are a better method of normalizing data than other methods, such as the square root method or the often used arc sine transformation. Area transformations can potentially save the substantial effort that is often associated with rewriting a test instrument.

If the coefficient of skewness is statistically significant, other avenues leading to normality might be explored, such as rewriting test items to compensate either for 'low ceiling' or 'high floor' effects. Normalizing markedly skewed distributions may obscure factors making the distribution skewed to begin with. These factors may, in some cases, be of crucial importance. 

Summary

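The moments about the mean of the standard scores, i.e., the formulae for the mean, variance, skewness, and kurtosis, are summarized as

\[ \mu_1 = \frac{\sum z}{N} = 0 \qquad \mu_2 = \frac{\sum z^2}{N} = 1 \qquad \mu_3 = \frac{\sum z^3}{N} \qquad \mu_4 = \frac{\sum z^4}{N} - 3 \]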

The mean of standard scores always equals zero; thus, the first moment will always be equal to zero. The variance of standard scores is always equal to one; thus, the second moment will always be equal to one. When the third moment is a positive number, the distribution is positively skewed. As the third moment gets closer to zero, the distribution becomes more symmetrical. If the third moment is a negative number, the distribution is negatively skewed. When the fourth moment is less than zero, the distribution is platykurtic. If it is approximately zero, the distribution is mesokurtic. If the fourth moment is greater than zero, the distribution is leptokurtic.