Visual Statistics Studio

 

Measures of Central Tendency

The description of a variable usually begins with the specification of its single most representative value, often called the measure of location, or central tendency. There are several such measures; we will limit our discussion to the mean and the median.

The concept of the arithmetic mean is a very old one, formulated by a group of pre-Socratic philosophers, the Pythagoreans. The Pythagoreans were interested, among other things, in the numerical relationships governing harmony in music. They originally described the arithmetic mean in a treatise On Music. This first description of the mean involved only two numbers. The mean was defined as a quantity that exceeds the smaller value by the same amount as the larger value exceeds the mean.

Some historians maintain that the median was introduced by Gauss in 1816. However, it was Fechner who, around 1878, called the attention of the scientific community to this concept. The reason for symbolizing the median by the letter C is that Fechner called the median Centralwerth, the central value of an ordered series. Fechner also described the relationship between the mean and the median in asymmetric distributions.

Arithmetic Mean

The arithmetic mean is a measure of central tendency commonly referred to as an average. The mean is the sum of scores, divided by their number. Consider a variable X, indexing the scores of five subjects on a scale measuring their liking of poetry. The subjects responded to the question

 

I like poetry

 

 

 

The subjects' responses to 'I like poetry', given on a five-step rating scale, are recorded as

 

 

 

 

The mean of variable X can be computed by using the formula

$$M = \frac{\sum X}{n}$$

where M denotes the mean of variable X, and n is the number of subjects. The Greek capital letter sigma indicates that values of variable X should be summed. For the example, the mean equals 15 / 5 = 3.
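The computation can be sketched in a few lines of Python. The five scores used here are hypothetical, chosen only so that they sum to 15 as in the example; the original responses are not reproduced in the text, and the function name arithmetic_mean is likewise an illustrative choice.

```python
def arithmetic_mean(scores):
    """Return M = (sum of X) / n for a non-empty list of scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("the mean of an empty list is undefined")
    return sum(scores) / n


# Hypothetical ratings on a five-step scale (assumed data, sum = 15).
x = [1, 2, 3, 4, 5]
m = arithmetic_mean(x)
print(m)  # 3.0
```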

Median

The median, signified by the capital letter C, is defined as that point below which fifty percent of the cases fall. In other words, the median represents the midpoint of an ordered series. When the scores are not equally distributed along the whole range of a variable, the median is likely a more appropriate measure of the central tendency than the mean. Consider the ordered distribution of scores [1 2 3 4 10]. To compute the median, count simultaneously from both sides of this series toward the middle. If the number of scores, n, is odd, as in this example, then the median is the value in the series where both counts meet. For our example, the median is 3.

When a distribution contains a few extremely high or extremely low scores, the mean is pulled toward these outermost values and the median is a better statistic, as shown in the figure below.

 

 

 

 

An example might be a distribution of salaries within a corporation where a few top managers get very high salaries. In this case, the arithmetic mean is biased upwards and the median better reflects the typical salary within the organization.

If the number of scores in the distribution is even, the median is the middle value interpolated from the two scores adjacent to the theoretical midpoint of the distribution. This interpolation is most often done by averaging the two adjacent scores, but other procedures, such as the geometric mean or graphic interpolation of the observed trend, may be used. Consider the data set [3 1 2 4]. To compute the median, first order the distribution, [1 2 3 4], and then average the two middle values (2 and 3). The median of this distribution equals 2.50.
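Both cases, odd and even n, can be sketched in Python as follows. This is a minimal illustration that uses the averaging rule for even n; the function name median is an assumption made for the example, and the other interpolation procedures mentioned above are not covered.

```python
def median(scores):
    """Return the median C of a non-empty list of scores."""
    ordered = sorted(scores)          # the series must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value where the counts from both sides meet.
        return ordered[mid]
    # Even n: average the two scores adjacent to the midpoint.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 10]))  # 3
print(median([3, 1, 2, 4]))      # 2.5
```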

Distances from the Mean 

Measures of central tendency are fundamental statistical indices. While the median is used primarily within the confines of descriptive statistics, the mean is universally used within the general linear model and is an integral part of most statistical procedures. When a distribution is symmetric, the mean and the median coincide. If the distribution is not symmetric, as is often the case, the mean and median differ with respect to distances between the center of the distribution and its individual values.

In asymmetric distributions, if the center of the distribution is defined by the arithmetic mean, M, the sum of squared distances between the center of the distribution and its individual values is as small as possible. In the example below, the sum of squared distances from the mean, 4, is 50. The sum of absolute distances from the center of the distribution is 12.

 

 

 

Distances from the Median 

If the center of an asymmetric distribution is defined by the median, C, the sum of absolute distances between the center of the distribution and its individual values is as small as possible. In the example below, the sum of squared distances from the median, 3, is 55, while the sum of absolute distances from the center of the distribution is 11.
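The sums quoted in this section and the previous one can be checked with a short Python sketch. The figures that carried the data are not reproduced, so the data set is an assumption: the series [1 2 3 4 10] from the median section, which has mean 4 and median 3 and reproduces the quoted sums. The helper names are illustrative.

```python
def sum_squared_distances(scores, center):
    """Sum of squared distances between each score and the center."""
    return sum((x - center) ** 2 for x in scores)


def sum_absolute_distances(scores, center):
    """Sum of absolute distances between each score and the center."""
    return sum(abs(x - center) for x in scores)


scores = [1, 2, 3, 4, 10]                      # assumed data set
mean = sum(scores) / len(scores)               # 4.0
median = sorted(scores)[len(scores) // 2]      # 3 (n is odd)

# Center defined by the mean: squared sum 50, absolute sum 12.
print(sum_squared_distances(scores, mean), sum_absolute_distances(scores, mean))
# Center defined by the median: squared sum 55, absolute sum 11.
print(sum_squared_distances(scores, median), sum_absolute_distances(scores, median))
```

The mean gives the smaller squared sum (50 against 55), while the median gives the smaller absolute sum (11 against 12).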

 

 

Summary

The universal acceptance of the arithmetic mean is due to its fundamental property: it is the measure of central tendency that is best in the least-squares sense, a criterion used by most methods of the general linear model. The mean minimizes the sum of squared distances between itself and the values of the distribution. The median minimizes the sum of absolute distances between itself and the values of the distribution. If the distribution is symmetric, the mean and the median coincide, and both the absolute and the squared distances from the center of the distribution are as small as possible. If the distribution is asymmetric, the median is, as a descriptive statistic, superior to the arithmetic mean.
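The two minimizing properties can be illustrated numerically with a brute-force search over candidate centers. This is only a sketch, not a proof: the data set [1 2 3 4 10] and the grid of candidates from 1.00 to 10.00 in steps of 0.01 are assumptions made for the illustration.

```python
scores = [1, 2, 3, 4, 10]  # assumed asymmetric data set

# Candidate centers 1.00, 1.01, ..., 10.00 (assumed search grid).
candidates = [i / 100 for i in range(100, 1001)]

# Candidate with the smallest sum of squared distances.
best_squared = min(candidates, key=lambda c: sum((x - c) ** 2 for x in scores))

# Candidate with the smallest sum of absolute distances.
best_absolute = min(candidates, key=lambda c: sum(abs(x - c) for x in scores))

print(best_squared)   # 4.0, the mean of the scores
print(best_absolute)  # 3.0, the median of the scores
```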