Cruise Scientific        Visual Statistics Studio

Measures of Central Tendency

The description of a variable usually begins with the specification of its single most representative value, often called the measure of location, or central tendency. There are several measures for this statistic; we will limit our discussion to the mean and the median.

Arithmetic Mean

Harmony in Music

The concept of the arithmetic mean is a very old one, formulated by a group of pre-Socratic philosophers, the Pythagoreans. The Pythagoreans were interested, among other things, in the numerical relationships governing the harmony in music. They originally described the arithmetic mean in a treatise On Music. This first description of the mean involved only two numbers. The mean was defined as a quantity that exceeds the smaller value by the same amount as the larger value exceeds the mean.
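
The two-number definition above can be restated algebraically. This is a small worked sketch; the symbols a (the smaller value) and b (the larger value) are labels assumed here for illustration and do not come from the treatise.

```latex
% a = smaller value, b = larger value, M = mean (a and b are assumed labels)
M - a = b - M
\quad\Longrightarrow\quad 2M = a + b
\quad\Longrightarrow\quad M = \frac{a + b}{2}
```

For instance, with a = 2 and b = 4 the mean is 3, which exceeds 2 by the same amount (1) as 4 exceeds it.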

Computations

The arithmetic mean is a measure of central tendency commonly referred to as the average. The mean is the sum of the scores divided by the number of scores. Consider a variable X, indexing the scores of five subjects on a scale measuring their liking of poetry. The subjects responded to the statement

I like poetry

Responses of the subjects were recorded as

[Table of the five recorded responses of variable X]

The mean of variable X can be computed by using the formula

$$M = \frac{\sum X}{n}$$

where M denotes the mean of variable X, and n is the number of observations, cases, subjects, or attributes. The Greek capital letter sigma, Σ, indicates that values of variable X should be summed. For the example, the mean equals 15 / 5 = 3.
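
The same computation can be sketched in a few lines of Python. The original table of responses is not available, so the five scores below are hypothetical values chosen only so that they sum to 15, as stated in the text. The function name `arithmetic_mean` is likewise an illustrative choice.

```python
def arithmetic_mean(scores):
    """Return the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("the mean of an empty list is undefined")
    return sum(scores) / len(scores)


# Hypothetical responses of five subjects (assumed data; they sum to 15).
x = [1, 2, 3, 4, 5]
print(arithmetic_mean(x))  # 3.0
```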

Median

Some historians maintain that the median was introduced by Gauss in 1816. However, it was Fechner who, around 1878, called the attention of the scientific community to this concept. The median is symbolized by the letter C because Fechner called it Centralwerth, the central value of an ordered series.

Median vs. Mean

Fechner also described the relationship between the mean and the median in asymmetric distributions. The median, signified by the capital letter C, is the midpoint of an ordered series. When the scores are not evenly distributed along the whole range of a variable, the median is likely to be a more appropriate measure of central tendency than the mean. For instance, consider an ordered distribution of scores [1 2 3 4 10].

To compute the median, count simultaneously from both sides of this series toward the middle. If the number of scores, n, is odd, as in this example, then the median is the value in the series where both counts meet. For our example, the median is 3.

When a distribution contains a few extremely high or extremely low scores, the mean is pulled toward these outermost values and the median is a better choice, as shown in the figure below.

 

An example might be a distribution of salaries within a corporation where a few top managers get very high salaries. In this case, the arithmetic mean is biased upwards and the median better reflects the typical salary within the organization.
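
The effect of an extreme score can be checked with Python's standard `statistics` module. The first list is the ordered series [1 2 3 4 10] from the text. The second list, in which the top score is raised to 100, is a hypothetical variation added here only to show how the two measures react.

```python
from statistics import fmean, median

scores = [1, 2, 3, 4, 10]     # series used in the text
print(fmean(scores))          # 4.0
print(median(scores))         # 3

# Hypothetical variation: make the outermost score far more extreme.
extreme = [1, 2, 3, 4, 100]
print(fmean(extreme))         # 22.0, the mean follows the extreme score
print(median(extreme))        # 3, the median is unchanged
```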

Computations

If the number of scores in the distribution is even, the median is the middle value interpolated from the two scores adjacent to the theoretical midpoint of the distribution. This interpolation is most often accomplished by averaging the two adjacent scores, but other procedures, such as taking their geometric mean or graphically interpolating the observed trend, may be used. Consider a data set [3 1 2 4].

To compute the median, first, order the distribution [1 2 3 4] and, next, average the two adjacent middle values (2 and 3). The median of this distribution equals 2.50.
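
Both cases, odd and even n, can be combined in one short routine. The sketch below assumes the averaging rule for even n (not the geometric or graphical alternatives), and the function name `median_of` is an illustrative choice.

```python
def median_of(scores):
    """Return the median, averaging the two middle scores when n is even."""
    if not scores:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(scores)          # the series must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle score
        return ordered[mid]
    # even n: average the two scores adjacent to the midpoint
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 2, 3, 4, 10]))  # 3
print(median_of([3, 1, 2, 4]))      # 2.5
```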

Center of the Distribution

Measures of central tendency are fundamental statistical indices. While the median is used primarily within the confines of descriptive statistics, the mean is universally used within the general linear model and is an integral part of most statistical procedures. When a distribution is symmetric, the mean and the median coincide. If the distribution is not symmetric, as is often the case, the mean and median differ with respect to distances between the center of the distribution and its individual values.

Squared Distances from the Mean

In asymmetric distributions, if the center of the distribution is defined by the arithmetic mean, M, the sum of squared distances between the center of the distribution and its individual values is as small as possible. For the scores [1 2 3 4 10] used earlier, the sum of squared distances from the mean, 4, is 50.

However, the sum of squared distances from the median, 3, is 55.
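
These two sums can be verified directly. The sketch uses the series [1 2 3 4 10] introduced earlier in the text, whose mean is 4 and whose median is 3; the helper name `sum_squared_distances` is an illustrative choice.

```python
def sum_squared_distances(scores, center):
    """Sum of squared distances between each score and a chosen center."""
    return sum((x - center) ** 2 for x in scores)


scores = [1, 2, 3, 4, 10]
print(sum_squared_distances(scores, 4))  # 50, center at the mean
print(sum_squared_distances(scores, 3))  # 55, center at the median
```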

Absolute Distances from the Median

If the center of an asymmetric distribution is defined by the median, C, the sum of absolute distances between the center of the distribution and its individual values is as small as possible. For the scores [1 2 3 4 10] used earlier, the sum of absolute distances from the median, 3, is 11.

However, the sum of absolute distances from the mean, 4, is 12.
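
The same check works for absolute distances, again with the series [1 2 3 4 10] (median 3, mean 4). As a further illustration, the sketch scans the whole numbers 1 through 10 as candidate centers; that grid is an assumption made for the example, not part of the original text.

```python
def sum_absolute_distances(scores, center):
    """Sum of absolute distances between each score and a chosen center."""
    return sum(abs(x - center) for x in scores)


scores = [1, 2, 3, 4, 10]
print(sum_absolute_distances(scores, 3))  # 11, center at the median
print(sum_absolute_distances(scores, 4))  # 12, center at the mean

# Scan candidate centers 1..10 (assumed grid) for the smallest sum.
best = min(range(1, 11), key=lambda c: sum_absolute_distances(scores, c))
print(best)  # 3
```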

Summary

The universal acceptance of the arithmetic mean is due to its fundamental property of being the best measure of central tendency in the least-squares sense, a criterion used by most methods of the general linear model. The mean minimizes the sum of squared distances between itself and the values of the distribution. The median minimizes the sum of absolute distances between itself and the values of the distribution. If the distribution is symmetric, the mean and the median coincide, and both the absolute and the squared distances from the center of the distribution are as small as possible. If the distribution is asymmetric, the median is, as a descriptive statistic, superior to the arithmetic mean.
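
The least-squares property of the mean follows from a short derivation, sketched here with standard calculus. The symbols c (a candidate center), S(c) (the sum of squared distances from c), and the index i are notation assumed for this sketch; M and n are used as defined in the text.

```latex
% S(c): sum of squared distances of the n scores X_i from a candidate center c
S(c) = \sum_{i=1}^{n} (X_i - c)^2

% Setting the first derivative to zero locates the stationary point
\frac{dS}{dc} = -2 \sum_{i=1}^{n} (X_i - c) = 0
\quad\Longrightarrow\quad \sum_{i=1}^{n} X_i = n\,c
\quad\Longrightarrow\quad c = \frac{\sum_{i=1}^{n} X_i}{n} = M

% The second derivative is positive, so the stationary point is a minimum
\frac{d^2 S}{dc^2} = 2n > 0
```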