Cruise Scientific Visual Statistics Studio
The concept of correlation is based on Galton and Pearson's notion that there is a category beyond causation of which causation is only a limit. The world of Galton and Pearson was the Newtonian world: orderly, categorizing, the world of Pope's poems and exciting new scientific discoveries occurring in the physical sciences. Galton and Pearson hoped to bring the quantitative rigor of the physical sciences to the social sciences that were at that time dominated by qualitative descriptions and philosophical speculations.
The method of correlation opened new vistas for quantitative social science. The notion of causality as the sole explanatory principle of events was broadened to include the notion of association between events. The expectation arose that these quantified associations would be elaborated into nomological networks, encompassing relationships between elements of complex systems. As the association between events becomes stronger, the probability that those events are also influenced by unknown factors lessens.
Hopes for a new, quantitative social science were tempered by the observation that correlational studies make rigorous demands: the underlying assumptions must be correct, the factors included in the study must be relevant, and the experimental framework must be plausible. Failures to meet these conditions frequently inspired critics of correlational methods to conjure up examples of patently erroneous conclusions based upon correlation between superficially related events.
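The methodological issues associated with correlational analyses are complex and hard to understand without an intimate knowledge of the technique itself. The formula for computing Pearson's product-moment coefficient of correlation preserves the form of the coefficient of covariance,
$$\mathrm{cov}_{xy} = \frac{\sum xy}{N}$$
and substitutes standard scores for deviation scores:
$$r_{xy} = \frac{\sum z_x z_y}{N}$$
Here x and y are deviation scores, z_x and z_y are the corresponding standard scores, and N is the number of paired observations.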
This definitional formula of the coefficient of correlation, together with the formulae for the mean and the variance, constitutes the most important set of formulae of the general linear model described thus far. It is a definitional formula in the sense that it cannot be readily derived from some other, more basic expression. Its form suggests the full name of this statistic: the product-moment coefficient of correlation. All other renderings of the coefficient of correlation can be algebraically derived from this basic form.
While the coefficient of covariance has no upper and lower limits, the coefficient of correlation can vary from positive one (indicating a perfect positive relationship), through zero (indicating the absence of a relationship), to negative one (indicating a perfect negative relationship). To gain insight into the concept of the coefficient of correlation and its properties, let us consider an example of two rating scales, listed below.
I would like to be a librarian
I like poetry
The responses were recorded as

together with its scatterplot
The question to be answered is whether the answers to these two questions are related. Computation of the product-moment coefficient of correlation is outlined as

As displayed in the tabular presentation of the computational example, the obtained scores X and Y are translated to deviation scores x and y by subtracting their respective means (3; 3). The variances are then computed by squaring their deviation scores and computing their means (2; 2). Taking the square roots of both variances, their standard deviations (1.41; 1.41) are obtained.
Dividing the deviation scores by their corresponding standard deviations results in standard scores zx and zy.
By forming the product of the standard scores, summing them (2.50) and computing the mean of this product, the coefficient of correlation (.50) is obtained.
The correlation coefficient is positive. Higher scores on the variable X are associated with higher scores on the variable Y. Lower scores on the variable X are associated with lower scores on the variable Y.
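The standard-score computation described above can be sketched in Python. The scores below are hypothetical: the original response table is not reproduced here, so `X` and `Y` were chosen only because they match the reported summary values (means of 3, variances of 2, a sum of standard-score products of 2.50, and a correlation of .50). The function name is likewise an illustrative assumption.

```python
import math

def correlation_from_standard_scores(X, Y):
    """Pearson r as the mean product of standard scores (divisor N, as in the text)."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    # Deviation scores: obtained score minus the mean.
    x = [v - mean_x for v in X]
    y = [v - mean_y for v in Y]
    # Standard deviation: square root of the mean squared deviation score.
    sd_x = math.sqrt(sum(d * d for d in x) / n)
    sd_y = math.sqrt(sum(d * d for d in y) / n)
    # Standard scores: deviation score divided by the standard deviation.
    zx = [d / sd_x for d in x]
    zy = [d / sd_y for d in y]
    sum_of_products = sum(a * b for a, b in zip(zx, zy))
    return sum_of_products, sum_of_products / n

# Hypothetical ratings (assumed data, not the original table).
X = [1, 2, 3, 4, 5]
Y = [1, 3, 5, 2, 4]
sum_zz, r = correlation_from_standard_scores(X, Y)
print(round(sum_zz, 2), round(r, 2))  # 2.5 0.5
```

The divisor is N rather than N - 1 throughout, which follows the text's definition of the variance as the mean of the squared deviation scores.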
The coefficient of correlation remains invariant with respect to change of the measurement unit. The definitional formula for the Pearson's product-moment coefficient of correlation can be translated into a formula of the coefficient of correlation for the deviation scores by substituting
$$z_x = \frac{x}{s_x}$$
and
$$z_y = \frac{y}{s_y}$$
This substitution results in a formula for the coefficient of correlation expressed in deviation scores:
$$r_{xy} = \frac{\sum xy}{N s_x s_y}$$
The necessary steps for computing the coefficient of correlation, using deviation scores, are summarized as

The computational procedure, expressed in shorthand by the deviation-score formula and summarized in the table above, can be described verbally as follows.
As a first step, the means of the X and Y variables are computed, and the obtained scores are transformed to deviation scores. The mean of the product of the deviation scores is computed as 1.00. Division by the product of both standard deviations gives the value of the coefficient of correlation as .50. This value is identical to the value obtained from the standard score formula.
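The deviation-score procedure can be sketched in the same way. This is a minimal illustration under assumptions: the data are hypothetical scores chosen to match the reported values (mean product of 1.00, correlation of .50), and the function name is invented for the example. The last lines also check the invariance to a change of measurement unit noted earlier, by multiplying X by an arbitrary factor of 10.

```python
import math

def correlation_from_deviation_scores(X, Y):
    """Pearson r as the mean product of deviation scores over s_x * s_y."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    x = [v - mean_x for v in X]  # deviation scores of X
    y = [v - mean_y for v in Y]  # deviation scores of Y
    mean_product = sum(a * b for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum(a * a for a in x) / n)
    sd_y = math.sqrt(sum(b * b for b in y) / n)
    return mean_product, mean_product / (sd_x * sd_y)

# Hypothetical ratings (assumed data, not the original table).
X = [1, 2, 3, 4, 5]
Y = [1, 3, 5, 2, 4]
mean_xy, r = correlation_from_deviation_scores(X, Y)

# Express X in a different unit (arbitrary factor of 10).
X_rescaled = [10 * v for v in X]
mean_xy_rescaled, r_rescaled = correlation_from_deviation_scores(X_rescaled, Y)

print(mean_xy, round(r, 2), round(r_rescaled, 2))  # 1.0 0.5 0.5
```

In this sketch the mean product of deviation scores grows tenfold under the rescaling, while the correlation stays the same because the standard deviation of X grows by the same factor.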
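Converting the deviation scores into obtained scores within the deviation-score formula for the coefficient of correlation, i.e., substituting X - Mx and Y - My for the deviation scores x and y, as
$$x = X - M_x, \qquad y = Y - M_y$$
the expression
$$r_{xy} = \frac{\sum (X - M_x)(Y - M_y)}{N s_x s_y}$$
is obtained. This expression can be simplified as
$$r_{xy} = \frac{\frac{\sum XY}{N} - M_x M_y}{s_x s_y}$$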
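The simplification may be easier to follow with the intermediate step written out. As a sketch, using only the facts that $\sum X = N M_x$ and $\sum Y = N M_y$, the sum of products of the deviations from the means expands as

$$
\sum (X - M_x)(Y - M_y) = \sum XY - M_x \sum Y - M_y \sum X + N M_x M_y = \sum XY - N M_x M_y
$$

Dividing this sum by $N$ gives $\sum XY / N - M_x M_y$, the mean product of the obtained scores minus the product of the means.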
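Placed over a common denominator, the formula for the coefficient of correlation in obtained scores can be written as
$$r_{xy} = \frac{\sum XY - N M_x M_y}{N s_x s_y}$$
or, alternatively, evaluated with the standard deviations themselves computed from the obtained scores,
$$s_x = \sqrt{\frac{\sum X^2}{N} - M_x^2}$$
and
$$s_y = \sqrt{\frac{\sum Y^2}{N} - M_y^2}.$$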
For the example, the variance of X equals 2 (55/5 - 3² = 2). The variance of Y also equals 2. Taking the square root, the standard deviation of both variables is equal to 1.41. The coefficient of correlation is then computed as (10 - (3)(3))/((1.41)(1.41)), which equals .50.
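The obtained-score arithmetic can be checked with a short Python sketch. The data are hypothetical, chosen only to reproduce the figures quoted in the text (a sum of squares of 55, a mean cross-product of 10, and means of 3); the function name is an assumption made for the example.

```python
import math

def correlation_from_obtained_scores(X, Y):
    """Pearson r computed directly from obtained (raw) scores."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    # Variance from raw scores: mean of the squares minus the squared mean.
    var_x = sum(v * v for v in X) / n - mean_x ** 2
    var_y = sum(v * v for v in Y) / n - mean_y ** 2
    # Mean cross-product of the obtained scores.
    mean_cross = sum(a * b for a, b in zip(X, Y)) / n
    r = (mean_cross - mean_x * mean_y) / (math.sqrt(var_x) * math.sqrt(var_y))
    return var_x, mean_cross, r

# Hypothetical ratings (assumed data, not the original table).
X = [1, 2, 3, 4, 5]
Y = [1, 3, 5, 2, 4]
var_x, mean_cross, r = correlation_from_obtained_scores(X, Y)
print(var_x, mean_cross, round(r, 2))  # 2.0 10.0 0.5
```

The obtained-score form avoids computing deviation scores explicitly, which is why it is convenient for hand calculation.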
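Let us consider jointly the deviation score formulae for the correlation and covariance coefficients. Since the correlation in deviation scores equals
$$r_{xy} = \frac{\sum xy}{N s_x s_y}$$
and the covariance in deviation scores equals $\sum xy / N$, the correlation can be expressed through the covariance as
$$r_{xy} = \frac{\mathrm{cov}_{xy}}{s_x s_y}$$
From the above expression, the coefficient of covariance may be isolated and redefined as the product of the coefficient of correlation and the standard deviations of its constituent scores:
$$\mathrm{cov}_{xy} = r_{xy} s_x s_y$$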
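The translation between covariance and correlation can be illustrated with a brief Python sketch. It assumes the covariance is the mean product of deviation scores (divisor N) and uses hypothetical data and helper names that do not come from the original text.

```python
import math

def covariance(X, Y):
    """Mean product of deviation scores (divisor N)."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(X, Y)) / n

def standard_deviation(X):
    """Square root of the variance, i.e. of the covariance of X with itself."""
    return math.sqrt(covariance(X, X))

# Hypothetical ratings (assumed data, not the original table).
X = [1, 2, 3, 4, 5]
Y = [1, 3, 5, 2, 4]

cov_xy = covariance(X, Y)
s_x = standard_deviation(X)
s_y = standard_deviation(Y)

r_xy = cov_xy / (s_x * s_y)    # correlation obtained from covariance
cov_back = r_xy * s_x * s_y    # covariance recovered from correlation

print(round(cov_xy, 2), round(r_xy, 2), round(cov_back, 2))  # 1.0 0.5 1.0
```

Note that the conversion factor is the product of the two standard deviations, so the same covariance corresponds to different correlations when the spread of the scores differs.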
The formulae for the coefficient of correlation and formulae for translations between covariance and correlation formulae are summarized as
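| Obtained Scores | Deviation Scores | Standard Scores |
| --- | --- | --- |
| $r_{xy} = \dfrac{\sum XY / N - M_x M_y}{s_x s_y}$ | $r_{xy} = \dfrac{\sum xy}{N s_x s_y}$ | $r_{xy} = \dfrac{\sum z_x z_y}{N}$ |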
These formulae are the basic building blocks of the general linear model, capturing the quantitative aspects of relationships between variables. The formulae capturing the relationship between covariance and correlation are shown below.
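| | Covariance | Correlation |
| --- | --- | --- |
| Covariance | | $\mathrm{cov}_{xy} = r_{xy} s_x s_y$ |
| Correlation | $r_{xy} = \dfrac{\mathrm{cov}_{xy}}{s_x s_y}$ | |

Each row gives the index named at the left expressed in terms of the index named at the top.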
Covariance and correlation are fundamental tools of statistical analysis, the principal building blocks of the general linear model. They are used in the course of theory development as well as in applied computations. Of these two indices, correlation is the more frequently used. Additional properties of the coefficient of correlation will be discussed in detail in the chapters to follow.