Principal components analysis is based on the work of Jacobi (1804-1851) pertaining to the theory of determinants. Jacobi often said "invert, always invert" (man muss immer umkehren), for he believed that the solution of many problems can be obtained by expressing them in inverse form.
Eigenanalysis of a set of variables partitions their variance into principal components. There are as many principal components as there are variables analyzed. As researchers interpreted correlation matrices, a need arose to reduce them to a smaller number of components. The attempts to simplify a matrix of correlations were spearheaded by Spearman (1863-1945), who observed a special property of a matrix of correlations, which he called the tetrad differences, and which led to his method of finding a general factor surrounded by several smaller specific factors. The theory of multiple factor analysis was outlined by Thurstone (1887-1955) in his article in the 1931 issue of Psychological Review, which he later expanded into the book The Vectors of Mind. In 1936 Roff published an article in Psychometrika (Some properties of the communality in multiple factor theory) in which he suggested that the size of the matrix of principal components can be reduced by substituting the communality of the variables for their total variance analyzed by principal components analysis.
The communality is the proportion of variance each variable in the analysis shares with the other variables. Computing communalities was a laborious process of successively designating each variable as the criterion variable and the remaining variables as the predictor variables, and carrying out a series of k multiple regressions, where k is the number of variables included in the analysis. The communalities thus formed a diagonal matrix with the squared coefficients of multiple correlation in its principal diagonal.
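The series of k regressions described above can be sketched in a few lines. This is a minimal illustration, not the original author's procedure: the function name `smc_by_regression` and the three-variable correlation matrix are assumptions made up for the example, and the regressions are run directly on the correlation matrix rather than on raw scores.

```python
import numpy as np

def smc_by_regression(R):
    """Squared multiple correlation of each variable with all the others,
    obtained from k separate regressions on a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    k = R.shape[0]
    smc = np.empty(k)
    for i in range(k):
        # Variable i is the criterion; every other variable is a predictor.
        others = [j for j in range(k) if j != i]
        R_xx = R[np.ix_(others, others)]    # correlations among the predictors
        r_xy = R[others, i]                 # predictor-criterion correlations
        beta = np.linalg.solve(R_xx, r_xy)  # standardized regression weights
        smc[i] = r_xy @ beta                # R^2 for criterion i
    return smc

# Hypothetical correlation matrix, for illustration only.
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])

print(np.round(smc_by_regression(R), 4))
```

Each pass through the loop solves one regression, so the work grows with the number of variables, which is what made the procedure laborious by hand.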
A better procedure than carrying out a series of k multiple regression analyses is suggested by the equation
$$C_{diag} = I - \left(R^{-1}_{diag}\right)^{-1}$$
where I is an identity matrix and R is the matrix of correlations between variables included in the analysis. The subscript on the left side of the above equation symbolizes that C is a diagonal matrix with communalities located along its principal diagonal.
The central part of the above equation is the computation of the inverse of a correlation matrix. For a simple case of two variables
|
|
|
the correlation matrix is
$$R = \begin{bmatrix} 1.00 & .30 \\ .30 & 1.00 \end{bmatrix}$$
The variables correlate .30 and the coefficient of determination is .09. The determinant of this correlation matrix is
$$|R| = r_{11} r_{22} - r_{12} r_{21} = 1 - r_{12}^{2}$$
For the example the determinant is 1 - .09 = .91; a matrix is invertible only if its determinant is not zero. Note that for a correlation matrix of two variables the determinant equals 1 - r², which is here called the coefficient of alienation. The inverse of the matrix R equals
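$$R^{-1} = \frac{1}{.91} \begin{bmatrix} 1.00 & -.30 \\ -.30 & 1.00 \end{bmatrix} = \begin{bmatrix} 1.0989 & -.3297 \\ -.3297 & 1.0989 \end{bmatrix}$$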
Diagonalizing the above matrix, that is, retaining its principal diagonal and setting the off-diagonal elements to zero, leads to
$$R^{-1}_{diag} = \begin{bmatrix} 1.0989 & 0 \\ 0 & 1.0989 \end{bmatrix}$$
Recall that a number to a zero power equals 1 and to the -1 power equals its reciprocal. Thus
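$$\left(R^{-1}_{diag}\right)^{-1} = \begin{bmatrix} 1/1.0989 & 0 \\ 0 & 1/1.0989 \end{bmatrix} = \begin{bmatrix} .91 & 0 \\ 0 & .91 \end{bmatrix}$$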
Thus raising the diagonalized inverse of a correlation matrix to the -1 power changes the elements in the principal diagonal to the coefficients of alienation. Since the coefficients of determination are the one's complements of the coefficients of alienation, subtracting the above matrix from an identity matrix
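$$I - \left(R^{-1}_{diag}\right)^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} .91 & 0 \\ 0 & .91 \end{bmatrix}$$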
results in the matrix of communalities
$$C_{diag} = \begin{bmatrix} .09 & 0 \\ 0 & .09 \end{bmatrix}$$
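The two-variable computation can be retraced numerically. The sketch below assumes NumPy and uses the r = .30 correlation from the example; the variable names are chosen only for illustration.

```python
import numpy as np

r = 0.30
R = np.array([[1.0, r],
              [r, 1.0]])

det_R = np.linalg.det(R)      # determinant, equal to 1 - r**2 for two variables
R_inv = np.linalg.inv(R)      # inverse of the correlation matrix
D = np.diag(np.diag(R_inv))   # keep only the principal diagonal of the inverse
D_inv = np.linalg.inv(D)      # -1 power: reciprocals of the diagonal elements
C = np.eye(2) - D_inv         # subtract from the identity matrix

print("determinant:", round(det_R, 4))
print("inverse:")
print(np.round(R_inv, 4))
print("communalities:", np.round(np.diag(C), 4))
```

The diagonal of `C` reproduces the coefficient of determination, .09, for both variables.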
This works for any number of variables. For our previous example of a multiple regression analysis
|
|
|
the matrix of correlations was
|
|
|
Its inverse equals
|
|
|
The matrix of communalities is
|
|
|
Notice that the previously computed coefficient of multiple determination for our example of Y regressed on X1 and X2 was indeed .58.
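The same agreement can be checked for any three-variable case. The sketch below computes the communalities as the identity matrix minus the reciprocals of the diagonal of the inverse correlation matrix, and compares the element for Y with the standard two-predictor formula for the coefficient of multiple determination. The function name `communalities` and the correlations are hypothetical; they are not the values from the regression example in the text.

```python
import numpy as np

def communalities(R):
    """Diagonal matrix I - (diag of R^-1)^-1 of squared multiple correlations."""
    R = np.asarray(R, dtype=float)
    inv_diag = np.diag(np.linalg.inv(R))   # diagonal elements of R^-1
    return np.diag(1.0 - 1.0 / inv_diag)

# Hypothetical correlations among Y, X1 and X2 (NOT the values from the text).
r_y1, r_y2, r_12 = 0.6, 0.5, 0.2
R = np.array([[1.0, r_y1, r_y2],
              [r_y1, 1.0, r_12],
              [r_y2, r_12, 1.0]])

C = communalities(R)

# Two-predictor formula for the coefficient of multiple determination of Y.
R2_y = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

print("communalities:", np.round(np.diag(C), 4))
print("R^2 for Y on X1, X2:", round(R2_y, 4))
```

The first diagonal element of `C` should match `R2_y`, so a single matrix inversion replaces the k separate regressions.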