Given a set of variables that correlate with each other to a certain degree, it is possible to linearly transform this set of variables so the inter-correlations of variables in this new set are equal to zero. Let us consider two variables X1 and X2 as presented in the following table.
|
|
|
The corresponding correlation matrix R, computed as
|
|
|
is listed in the following table.
| | X1 | X2 |
|---|---|---|
| X1 | 1.00 | .30 |
| X2 | .30 | 1.00 |
The goal of the principal components analysis is to find such a matrix of coefficients B which would transform the data matrix Z
|
|
|
into a new matrix
|
|
|
containing variables which are not correlated. This transformation would change the matrix R into an identity matrix
$$\mathbf{I} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
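The decorrelating transformation can be illustrated with a minimal NumPy sketch. The data here are synthetic (the values in the article's table are not reproduced), and the recipe used for `B`, unit eigenvectors divided by the square roots of their eigenvalues, is one common choice assumed for illustration rather than the article's own procedure. The names `Z_new` and `R_new` are likewise hypothetical.

```python
import numpy as np

# Synthetic two-variable data with a population correlation of about .30.
# These numbers are an assumption for illustration, not the article's table.
rng = np.random.default_rng(0)
n = 500
x1 = rng.standard_normal(n)
x2 = 0.3 * x1 + np.sqrt(1 - 0.3**2) * rng.standard_normal(n)
X = np.column_stack([x1, x2])

# Standardize each column (mean 0, standard deviation 1).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Correlation matrix of the standardized data.
R = Z.T @ Z / n

# Eigen-decomposition of the symmetric matrix R.
eigvals, V = np.linalg.eigh(R)

# Assumed choice of B: each eigenvector column divided by sqrt(eigenvalue).
B = V / np.sqrt(eigvals)

# Transformed data and its correlation matrix.
Z_new = Z @ B
R_new = Z_new.T @ Z_new / n

print(np.round(R, 3))
print(np.round(R_new, 3))
```

Under these assumptions the correlation matrix of the transformed variables is the identity matrix up to floating-point error.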
To accomplish this task we have to solve a system of equations, written in a matrix form as
$$(\mathbf{R} - \lambda \mathbf{I})\,\mathbf{v} = \mathbf{0}$$
In the above equation, R is the correlation matrix and I is an identity matrix, that is, a matrix having ones in the main diagonal and zeroes in its off-diagonal elements. The Greek letter lambda (λ) stands for the latent roots, called the eigenvalues, and v signifies the latent vectors, called the eigenvectors. The above equation, called the characteristic equation, represents a system of homogeneous equations in which the number of unknowns equals the number of equations. Such a system has a nontrivial solution only if the rank of the coefficient matrix

$$\mathbf{R} - \lambda \mathbf{I}$$

is smaller than the number of unknowns. This is the case only if the coefficient matrix is singular, i.e., its determinant

$$|\mathbf{R} - \lambda \mathbf{I}|$$

equals zero.
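The singularity condition can be checked numerically. The sketch below assumes the example's correlation matrix has an off-diagonal value of .30 (the value implied by the eigenvalues 1.30 and .70 derived further on); the helper name `char_det` is hypothetical.

```python
import numpy as np

# Assumed example correlation matrix (off-diagonal .30).
R = np.array([[1.0, 0.30],
              [0.30, 1.0]])
I = np.eye(2)

def char_det(lam):
    """Determinant of the coefficient matrix R - lam * I."""
    return np.linalg.det(R - lam * I)

def coef_rank(lam):
    """Rank of R - lam * I, with an explicit tolerance for round-off."""
    return np.linalg.matrix_rank(R - lam * I, tol=1e-9)

for lam in (0.70, 1.00, 1.30):
    print(f"lambda={lam:.2f}  det={char_det(lam):+.4f}  rank={coef_rank(lam)}")
```

For a value of lambda that is not an eigenvalue (here 1.00) the coefficient matrix has full rank and a nonzero determinant, so only the trivial solution v = 0 exists.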
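Let us solve the above equation for the characteristic roots (eigenvalues), which indicate the proportion of variance accounted for by each orthogonal component. For the example, the equation

$$\left| \begin{bmatrix} 1.00 & .30 \\ .30 & 1.00 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right| = 0$$

can be simplified as

$$\begin{vmatrix} 1.00 - \lambda & .30 \\ .30 & 1.00 - \lambda \end{vmatrix} = 0$$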
Computing the determinant as

$$(1 - \lambda)^2 - (.30)(.30) = 0$$

and expanding the binomial

$$1 - 2\lambda + \lambda^2 - .09 = 0$$

results in the quadratic equation

$$\lambda^2 - 2\lambda + .91 = 0$$
which has the general form of a quadratic equation

$$a\lambda^2 + b\lambda + c = 0$$

and can be solved by the quadratic formula

$$\lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
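For the example, with a = 1, b = -2, and c = .91, the quadratic formula becomes

$$\lambda = \frac{2 \pm \sqrt{(-2)^2 - 4(1)(.91)}}{2(1)} = \frac{2 \pm \sqrt{.36}}{2} = \frac{2 \pm .60}{2}$$

whence

$$\lambda_1 = 1.30, \qquad \lambda_2 = .70$$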
Thus, the first eigenvalue equals 1.30 and the second eigenvalue equals .70.
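The hand computation can be cross-checked with a short sketch. It assumes the off-diagonal correlation is .30 (the value implied by the eigenvalues quoted above) and compares the quadratic-formula roots with those returned by NumPy's symmetric eigenvalue routine.

```python
import math
import numpy as np

r = 0.30  # assumed correlation between X1 and X2

# Coefficients of lambda^2 - 2*lambda + (1 - r^2) = 0.
a, b, c = 1.0, -2.0, 1.0 - r**2

# Quadratic formula.
disc = b**2 - 4 * a * c
lam1 = (-b + math.sqrt(disc)) / (2 * a)
lam2 = (-b - math.sqrt(disc)) / (2 * a)

# Numerical eigenvalues of the correlation matrix, largest first.
R = np.array([[1.0, r], [r, 1.0]])
numeric = np.linalg.eigvalsh(R)[::-1]

print(round(lam1, 2), round(lam2, 2))
print(np.round(numeric, 2))
```

For a two-by-two correlation matrix the roots reduce to 1 + r and 1 - r, which is why the two routes agree.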
Eigenvalues have several important properties. The property easiest to notice is that the sum of the eigenvalues equals the trace of the correlation matrix
$$\sum_{i} \lambda_i = \operatorname{tr}(\mathbf{R})$$
The trace of a matrix is the sum of the elements located in its principal diagonal. For the example, 1.30 + .70 = 2.00. The next property is that the product of all the eigenvalues equals the determinant of the correlation matrix

$$\prod_{i} \lambda_i = |\mathbf{R}|$$
For the example, (1.30)(.70) = .91.
Another property is that the proportion of variance accounted for by each principal component equals the ratio of its eigenvalue to the trace of the correlation matrix

$$\frac{\lambda_i}{\operatorname{tr}(\mathbf{R})}$$

In our example the first eigenvalue accounts for 1.30/2 = .65, i.e., sixty-five percent of the total variance, and the second eigenvalue for the remaining .70/2 = .35, i.e., thirty-five percent.
It also follows that the proportions of variance accounted for by the principal components sum to one

$$\sum_{i} \frac{\lambda_i}{\operatorname{tr}(\mathbf{R})} = 1$$

For the example, where the first component accounts for .65 of the variance and the second for .35, the assertion holds, since .65 + .35 = 1.00.
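These properties are easy to verify numerically. The following sketch again assumes an off-diagonal correlation of .30 for the example; the variable names are illustrative.

```python
import numpy as np

# Assumed example correlation matrix (off-diagonal .30).
R = np.array([[1.0, 0.30],
              [0.30, 1.0]])

# Eigenvalues, largest first.
eigvals = np.linalg.eigvalsh(R)[::-1]

trace = np.trace(R)            # sum of the principal diagonal
det = np.linalg.det(R)         # determinant of R
proportions = eigvals / trace  # share of total variance per component

print("sum of eigenvalues    :", round(eigvals.sum(), 2), "trace:", round(trace, 2))
print("product of eigenvalues:", round(eigvals.prod(), 2), "det  :", round(det, 2))
print("proportions           :", np.round(proportions, 2))
```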
Substituting the extracted eigenvalues into the characteristic equation

$$(\mathbf{R} - \lambda \mathbf{I})\,\mathbf{v} = \mathbf{0}$$

allows us to solve the resulting system of equations for v, the eigenvector associated with each eigenvalue. For our example, where the first eigenvalue is 1.30, the above equation can be written as

$$\begin{bmatrix} 1.00 - 1.30 & .30 \\ .30 & 1.00 - 1.30 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

by subtracting the eigenvalue 1.30 from each element of the principal diagonal of the correlation matrix. Writing out the matrix product gives the homogeneous linear equations

$$-.30\,v_1 + .30\,v_2 = 0$$

$$.30\,v_1 - .30\,v_2 = 0$$

which simplify to

$$v_1 = v_2$$

having an infinite number of solutions.
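For the second eigenvalue, .70, the eigenvector can be obtained from the characteristic equation by subtracting the eigenvalue from the principal diagonal of the correlation matrix, as

$$\begin{bmatrix} 1.00 - .70 & .30 \\ .30 & 1.00 - .70 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

So

$$.30\,v_1 + .30\,v_2 = 0$$

$$.30\,v_1 + .30\,v_2 = 0$$

and

$$v_1 = -v_2$$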
Since the equations for both eigenvectors are homogeneous, the eigenvectors are specified only up to a scale factor. Thus, our solution up to this point is only structural, with the eigenvector structure S, whose columns correspond to the first and second eigenvectors,

$$\mathbf{S} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

To complete the extraction of eigenvectors we have to impose some constraints on the structural solution obtained so far.
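The structural solution can be recovered numerically as a sanity check. The sketch below assumes the example correlation of .30, uses `numpy.linalg.eigh` (which returns unit-length eigenvectors with arbitrary sign), and rescales each eigenvector so that its first element equals one. That rescaling is simply one convenient way, chosen here for illustration, of exposing the pattern that survives the arbitrary scale factor.

```python
import numpy as np

# Assumed example correlation matrix (off-diagonal .30).
R = np.array([[1.0, 0.30],
              [0.30, 1.0]])

# eigh returns eigenvalues in ascending order; reorder to largest first.
eigvals, V = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Each column of V satisfies (R - lambda * I) v = 0 up to round-off.
residuals = [np.abs((R - lam * np.eye(2)) @ V[:, j]).max()
             for j, lam in enumerate(eigvals)]

# Divide each column by its first element to remove the scale factor.
S = V / V[0, :]

print(np.round(S, 2))
```

Any nonzero multiple of a column of `S` solves the same homogeneous equations, which is why further constraints are needed to fix the scale.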
The principal components are extracted orthogonally, so as to maximize the variance of each successive component. Since only the relative values of the eigenvector elements are of interest, imposing restrictions on the structural solution yields a solution that is optimal with respect to chosen criteria. The traditionally required criteria are that the row sums of the squared elements equal one and the column sums of the squared elements equal the eigenvalues, as shown schematically in the following table, where f denotes an element of the solution.

| | Component 1 | Component 2 | Row sum |
|---|---|---|---|
| X1 | f11² | f12² | 1 |
| X2 | f21² | f22² | 1 |
| Column sum | λ1 | λ2 | |
For the example of a two-by-two matrix, the solution is simply to halve each eigenvalue and then take the square root of the result. For the general case of matrices of order greater than two, however, an iterative solution is required. For our example, the matrix conforming to the specified restrictions, in its initial squared form, is presented in the following table.

| | Component 1 | Component 2 |
|---|---|---|
| X1 | .65 | .35 |
| X2 | .65 | .35 |

Taking the square roots of the elements of the above matrix and consulting the structure matrix S for the required signs results in the matrix of normalized eigenvectors, presented in the following table.

| | Component 1 | Component 2 |
|---|---|---|
| X1 | .81 | .59 |
| X2 | .81 | -.59 |
This type of matrix is known as a matrix of factor loadings. Its properties comply with the restrictions specified above. First, the column sums of the squared loadings equal the corresponding eigenvalues; for the example, (.81)² + (.81)² ≈ 1.30 and (.59)² + (-.59)² ≈ .70, within rounding. Second, dividing each column sum of squares by the number of variables (which here equals the number of factors) gives the proportion of total variance accounted for by each principal component. For the example, the first principal component contributed sixty-five percent (1.30/2 = .65) and the second component thirty-five percent (.70/2 = .35) of the total variance analyzed. Third, the row sums of the squared loadings equal one. These sums are called communalities; in the case of principal components the communalities always equal one. For the example, the first communality equals (.81)² + (.59)² ≈ 1.00 and the second communality equals (.81)² + (-.59)² ≈ 1.00.
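The loadings and their properties can be reproduced with a short sketch. Rather than the halving shortcut described for the two-by-two case, it uses the more general recipe of multiplying each unit-length eigenvector by the square root of its eigenvalue; for this example the two give the same numbers. The correlation of .30, the symbol `F`, and the sign convention (first row positive) are assumptions made for illustration.

```python
import numpy as np

# Assumed example correlation matrix (off-diagonal .30).
R = np.array([[1.0, 0.30],
              [0.30, 1.0]])

# Unit-length eigenvectors, ordered by descending eigenvalue.
eigvals, V = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Loadings: column j of V scaled by sqrt(eigvals[j]).
F = V * np.sqrt(eigvals)

# Eigenvector signs are arbitrary; make the first row positive.
F = F * np.sign(F[0, :])

col_ss = (F**2).sum(axis=0)         # column sums of squares -> eigenvalues
communalities = (F**2).sum(axis=1)  # row sums of squares -> 1

# Two-by-two shortcut from the text: sqrt(eigenvalue / 2).
shortcut = np.sqrt(eigvals / 2)

print(np.round(F, 2))
print(np.round(col_ss, 2), np.round(communalities, 2))
```

Working with unrounded loadings shows that the small discrepancies in the hand calculation (for example 1.31 instead of 1.30) come only from rounding the loadings to two decimals.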
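The factor loadings in the above matrix are the correlations of the variables with the factors. Denoting the matrix of factor loadings by F, the original matrix of correlation coefficients can be obtained from it as

$$\mathbf{R} = \mathbf{F}\mathbf{F}'$$

For the example, within rounding,

$$\begin{bmatrix} .81 & .59 \\ .81 & -.59 \end{bmatrix} \begin{bmatrix} .81 & .81 \\ .59 & -.59 \end{bmatrix} \approx \begin{bmatrix} 1.00 & .30 \\ .30 & 1.00 \end{bmatrix}$$

This property of matrices of factor loadings is fundamental; it gives factor analysis its name.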
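As a quick check, the two-decimal loadings quoted in the text can be multiplied out directly. The symbol `F` for the loading matrix and the target correlation of .30 are assumptions made for this sketch.

```python
import numpy as np

# Loadings as quoted in the text, rounded to two decimals.
F = np.array([[0.81, 0.59],
              [0.81, -0.59]])

# Loadings times their transpose should reproduce the correlation matrix.
R_reproduced = F @ F.T

print(np.round(R_reproduced, 3))
```

The product differs from the exact correlation matrix only in the third decimal place, which reflects the rounding of the loadings rather than any failure of the identity.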
The second fundamental property of principal components analysis is that reversing the order of multiplication of the matrix of factor loadings and its transpose

$$\mathbf{F}'\mathbf{F} = \mathbf{L}$$

yields the matrix of eigenvalues, L. The matrix L contains the eigenvalues in its principal diagonal and zeroes in its off-diagonal elements. Thus, for the example, within rounding,

$$\begin{bmatrix} .81 & .81 \\ .59 & -.59 \end{bmatrix} \begin{bmatrix} .81 & .59 \\ .81 & -.59 \end{bmatrix} \approx \begin{bmatrix} 1.30 & 0 \\ 0 & .70 \end{bmatrix}$$

The first (1.30) and second (.70) eigenvalues are indeed located along the principal diagonal of the matrix L.
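The reversed product can be checked with unrounded loadings. This sketch assumes the example correlation of .30 and builds the loadings as unit eigenvectors scaled by the square roots of their eigenvalues; the names `F` and `L` follow the notation used for the loading and eigenvalue matrices.

```python
import numpy as np

# Assumed example correlation matrix (off-diagonal .30).
R = np.array([[1.0, 0.30],
              [0.30, 1.0]])

# Unit-length eigenvectors, ordered by descending eigenvalue.
eigvals, V = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Loadings: eigenvectors scaled by the square roots of the eigenvalues.
F = V * np.sqrt(eigvals)

# Transpose times loadings gives the diagonal matrix of eigenvalues.
L = F.T @ F

print(np.round(L, 2))
```

The off-diagonal zeroes in `L` express the orthogonality of the components, while the diagonal carries the variance each one accounts for.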