Principal components analysis is a method that transforms a set of
interrelated variables into a new set of uncorrelated components that together
account for all the variance in the original variables.
In SPSS, principal components analysis is run through the Factor Analysis
procedure.
SPSS for Windows
A. Enter Data.
B. From the menus choose: Analyze \ Data Reduction \ Factor. Select two
variables (x1 and x2) to be analyzed.
C. Click on Descriptives.
In the Correlation Matrix area: Select Coefficients. Click Continue.
D. Click on Extraction. The Extraction dialog box will appear.
Method: The default is Principal Components.
In the Extract area: Select Number of Factors. Type “2”. Click Continue.
E. Click on Scores. The Factor Scores dialog box will appear.
Select Save as Variables (Note that Factor 1_1 and Factor 2_1 will be saved.).
Select Display Factor Score Coefficient Matrix.
Click Continue. Click OK.
SPSS Output
1. Correlation Matrix and Communalities
Notice that the original variables are correlated (r = .30).
Each variable is standardized to have a variance of 1, so the total variance of the original standardized variables equals the trace of the correlation matrix R: 1 + 1 = 2.
The proportion of variance in a variable explained by the components is called the communality of the variable.
When all components are retained, the components account for all the variance in the original variables, so the communality of each original variable is 1.
2. Eigenvalues
Sum of Eigenvalues
Notice that the sum of the eigenvalues (1.3 + .7 = 2.0) equals the trace of the correlation matrix R, which is also the total variance of the original standardized variables. Thus, the total variance of both variables was analyzed.
The eigenvalue indicates the amount of variance accounted for by each new component. The first principal component explains the maximum possible variance in the variables: a variance of 1.30. The second principal component explains the maximum amount of the remaining variance: a variance of .70. Together, the two components explain all the variance in the original variables (1.30 + .70 = 2).
The proportion of variance accounted for by the first principal component equals 1.30 / 2 = .65.
The proportion of variance accounted for by the second principal component equals .70 / 2 = .35.
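For two variables, these figures can be checked by hand, because the eigenvalues of the correlation matrix [[1, r], [r, 1]] are 1 + r and 1 - r. The following is a minimal Python sketch of that arithmetic, not SPSS output; the variable names are illustrative.

```python
# Eigenvalues of a 2 x 2 correlation matrix [[1, r], [r, 1]] are 1 + r and 1 - r.
r = 0.30  # correlation between x1 and x2 reported in the output above

eigenvalues = [1 + r, 1 - r]        # variance carried by each component
total_variance = sum(eigenvalues)   # equals the trace of R (1 + 1)
proportions = [ev / total_variance for ev in eigenvalues]

for k, (ev, prop) in enumerate(zip(eigenvalues, proportions), start=1):
    print(f"Component {k}: eigenvalue = {ev:.2f}, proportion = {prop:.2f}")
```

A stronger correlation between the two variables would push more of the total variance into the first component.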
3. Component Score Coefficient Matrix
Each principal component can be expressed as a linear combination of the original variables. The component score coefficients (weights) appear in the Component Score Coefficient Matrix table of the output.
4. Factor (Component) Scores
After obtaining the factor score coefficients, we can compute factor scores. The saved factor scores are displayed in the Data Editor Window: Factor 1_1 and Factor 2_1.
5. Component Matrix
The Component Matrix table shows the correlations between the principal component scores and the original variables.
This type of matrix is known as a matrix of factor (component) loadings.
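As a rough check on sections 3 and 5, the loadings and score coefficients for the two-variable case can be derived from r alone. The sketch below assumes the usual principal components relations: a loading is the eigenvector element times the square root of the eigenvalue, and a score coefficient is the loading divided by the eigenvalue. The sign of each component is arbitrary, so SPSS may print a column with the opposite sign.

```python
import math

r = 0.30
eigenvalues = [1 + r, 1 - r]

# Unit-length eigenvectors of [[1, r], [r, 1]]; rows are x1 and x2,
# columns are Component 1 and Component 2.
s = 1 / math.sqrt(2)
eigenvectors = [[s, s], [s, -s]]

# loadings[i][k]: correlation between variable i and component k
loadings = [[eigenvectors[i][k] * math.sqrt(eigenvalues[k]) for k in range(2)]
            for i in range(2)]
# weights[i][k]: score coefficient of variable i on component k
weights = [[loadings[i][k] / eigenvalues[k] for k in range(2)] for i in range(2)]
# communality: sum of squared loadings across the retained components
communalities = [sum(a * a for a in row) for row in loadings]

print("loadings:     ", [[round(a, 3) for a in row] for row in loadings])
print("weights:      ", [[round(w, 3) for w in row] for row in weights])
print("communalities:", [round(h, 3) for h in communalities])
```

With both components retained, each communality comes out as 1, which matches the statement in section 1.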
6. To compute the correlation between the two factor scores (factor 1_1 and factor 2_1), from the menus choose: Analyze \ Correlate \ Bivariate.
Select two variables: factor 1_1 and factor 2_1.
SPSS Output
The correlation between the two components is zero. This is expected: principal components are uncorrelated by construction.
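The same result can be illustrated outside SPSS. The sketch below uses a small set of made-up scores (hypothetical data, not the data analyzed in this handout), standardizes them, forms the two component scores as weighted sums of the z-scores, and confirms that the scores are uncorrelated. It assumes a positive correlation between x1 and x2, so that the sum of the z-scores is the first component.

```python
import math

# Hypothetical raw scores on x1 and x2 (not the data used in this handout).
x1 = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
x2 = [3.0, 1.0, 5.0, 2.0, 6.0, 4.0]

def standardize(x):
    """Return z-scores using the sample standard deviation (n - 1)."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))
    return [(v - mean) / sd for v in x]

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / (len(a) - 1)

z1, z2 = standardize(x1), standardize(x2)
r = correlation(x1, x2)  # positive here, so 1 + r is the larger eigenvalue

# Score coefficient = eigenvector element (1 / sqrt(2)) / sqrt(eigenvalue).
w1 = 1 / math.sqrt(2 * (1 + r))
w2 = 1 / math.sqrt(2 * (1 - r))
score1 = [w1 * (a + b) for a, b in zip(z1, z2)]
score2 = [w2 * (a - b) for a, b in zip(z1, z2)]

print(f"r(x1, x2)         = {r:.3f}")
print(f"r(score1, score2) = {abs(correlation(score1, score2)):.3f}")
```

Each score has a mean of 0 and a variance of 1, which is how SPSS scales saved component scores.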
A researcher measured 1200 children on the eleven subtests of the Wechsler Preschool and Primary Scale of Intelligence. The eleven subtests are information, vocabulary, arithmetic, similarities, comprehension, sentences, animal house, picture arrangement, mazes, geometric design, and block design.
Suppose that an 11 by 11 matrix of correlation coefficients (Aiken, 1987, p. 50) was obtained. The researcher then ran a principal components analysis, limiting the extraction to two components to keep the explanation simple.
Aiken, L. R. (1987). Assessment of Intellectual Functioning. Newton, Mass.: Allyn & Bacon.
SPSS for Windows
A. Open a New Text File in a Syntax Window: From the menus choose: File \ New \
Syntax.
To learn more about the syntax for the FACTOR procedure, select Help from the
menus. Choose Syntax. From the topic list, click on Factor.
B. Copy or type the following commands.
MATRIX DATA VARIABLES = V1 TO V11
/FILE = INLINE
/FORMAT = FREE LOWER DIAG
/N = 1200
/CONTENTS = CORR.
BEGIN DATA.
1
.60 1
.58 .49 1
.53 .49 .46 1
.60 .57 .51 .55 1
.52 .46 .51 .51 .53 1
.41 .36 .42 .31 .34 .36 1
.47 .45 .42 .36 .42 .35 .38 1
.37 .35 .41 .28 .33 .30 .36 .44 1
.40 .35 .47 .30 .36 .34 .43 .42 .48 1
.43 .38 .50 .35 .39 .38 .38 .45 .46 .48 1
END DATA.
FACTOR MATRIX = IN (COR = *)
/CRITERIA = FACTOR(2)
/EXTRACTION = PC
/ROTATION = VARIMAX
/PRINT = KMO DEFAULT
/FORMAT = SORT
/PLOT = EIGEN ROTATION(1 2).
C. Extraction, rotation, format and plot. (Refer to the SPSS User's Guide.)
Extraction: Principal components analysis (PC) is the default. PAF (principal axis factoring) and ML (maximum likelihood) are among the other keywords.
Rotation: The goal of rotation is to transform the initial factor matrices into ones that are easier to interpret. VARIMAX, EQUAMAX, QUARTIMAX, and OBLIMIN are some of the keywords.
Format: To interpret the factors, we need to group the original variables that have large factor loadings for the same factor. Specify the keyword SORT on the FORMAT subcommand to order the factor loadings by magnitude.
Plot: Use the PLOT subcommand to obtain a scree plot and a factor loading plot. EIGEN and ROTATION(n1 n2) are the keywords on the PLOT subcommand. The specifications n1 and n2 refer to the factors used as the axes.
A scree plot shows the eigenvalue associated with each factor. It is helpful in determining the number of factors to retain.
A factor loading plot is helpful in identifying clusters of variables and in judging the success of a rotation.
D. Save the syntax file.
E. Run all the commands in the syntax window: Run \ All.
SPSS Printout
A. Examine the KMO and Bartlett's Test
Is the relationship among the variables strong enough? Is it a good idea to proceed with a factor analysis of the data?
1. The Kaiser-Meyer-Olkin measure of sampling adequacy is an index that compares the magnitudes of the observed correlation coefficients with the magnitudes of the partial correlation coefficients (refer to the SPSS User's Guide).
Large values for the KMO measure indicate that a factor analysis of the variables is a good idea. For the example, notice that the Kaiser-Meyer-Olkin measure of sampling adequacy is greater than .90.
2. Another indicator of the strength of the relationship among variables is Bartlett's test of sphericity. It tests the null hypothesis that the variables are uncorrelated in the population, that is, that the population correlation matrix is an identity matrix. The observed significance level is .0000, which is small enough to reject the hypothesis.
It is concluded that the relationship among the variables is strong. It is a good idea to proceed with a factor analysis of the data.
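Both statistics can be approximated directly from the correlation matrix entered in the MATRIX DATA command. The sketch below is a plain-Python illustration, not SPSS's own routine. It assumes the usual textbook definitions: KMO compares squared correlations with squared partial correlations taken from the inverse of R, and Bartlett's chi-square is -(n - 1 - (2p + 5) / 6) ln|R| with p(p - 1) / 2 degrees of freedom. The helper name is made up for this example.

```python
import math

# Lower triangle of the correlation matrix from the MATRIX DATA command above.
LOWER = [
    [1],
    [.60, 1],
    [.58, .49, 1],
    [.53, .49, .46, 1],
    [.60, .57, .51, .55, 1],
    [.52, .46, .51, .51, .53, 1],
    [.41, .36, .42, .31, .34, .36, 1],
    [.47, .45, .42, .36, .42, .35, .38, 1],
    [.37, .35, .41, .28, .33, .30, .36, .44, 1],
    [.40, .35, .47, .30, .36, .34, .43, .42, .48, 1],
    [.43, .38, .50, .35, .39, .38, .38, .45, .46, .48, 1],
]
N_CASES = 1200
p = len(LOWER)
R = [[float(LOWER[max(i, j)][min(i, j)]) for j in range(p)] for i in range(p)]

def inverse_and_logdet(m):
    """Gauss-Jordan inverse and log-determinant.

    No pivoting is done, which is adequate for a positive-definite
    correlation matrix such as this one.
    """
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    logdet = 0.0
    for col in range(n):
        pivot = a[col][col]
        logdet += math.log(pivot)
        a[col] = [x / pivot for x in a[col]]
        for row in range(n):
            if row != col:
                factor = a[row][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    return [row[n:] for row in a], logdet

inv, log_det = inverse_and_logdet(R)

# KMO: squared correlations relative to squared correlations plus
# squared partial correlations (off-diagonal elements only).
sum_r2 = sum_p2 = 0.0
for i in range(p):
    for j in range(p):
        if i != j:
            partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
            sum_r2 += R[i][j] ** 2
            sum_p2 += partial ** 2
kmo = sum_r2 / (sum_r2 + sum_p2)

# Bartlett's test of sphericity.
chi_square = -(N_CASES - 1 - (2 * p + 5) / 6) * log_det
df = p * (p - 1) // 2

print(f"KMO = {kmo:.3f}")
print(f"Bartlett chi-square = {chi_square:.1f}, df = {df}")
```

Small partial correlations relative to the observed correlations push KMO toward 1, which is why a large KMO supports factoring the matrix.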
B. Initial Eigenvalues
There are 11 original variables in the study. Since each variable is standardized to have a variance of 1, the total variance will be 11. There are as many principal components extracted as original variables initially. The total variance in the new components will be 11, too.
2. Initial Eigenvalues
Algebraically, the principal components analysis is obtained by finding the eigenvalues and eigenvectors (v). The eigenvalue indicates the amount of variance in the original variables accounted for by each component. Division of the eigenvalue by the total variance gives the proportion of the variance extracted.
Component 1 explains a variance of 5.311, which is 48.28% of the total variance of 11.
Component 2 explains a variance of 1.127, which is 10.24% of the total variance of 11.
About 58.52% of the total variance in the 11 standardized variables is attributable to the first two components. The remaining 9 components together account for 41.48% of the total variance.
C. Number of Components to Retain
Principal components analysis, like factor analysis, may also be used to reduce the number of variables in a data set by finding the smallest set of principal components that explains most of the variance in the data set.
1. Kaiser suggests retaining only those factors whose eigenvalues are greater than 1. According to this criterion, two components are retained in the study.
2. A graphical method called the scree test has been proposed by Cattell.
In this method, the magnitudes of the eigenvalues (vertical axis) are plotted against their ordinal numbers (horizontal axis).
The magnitude of successive eigenvalues drops off sharply and then tends to level off. Retain all eigenvalues (and hence components) in the sharp descent before the first one on the line where they start to level off. Examine the scree plot. It appears that a two-component model should be sufficient for the study.
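The eigenvalues, their share of the total variance, and the count used by Kaiser's criterion can be reproduced from the correlation matrix. The sketch below uses a cyclic Jacobi rotation scheme written in plain Python so that it has no dependencies. This is one of several ways to obtain eigenvalues and is not necessarily what SPSS does internally. The function name is made up for this example. The printed list of eigenvalues is also what a scree plot displays.

```python
import math

# Lower triangle of the correlation matrix from the MATRIX DATA command above.
LOWER = [
    [1],
    [.60, 1],
    [.58, .49, 1],
    [.53, .49, .46, 1],
    [.60, .57, .51, .55, 1],
    [.52, .46, .51, .51, .53, 1],
    [.41, .36, .42, .31, .34, .36, 1],
    [.47, .45, .42, .36, .42, .35, .38, 1],
    [.37, .35, .41, .28, .33, .30, .36, .44, 1],
    [.40, .35, .47, .30, .36, .34, .43, .42, .48, 1],
    [.43, .38, .50, .35, .39, .38, .38, .45, .46, .48, 1],
]
p = len(LOWER)
R = [[float(LOWER[max(i, j)][min(i, j)]) for j in range(p)] for i in range(p)]

def jacobi_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-20:  # off-diagonal part is numerically zero
            break
        for i in range(n - 1):
            for j in range(i + 1, n):
                if a[i][j] == 0.0:
                    continue
                diff = a[j][j] - a[i][i]
                # Angle that zeroes a[i][j]; kept within 45 degrees.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[i][j] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(n):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(n)), reverse=True)

eigenvalues = jacobi_eigenvalues(R)
total = sum(eigenvalues)  # equals the number of variables
retained = sum(1 for ev in eigenvalues if ev > 1)  # Kaiser's criterion

for k, ev in enumerate(eigenvalues, start=1):
    print(f"Component {k:2d}: eigenvalue = {ev:.3f}, percent = {100 * ev / total:.2f}")
print("Components with eigenvalue > 1:", retained)
```

The first two values can be compared with the 5.311 and 1.127 reported in the Initial Eigenvalues output.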
D. Component Matrix
The Component Matrix table shows the correlations between the principal component scores and the original variables.
This type of matrix is known as a matrix of factor (component) loadings.
E. Communality of an Original Variable
The communalities can be calculated from the rows of the component matrix. For the example, approximately ______% of the variance in Variable 6 is explained by the two-component model.
Compute the communality of Variable 6 as the sum of its squared loadings on the two components:
communality of V6 = (loading of V6 on Component 1)^2 + (loading of V6 on Component 2)^2
The communality of Variable 6 after extraction equals .571.
2. Communalities: Initial vs. Extraction
This table is called Extraction, since it shows the communalities and factor statistics after the desired number of factors has been extracted. In the table labeled Initial Statistics, the communality of V6 is 1. In the table labeled Extraction, the communality of V6 is .571.
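The row-wise calculation can be sketched without SPSS by extracting the first two components from the correlation matrix and squaring the loadings. The code below uses power iteration with deflation, a simple technique chosen here for brevity and not a description of SPSS's algorithm. The function name is made up for this example. Component signs are arbitrary, but squaring the loadings removes that ambiguity.

```python
import math

# Lower triangle of the correlation matrix from the MATRIX DATA command above.
LOWER = [
    [1],
    [.60, 1],
    [.58, .49, 1],
    [.53, .49, .46, 1],
    [.60, .57, .51, .55, 1],
    [.52, .46, .51, .51, .53, 1],
    [.41, .36, .42, .31, .34, .36, 1],
    [.47, .45, .42, .36, .42, .35, .38, 1],
    [.37, .35, .41, .28, .33, .30, .36, .44, 1],
    [.40, .35, .47, .30, .36, .34, .43, .42, .48, 1],
    [.43, .38, .50, .35, .39, .38, .38, .45, .46, .48, 1],
]
p = len(LOWER)
R = [[float(LOWER[max(i, j)][min(i, j)]) for j in range(p)] for i in range(p)]

def leading_components(matrix, k, iterations=1000):
    """Return the k largest eigenvalues and their unit eigenvectors."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    values, vectors = [], []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # arbitrary non-uniform start
        for _step in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        values.append(lam)
        vectors.append(v)
        # Deflate: subtract the component just found before the next pass.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return values, vectors

values, vectors = leading_components(R, 2)

# loadings[i][k] = eigenvector element * sqrt(eigenvalue)
loadings = [[vectors[k][i] * math.sqrt(values[k]) for k in range(2)] for i in range(p)]
communalities = [sum(a * a for a in row) for row in loadings]

print(f"Communality of V6 = {communalities[5]:.3f}")
print(f"Sum of communalities = {sum(communalities):.3f}")
```

The sum of the eleven communalities equals the sum of the two retained eigenvalues, which links the rows of the component matrix to its columns.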
F. Percentage of Variance Accounted For by the Two-Component Model
About ______% of the total variance in the 11 variables is attributable to the first two components.
To judge how well the two-component model describes the original variables, examine the eigenvalues of the two extracted components.
The eigenvalue for Component 1 is 5.311. The eigenvalue for Component 2 is 1.127. The amount of variance accounted for by the two components is 5.311 + 1.127 = 6.438. About 59% of the total variance in the 11 variables is attributable to the first two components (6.438 / 11 = .5852).
G. The components are artificial variables and are not necessarily interpretable. The purpose of rotation is to achieve a simple structure.
The varimax rotation enhances the interpretability of the principal components or factors. After a varimax rotation, each component correlates highly with a small number of variables and weakly with the others.
The researcher's job is to identify and give a name to each component or factor by examining the clusters of variables.
Components and Associated Variables
Group the original variables that have large factor loadings for the same factor.
In the table labeled Rotated Factor Matrix, the first component loaded high and positive on variables V5, V4, V1, V6, V2 and V3. The second component loaded high and positive on variables V10, V9, V11, V8 and V7.
Interpret the Components
What do those variables have in common?
1. All of the first six subtests have loadings above .50 on component I. The subtests are information, vocabulary, arithmetic, similarities, comprehension, and sentences. Thus, the first component can be labeled a verbal-educational component.
2. All of the last five subtests have loadings above .60 on component II. The subtests are animal house, picture arrangement, mazes, geometric design, and block design. Thus, the second component can be labeled a spatial-perceptual component.
H. Component Loading Plot
What to Look For
1. Variables at the end of an axis: Those variables have high loadings on that factor.
2. Variables near the origin of the plot (0,0): Those variables have low loadings on both factors.
The first component loaded high and positive on variables V5, V4, V1, V6, V2 and V3. The second component loaded high and positive on variables V10, V9, V11, V8 and V7.
I. Properties of the matrix of component loadings.
1. Compute the amount of variance accounted for by the two unrotated components.
Eigenvalues can be calculated from the columns of the component matrix.
For each component, sum the variance of each variable attributable to that component (the column sum of squared loadings).
The eigenvalue for Component 1 is 5.311.
The eigenvalue for Component 2 is 1.127.
The amount of variance accounted for by the two components is 5.311 + 1.127 = 6.438.
2. Compute the amount of variance accounted for by the two rotated components.
For each rotated component, sum the variance of each variable attributable to that component (the column sum of squared loadings).
The sum of squared loadings for rotated Component 1 is 3.47.
The sum of squared loadings for rotated Component 2 is 2.97.
The amount of variance accounted for by the two rotated components is 3.47 + 2.97 = 6.44.
3. What do you observe?
The total amount of variance accounted for by the two unrotated components is the same as that accounted for by the two varimax rotated components. However, the variance accounted for by the varimax rotated components is spread out more evenly than for the unrotated components.
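The reason the total is unchanged is that an orthogonal rotation preserves each variable's sum of squared loadings. The sketch below illustrates this with a small made-up loading matrix and an arbitrary rotation angle. The numbers are hypothetical and the angle is not the varimax solution; the point is only that the column sums change while the total and the communalities do not.

```python
import math

# Hypothetical unrotated loadings for four variables on two components
# (illustrative numbers, not the WPPSI results).
loadings = [
    [0.80, -0.30],
    [0.75, -0.35],
    [0.70, 0.40],
    [0.65, 0.45],
]

def rotate(matrix, degrees):
    """Apply an orthogonal (rigid) rotation to a two-column loading matrix."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c + b * s, -a * s + b * c] for a, b in matrix]

def column_ss(matrix):
    """Sum of squared loadings for each component (column)."""
    return [sum(row[k] ** 2 for row in matrix) for k in range(2)]

def communalities(matrix):
    """Sum of squared loadings for each variable (row)."""
    return [sum(a * a for a in row) for row in matrix]

rotated = rotate(loadings, 40)  # arbitrary angle, not the varimax solution

before, after = column_ss(loadings), column_ss(rotated)
print("unrotated:", [round(v, 3) for v in before], "total", round(sum(before), 3))
print("rotated:  ", [round(v, 3) for v in after], "total", round(sum(after), 3))
```

Varimax chooses the particular angle that makes the pattern of high and low loadings within each column as distinct as possible.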