"Very-different-results-of-principal-component-analysis-in-spss-and-stata"
A comment on a question at SSE (stats.stackexchange.com)
Initial (PC)-solution

Three analyses are compared:
Stata - Principal components (with Varimax on Kaiser-normalization): working on eigenvectors
SPSS - Principal components (with Varimax on Kaiser-normalization): working on principal components (scaled eigenvectors)
Stata - "principal-component factors" (with Varimax on Kaiser-normalization) (seems to equal the SPSS-analysis): working on principal components (scaled eigenvectors)
Stata - Principal components:
-----------------------------
   Component |   Eigenvalue
-------------+---------------
       Comp1 |       3.8723
       Comp2 |      1.40682
       Comp3 |       1.1791
       Comp4 |     0.972359
       Comp5 |     0.803195
       Comp6 |     0.752324
       Comp7 |     0.656957
       Comp8 |     0.643198
       Comp9 |     0.507304
      Comp10 |     0.463711
      Comp11 |     0.388806
      Comp12 |     0.353931
-----------------------------
Stata - "principal-component factors":
----------------------------
      Factor |   Eigenvalue
-------------+--------------
     Factor1 |      3.87230
     Factor2 |      1.40682
     Factor3 |      1.17910
     Factor4 |      0.97236
     Factor5 |      0.80319
     Factor6 |      0.75232
     Factor7 |      0.65696
     Factor8 |      0.64320
     Factor9 |      0.50730
    Factor10 |      0.46371
    Factor11 |      0.38881
    Factor12 |      0.35393
----------------------------
Stata - Principal components:
----------------------------------------------------------
    Variable |    Comp1     Comp2     Comp3 | Unexplained
-------------+------------------------------+-------------
bewert_sfu_a |   0.2700    0.3901   -0.1477 |      0.4779
bewert_sfu_b |   0.3298    0.2303   -0.4027 |      0.3129
bewert_sfu_c |  -0.3046    0.3149    0.1773 |      0.4642
bewert_sfu_d |   0.3489    0.1910    0.0700 |      0.4715
bewert_sfu_e |   0.3342    0.2067    0.2720 |      0.4202
bewert_sfu_f |  -0.2001    0.4561   -0.1587 |      0.5227
bewert_sfu_g |   0.3057    0.3128    0.1531 |      0.4728
bewert_sfu_h |  -0.3611    0.2180    0.2913 |       0.328
bewert_sfu_i |   0.2352   -0.2211    0.3662 |      0.5588
bewert_sfu_j |  -0.1556    0.3894    0.4578 |      0.4457
bewert_sfu_k |   0.3239    0.0525    0.0754 |      0.5832
bewert_sfu_l |   0.2091   -0.2445    0.4720 |      0.4839
----------------------------------------------------------
SPSS - Principal components:
                      Component          | Extraction
                  1        2        3    | (=explained)
-----------------------------------------+-------------
bewert_sfu_a    0.531    0.463           |    0.522
bewert_sfu_b    0.649            -0.437  |    0.687
bewert_sfu_c   -0.599                    |    0.536
bewert_sfu_d    0.687                    |    0.529
bewert_sfu_e    0.658                    |    0.580
bewert_sfu_f             0.541           |    0.477
bewert_sfu_g    0.602                    |    0.527
bewert_sfu_h   -0.711                    |    0.672
bewert_sfu_i    0.463                    |    0.441
bewert_sfu_j             0.462    0.497  |    0.554
bewert_sfu_k    0.637                    |    0.417
bewert_sfu_l    0.412             0.513  |    0.516
Stata - "principal-component factors":
-----------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3 |   Uniqueness
-------------+------------------------------+--------------
bewert_sfu_a |   0.5314    0.4627   -0.1603 |      0.4779
bewert_sfu_b |   0.6490    0.2732   -0.4373 |      0.3129
bewert_sfu_c |  -0.5994    0.3735    0.1926 |      0.4642
bewert_sfu_d |   0.6866    0.2265    0.0760 |      0.4715
bewert_sfu_e |   0.6576    0.2451    0.2954 |      0.4202
bewert_sfu_f |  -0.3938    0.5409   -0.1723 |      0.5227
bewert_sfu_g |   0.6015    0.3710    0.1663 |      0.4728
bewert_sfu_h |  -0.7107    0.2586    0.3163 |      0.3280
bewert_sfu_i |   0.4629   -0.2622    0.3977 |      0.5588
bewert_sfu_j |  -0.3062    0.4619    0.4971 |      0.4457
bewert_sfu_k |   0.6373    0.0623    0.0818 |      0.5832
bewert_sfu_l |   0.4116   -0.2900    0.5125 |      0.4839
-----------------------------------------------------------

After Varimax-rotation

Stata - Principal components:
----------------------------
   Component |     Variance
-------------+--------------
       Comp1 |      2.95242
       Comp2 |      2.08506
       Comp3 |      1.42073
----------------------------
Stata - "principal-component factors":
----------------------------
      Factor |     Variance
-------------+--------------
     Factor1 |      2.84986
     Factor2 |      1.86281
     Factor3 |      1.74554
----------------------------
Stata - Principal components:
----------------------------------------------------------
    Variable |    Comp1     Comp2     Comp3 | Unexplained
-------------+------------------------------+-------------
bewert_sfu_a |   0.4076   -0.0266   -0.2829 |      0.4779
bewert_sfu_b |   0.3116   -0.3063   -0.3648 |      0.3129
bewert_sfu_c |  -0.0255    0.4536   -0.1302 |      0.4642
bewert_sfu_d |   0.4007   -0.0456    0.0218 |      0.4715
bewert_sfu_e |   0.4392    0.0965    0.1618 |      0.4202
bewert_sfu_f |   0.0698    0.2650   -0.4451 |      0.5227
bewert_sfu_g |   0.4531    0.0973    0.0005 |      0.4728
bewert_sfu_h |  -0.1026    0.5023    0.0011 |       0.328
bewert_sfu_i |   0.1350   -0.0261    0.4684 |      0.5588
bewert_sfu_j |   0.1927    0.5856    0.0731 |      0.4457
bewert_sfu_k |   0.3026   -0.1048    0.1037 |      0.5832
bewert_sfu_l |   0.1224    0.0410    0.5564 |      0.4839
----------------------------------------------------------
SPSS - Principal components:
                      Component
                  1        2        3
-------------------------------------
bewert_sfu_a    0.705
bewert_sfu_b    0.673   -0.448
bewert_sfu_c             0.627
bewert_sfu_d    0.671
bewert_sfu_e    0.661
bewert_sfu_f                     -0.576
bewert_sfu_g    0.699
bewert_sfu_h             0.698
bewert_sfu_i                      0.630
bewert_sfu_j             0.742
bewert_sfu_k    0.528
bewert_sfu_l                      0.707
-------------------------------------
Stata - "principal-component factors":
-----------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3 |   Uniqueness
-------------+------------------------------+--------------
bewert_sfu_a |   0.7047   -0.0983   -0.1258 |      0.4779
bewert_sfu_b |   0.6732   -0.4479   -0.1827 |      0.3129
bewert_sfu_c |  -0.2184    0.6266   -0.3090 |      0.4642
bewert_sfu_d |   0.6710   -0.1473    0.2377 |      0.4715
bewert_sfu_e |   0.6605    0.0245    0.3781 |      0.4202
bewert_sfu_f |   0.0474    0.3785   -0.5761 |      0.5227
bewert_sfu_g |   0.6989    0.0358    0.1935 |      0.4728
bewert_sfu_h |  -0.3776    0.6976   -0.2067 |      0.3280
bewert_sfu_i |   0.1847   -0.1019    0.6298 |      0.5588
bewert_sfu_j |   0.0624    0.7419   -0.0018 |      0.4457
bewert_sfu_k |   0.5276   -0.2131    0.3050 |      0.5832
bewert_sfu_l |   0.1273   -0.0160    0.7069 |      0.4839
-----------------------------------------------------------

Transformation-/Rotation-matrix PCA -> Varimax

Stata - Principal components:
--------------------------------------------
             |    Comp1     Comp2     Comp3
-------------+------------------------------
       Comp1 |   0.7942   -0.5573    0.2422
       Comp2 |   0.5724    0.5523   -0.6061
       Comp3 |   0.2040    0.6200    0.7576
--------------------------------------------
SPSS - Principal components:
--------------------------------------------
   Component        1        2        3
--------------------------------------------
           1    0.765   -0.476    0.434
           2    0.644    0.567   -0.513
           3   -0.001    0.672    0.741
Stata - "principal-component factors":
-----------------------------------------
             | Factor1  Factor2  Factor3
-------------+---------------------------
     Factor1 |  0.7650  -0.4761   0.4336
     Factor2 |  0.6440   0.5672  -0.5134
     Factor3 | -0.0016   0.6720   0.7406
-----------------------------------------
Conclusion: It appears that the second Stata solution equals the SPSS solution.
Here is a script (in my matrix-calculator language "MatMate") which tries to reproduce how the first Stata solution and the SPSS/second Stata solutions are calculated.
Indeed, the first Stata solution seems, unlike the SPSS solution, to be a transformation of the eigenvectors (more specifically, one derived from the Kaiser-normalization of the first 3 eigenvectors) rather than of the PCA-components as in SPSS.
Here is my calculation:
;****** MatMate Version 0.1410 Beta *****************************
// first part of your posted data : Eigenvectors according to Stata- computations (4 decimals; taken via clipboard)
clp = csvdatei("clip")
eig_lad = clp[*,1..3]   // the first 3 columns: these are obviously eigenvector-values
|  0.2700   0.3901  -0.1477 |
|  0.3298   0.2303  -0.4027 |
| -0.3046   0.3149   0.1773 |
|  0.3489   0.1910   0.0700 |
|  0.3342   0.2067   0.2720 |
| -0.2001   0.4561  -0.1587 |
|  0.3057   0.3128   0.1531 |
| -0.3611   0.2180   0.2913 |
|  0.2352  -0.2211   0.3662 |
| -0.1556   0.3894   0.4578 |
|  0.3239   0.0525   0.0754 |
|  0.2091  -0.2445   0.4720 |
uniq = clp[*,4]   // "not-explained" = unique variances
                  // (unexplained by 3 eigenvectors)
                  // (= squared values, not loadings)
| 0.4779 |
| 0.3129 |
| 0.4642 |
| 0.4715 |
| 0.4202 |
| 0.5227 |
| 0.4728 |
| 0.3280 |
| 0.5588 |
| 0.4457 |
| 0.5832 |
| 0.4839 |
// second part of posted data: first three eigenvalues (taken via clipboard)
pca_ssl = csvdatei("clip") // "ssl" means "sum of squares of loadings"
| 3.8723 1.4068 1.1791 |
// check whether the "unique" variance is really the not-explained variance by the first 3 eigenvectors/PCA-components:
chk = sumzl( eig_lad ^# 2 *# pca_ssl ) + uniq
// the squared pca-loadings (eigenvectors^2 scaled by eigenvalues) plus the unique variance should sum up to 1-variance for each row (=item)
| 1.0000 |
| 0.9999 |
| 1.0000 |
| 1.0000 |
| 1.0000 |
| 1.0001 |
| 1.0000 |
| 0.9998 |
| 0.9999 |
| 0.9999 |
| 1.0000 |
| 1.0000 |
Important conclusion: obviously, in this implementation "unique" has nothing to do with the "item-specific unique variance" of common factor analysis; it is simply the (correlated) residual variance left after the first three eigenvectors (!)
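For readers without MatMate, the same check can be sketched in plain Python. This is only an illustration: the numbers are transcribed from the Stata table above (4 decimals), and the function name `explained_plus_unexplained` is made up for this sketch, not part of MatMate or Stata.

```python
# Eigenvector values, "Unexplained" column and first three eigenvalues,
# transcribed from the Stata "Principal components" output above.
EIGVEC = [
    [0.2700, 0.3901, -0.1477],
    [0.3298, 0.2303, -0.4027],
    [-0.3046, 0.3149, 0.1773],
    [0.3489, 0.1910, 0.0700],
    [0.3342, 0.2067, 0.2720],
    [-0.2001, 0.4561, -0.1587],
    [0.3057, 0.3128, 0.1531],
    [-0.3611, 0.2180, 0.2913],
    [0.2352, -0.2211, 0.3662],
    [-0.1556, 0.3894, 0.4578],
    [0.3239, 0.0525, 0.0754],
    [0.2091, -0.2445, 0.4720],
]
UNEXPLAINED = [0.4779, 0.3129, 0.4642, 0.4715, 0.4202, 0.5227,
               0.4728, 0.3280, 0.5588, 0.4457, 0.5832, 0.4839]
EIGENVALUES = [3.8723, 1.4068, 1.1791]


def explained_plus_unexplained(eigvec, eigenvalues, unexplained):
    """Per item: sum over components of (eigenvector value^2 * eigenvalue),
    plus the reported unexplained variance."""
    totals = []
    for row, resid in zip(eigvec, unexplained):
        explained = sum(v * v * lam for v, lam in zip(row, eigenvalues))
        totals.append(explained + resid)
    return totals


totals = explained_plus_unexplained(EIGVEC, EIGENVALUES, UNEXPLAINED)
for total in totals:
    print(f"{total:.4f}")
```

Each printed total should be 1 up to the rounding of the 4-decimal inputs, which is what supports reading "Unexplained" as plain residual variance.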
// get the rotationmatrix to bring the "Kaiser"-normalized loadings to Varimax
// note that SPSS computes this based on the PCA-loadings, not on Eigenvector-values
t = gettrans(normzl(eig_lad ),"varimax") // "normzl(<loadings>)"
// provides Kaiser-normalization per row
// (of course here in Stata based on eigenvectors)
// rotation-/transformation-matrix "t" from pca to varimax coordinates
| 0.7942 -0.5573 0.2421 |
| 0.5724 0.5523 -0.6062 |
| 0.2041 0.6200 0.7576 |
vmx_lad = eig_lad * t // this computes the Stata - varimax-coordinates
| 0.4580 -0.0065 -0.1926 |
| 0.4012 -0.2970 -0.2735 |
| -0.0305 0.4428 -0.1625 |
| 0.3898 -0.0107 0.1051 |
| 0.3884 0.1409 0.2402 |
| 0.1409 0.2473 -0.4385 |
| 0.4351 0.1348 0.0853 |
| -0.1363 0.4913 -0.0528 |
| 0.0370 0.0087 0.4867 |
| 0.1310 0.6026 0.0716 |
| 0.2815 -0.0738 0.1693 |
| 0.0018 0.0790 0.5657 |
vmx_ssl = sqsumsp((eig_lad *# sqrt(pca_ssl#)) * t)
// sums of squares of the varimax-rotated princ. components (not eigenvectors!)
| 2.9522 2.0849 1.4207 |
The values found here are obviously identical to those of the first documented Stata solution, so we should assume that this is also how Stata implements its concept.
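The Stata part can also be cross-checked outside MatMate. The Python sketch below takes the rotation matrix t as printed above (it does not recompute it), applies it to the eigenvectors, and computes the rotated variances from the scaled eigenvectors. The helper names `matmul` and `column_sums_of_squares` are illustrative only; all numbers are transcribed from this page.

```python
from math import sqrt

# Eigenvector values and eigenvalues from the Stata "Principal components" output.
EIGVEC = [
    [0.2700, 0.3901, -0.1477],
    [0.3298, 0.2303, -0.4027],
    [-0.3046, 0.3149, 0.1773],
    [0.3489, 0.1910, 0.0700],
    [0.3342, 0.2067, 0.2720],
    [-0.2001, 0.4561, -0.1587],
    [0.3057, 0.3128, 0.1531],
    [-0.3611, 0.2180, 0.2913],
    [0.2352, -0.2211, 0.3662],
    [-0.1556, 0.3894, 0.4578],
    [0.3239, 0.0525, 0.0754],
    [0.2091, -0.2445, 0.4720],
]
EIGENVALUES = [3.8723, 1.4068, 1.1791]

# Rotation matrix t as printed by the script above (taken as given).
T_STATA = [
    [0.7942, -0.5573, 0.2421],
    [0.5724, 0.5523, -0.6062],
    [0.2041, 0.6200, 0.7576],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_sums_of_squares(m):
    """Sum of squared entries per column."""
    return [sum(x * x for x in col) for col in zip(*m)]


# Stata route: the rotation is applied to the eigenvectors themselves.
rotated_eigvec = matmul(EIGVEC, T_STATA)

# The reported "Variance" refers to the scaled eigenvectors (PCA loadings) rotated by the same t.
loadings = [[v * sqrt(lam) for v, lam in zip(row, EIGENVALUES)] for row in EIGVEC]
rotated_variances = column_sums_of_squares(matmul(loadings, T_STATA))

print([round(x, 4) for x in rotated_eigvec[0]])
print([round(x, 4) for x in rotated_variances])
```

The first row of `rotated_eigvec` comes out near 0.4076, -0.0266, -0.2829 and the variances near 2.95, 2.09, 1.42, in line with the first Stata solution.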
// =============== The SPSS-solution =========================
eig_lad = clp[*,1..3] // the first 3 columns: these are obviously eigenvector-values
spss_pca_lad = eig_lad *# sqrt(pca_ssl#) // compute pca-loadings from eigenvectors
spss_t = gettrans(normzl(spss_pca_lad ),"varimax") // "normzl(<pca loadings>)"
spss_vmx_lad = spss_pca_lad * spss_t // this computes the SPSS - varimax-coordinates
| 0.7047 -0.0983 -0.1260 |
| 0.6731 -0.4479 -0.1827 |
| -0.2184 0.6266 -0.3091 |
| 0.6710 -0.1473 0.2377 |
| 0.6606 0.0244 0.3780 |
| 0.0474 0.3785 -0.5761 |
| 0.6989 0.0358 0.1935 |
| -0.3776 0.6975 -0.2066 |
| 0.1846 -0.1019 0.6298 |
| 0.0624 0.7418 -0.0018 |
| 0.5276 -0.2131 0.3050 |
| 0.1273 -0.0160 0.7068 |
spss_vmx_ssl = sqsumsp(spss_vmx_lad) // the SPSS-"variances" of the vmx-factors by sums-of-squares along the columns
| 2.8498 1.8626 1.7455 |
Since all coordinates and also the transformation/rotation matrix seem to be reproduced correctly, it seems that this is indeed the internal computation of SPSS.
==============================================================================
Now the difference condensed to the syntax/concept would be
// Stata
eig_lad = clp[*,1..3] // the first 3 columns: these are obviously eigenvector-values
t = gettrans(normzl(eig_lad ),"varimax") // "normzl(<on eigenvectors>)"
vmx_lad = eig_lad * t // this computes the Stata - varimax-coordinates
// SPSS
eig_lad = clp[*,1..3] // the first 3 columns: these are obviously eigenvector-values
spss_pca_lad = eig_lad *# sqrt(pca_ssl#) // compute pca-loadings from eigenvectors
spss_t = gettrans(normzl(spss_pca_lad ),"varimax") // "normzl(<on pca loadings>)"
spss_vmx_lad = spss_pca_lad * spss_t // this computes the SPSS - varimax-coordinates
The rotation criterion for the Varimax concept seems to be applied to different matrices in the two software packages.
While Stata computes the rotation angles from the Kaiser-normalized rows (each row rescaled to unit length) of the eigenvectors, SPSS computes them from the Kaiser-normalized rows of the PCA loadings, which are the eigenvectors scaled by the square roots of the associated eigenvalues.
This should, in most cases, result in different solutions.
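To try the two variants without MatMate, here is a minimal Python sketch. It assumes a textbook Varimax (pairwise planar rotations maximizing the raw varimax criterion, started from the identity) with Kaiser row normalization. It is not the code of Stata, SPSS, or MatMate's `gettrans`, and the function names are made up for this illustration.

```python
from math import atan2, cos, sin, sqrt

# Eigenvector values and eigenvalues from the Stata "Principal components" output.
EIGVEC = [
    [0.2700, 0.3901, -0.1477],
    [0.3298, 0.2303, -0.4027],
    [-0.3046, 0.3149, 0.1773],
    [0.3489, 0.1910, 0.0700],
    [0.3342, 0.2067, 0.2720],
    [-0.2001, 0.4561, -0.1587],
    [0.3057, 0.3128, 0.1531],
    [-0.3611, 0.2180, 0.2913],
    [0.2352, -0.2211, 0.3662],
    [-0.1556, 0.3894, 0.4578],
    [0.3239, 0.0525, 0.0754],
    [0.2091, -0.2445, 0.4720],
]
EIGENVALUES = [3.8723, 1.4068, 1.1791]


def kaiser_normalize(a):
    """Rescale every row to unit length (Kaiser normalization)."""
    out = []
    for row in a:
        length = sqrt(sum(x * x for x in row))
        out.append([x / length for x in row])
    return out


def varimax_rotation(a, sweeps=200, eps=1e-10):
    """Return an orthogonal matrix t such that a*t maximizes the raw varimax criterion.
    Uses pairwise planar rotations; stops when all angles in a sweep are below eps."""
    n, k = len(a), len(a[0])
    a = [row[:] for row in a]  # work on a copy
    t = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(sweeps):
        largest = 0.0
        for p in range(k - 1):
            for q in range(p + 1, k):
                u = [r[p] ** 2 - r[q] ** 2 for r in a]
                v = [2.0 * r[p] * r[q] for r in a]
                su, sv = sum(u), sum(v)
                c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                phi = atan2(d - 2.0 * su * sv / n, c - (su * su - sv * sv) / n) / 4.0
                largest = max(largest, abs(phi))
                cs, sn = cos(phi), sin(phi)
                # Rotate columns p and q of the working matrix and of t alike.
                for m in (a, t):
                    for r in m:
                        r[p], r[q] = r[p] * cs + r[q] * sn, -r[p] * sn + r[q] * cs
        if largest < eps:
            break
    return t


loadings = [[v * sqrt(lam) for v, lam in zip(row, EIGENVALUES)] for row in EIGVEC]
t_eigvec = varimax_rotation(kaiser_normalize(EIGVEC))      # rotation derived from eigenvectors
t_loadings = varimax_rotation(kaiser_normalize(loadings))  # rotation derived from PCA loadings

for name, t in (("from eigenvectors", t_eigvec), ("from PCA loadings", t_loadings)):
    print(name)
    for row in t:
        print("  " + " ".join(f"{x:8.4f}" for x in row))
```

If the reading above is right, the first matrix should resemble Stata's rotation matrix and the second the SPSS one, possibly up to column order and sign. Kaiser normalization does not remove the difference: scaling the columns by the square roots of the eigenvalues changes the direction of each row before it is rescaled to unit length.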
Gottfried Helms, 3.6.2015