"Very different results of principal component analysis in SPSS and Stata"
A comment on a question at SSE (stats.stackexchange.com)

see: http://stats.stackexchange.com/questions/154378/very-different-results-of-principal-component-analysis-in-spss-and-stata-after-r

 

 

Initial (PC) solution

Stata - Principal components (with Varimax on Kaiser normalization): working on eigenvectors

SPSS - Principal components (with Varimax on Kaiser normalization): working on principal components (scaled eigenvectors)

Stata - "principal-component factors" (with Varimax on Kaiser normalization): working on principal components (scaled eigenvectors); seems to equal the SPSS analysis

Stata - principal components: eigenvalues

-----------------------------
   Component |   Eigenvalue
-------------+---------------
       Comp1 |       3.8723
       Comp2 |      1.40682
       Comp3 |       1.1791
       Comp4 |     0.972359
       Comp5 |     0.803195
       Comp6 |     0.752324
       Comp7 |     0.656957
       Comp8 |     0.643198
       Comp9 |     0.507304
      Comp10 |     0.463711
      Comp11 |     0.388806
      Comp12 |     0.353931
-----------------------------

 

 

 

 

Stata - principal-component factors: eigenvalues

----------------------------
     Factor  |   Eigenvalue
-------------+--------------
    Factor1  |      3.87230
    Factor2  |      1.40682
    Factor3  |      1.17910
    Factor4  |      0.97236
    Factor5  |      0.80319
    Factor6  |      0.75232
    Factor7  |      0.65696
    Factor8  |      0.64320
    Factor9  |      0.50730
   Factor10  |      0.46371
   Factor11  |      0.38881
   Factor12  |      0.35393
----------------------------

 

Stata - principal components: first three eigenvectors and unexplained variance

----------------------------------------------------------
    Variable |    Comp1     Comp2     Comp3 | Unexplained
-------------+------------------------------+-------------
bewert_sfu_a |   0.2700    0.3901   -0.1477 |      0.4779
bewert_sfu_b |   0.3298    0.2303   -0.4027 |      0.3129
bewert_sfu_c |  -0.3046    0.3149    0.1773 |      0.4642
bewert_sfu_d |   0.3489    0.1910    0.0700 |      0.4715
bewert_sfu_e |   0.3342    0.2067    0.2720 |      0.4202
bewert_sfu_f |  -0.2001    0.4561   -0.1587 |      0.5227
bewert_sfu_g |   0.3057    0.3128    0.1531 |      0.4728
bewert_sfu_h |  -0.3611    0.2180    0.2913 |      0.328
bewert_sfu_i |   0.2352   -0.2211    0.3662 |      0.5588
bewert_sfu_j |  -0.1556    0.3894    0.4578 |      0.4457
bewert_sfu_k |   0.3239    0.0525    0.0754 |      0.5832
bewert_sfu_l |   0.2091   -0.2445    0.4720 |      0.4839
----------------------------------------------------------

 

SPSS - principal components: component matrix (small loadings suppressed) and extraction communalities

                       Component         |Extraction
                   1       2       3     |(=explained)
-----------------------------------------+------------
bewert_sfu_a   0.531   0.463             |   0.522
bewert_sfu_b   0.649          -0.437     |   0.687
bewert_sfu_c  -0.599                     |   0.536
bewert_sfu_d   0.687                     |   0.529
bewert_sfu_e   0.658                     |   0.580
bewert_sfu_f           0.541             |   0.477
bewert_sfu_g   0.602                     |   0.527
bewert_sfu_h  -0.711                     |   0.672
bewert_sfu_i   0.463                     |   0.441
bewert_sfu_j           0.462   0.497     |   0.554
bewert_sfu_k   0.637                     |   0.417
bewert_sfu_l   0.412           0.513     |   0.516

 

Stata - principal-component factors: loadings and uniqueness

-----------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3 |   Uniqueness
-------------+------------------------------+--------------
bewert_sfu_a |   0.5314    0.4627   -0.1603 |      0.4779
bewert_sfu_b |   0.6490    0.2732   -0.4373 |      0.3129
bewert_sfu_c |  -0.5994    0.3735    0.1926 |      0.4642
bewert_sfu_d |   0.6866    0.2265    0.0760 |      0.4715
bewert_sfu_e |   0.6576    0.2451    0.2954 |      0.4202
bewert_sfu_f |  -0.3938    0.5409   -0.1723 |      0.5227
bewert_sfu_g |   0.6015    0.3710    0.1663 |      0.4728
bewert_sfu_h |  -0.7107    0.2586    0.3163 |      0.3280
bewert_sfu_i |   0.4629   -0.2622    0.3977 |      0.5588
bewert_sfu_j |  -0.3062    0.4619    0.4971 |      0.4457
bewert_sfu_k |   0.6373    0.0623    0.0818 |      0.5832
bewert_sfu_l |   0.4116   -0.2900    0.5125 |      0.4839
-----------------------------------------------------------

 

After Varimax rotation

 

 

 

Stata - principal components: variance of the rotated components

----------------------------
   Component |     Variance
-------------+--------------
       Comp1 |      2.95242
       Comp2 |      2.08506
       Comp3 |      1.42073
----------------------------

 

 

 

 

Stata - principal-component factors: variance of the rotated factors

----------------------------
     Factor  |     Variance
-------------+--------------
    Factor1  |      2.84986
    Factor2  |      1.86281
    Factor3  |      1.74554
----------------------------

 

Stata - principal components: rotated components (rotated eigenvectors)

----------------------------------------------------------
    Variable |    Comp1     Comp2     Comp3 | Unexplained
-------------+------------------------------+-------------
bewert_sfu_a |   0.4076   -0.0266   -0.2829 |      0.4779
bewert_sfu_b |   0.3116   -0.3063   -0.3648 |      0.3129
bewert_sfu_c |  -0.0255    0.4536   -0.1302 |      0.4642
bewert_sfu_d |   0.4007   -0.0456    0.0218 |      0.4715
bewert_sfu_e |   0.4392    0.0965    0.1618 |      0.4202
bewert_sfu_f |   0.0698    0.2650   -0.4451 |      0.5227
bewert_sfu_g |   0.4531    0.0973    0.0005 |      0.4728
bewert_sfu_h |  -0.1026    0.5023    0.0011 |      0.328
bewert_sfu_i |   0.1350   -0.0261    0.4684 |      0.5588
bewert_sfu_j |   0.1927    0.5856    0.0731 |      0.4457
bewert_sfu_k |   0.3026   -0.1048    0.1037 |      0.5832
bewert_sfu_l |   0.1224    0.0410    0.5564 |      0.4839
----------------------------------------------------------

 

SPSS - principal components: rotated component matrix (small loadings suppressed)

                      Component
                   1      2      3
-------------------------------------
bewert_sfu_a   0.705
bewert_sfu_b   0.673 -0.448
bewert_sfu_c          0.627
bewert_sfu_d   0.671
bewert_sfu_e   0.661
bewert_sfu_f                -0.576
bewert_sfu_g   0.699
bewert_sfu_h          0.698
bewert_sfu_i                 0.630
bewert_sfu_j          0.742
bewert_sfu_k   0.528
bewert_sfu_l                 0.707
-------------------------------------

Stata - principal-component factors: rotated loadings

-----------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3 |   Uniqueness
-------------+------------------------------+--------------
bewert_sfu_a |   0.7047   -0.0983   -0.1258 |      0.4779
bewert_sfu_b |   0.6732   -0.4479   -0.1827 |      0.3129
bewert_sfu_c |  -0.2184    0.6266   -0.3090 |      0.4642
bewert_sfu_d |   0.6710   -0.1473    0.2377 |      0.4715
bewert_sfu_e |   0.6605    0.0245    0.3781 |      0.4202
bewert_sfu_f |   0.0474    0.3785   -0.5761 |      0.5227
bewert_sfu_g |   0.6989    0.0358    0.1935 |      0.4728
bewert_sfu_h |  -0.3776    0.6976   -0.2067 |      0.3280
bewert_sfu_i |   0.1847   -0.1019    0.6298 |      0.5588
bewert_sfu_j |   0.0624    0.7419   -0.0018 |      0.4457
bewert_sfu_k |   0.5276   -0.2131    0.3050 |      0.5832
bewert_sfu_l |   0.1273   -0.0160    0.7069 |      0.4839
-----------------------------------------------------------

Transformation/rotation matrix PCA -> Varimax

 

 

 

Stata - principal components:

--------------------------------------------
             |    Comp1     Comp2     Comp3
-------------+------------------------------
       Comp1 |   0.7942   -0.5573    0.2422
       Comp2 |   0.5724    0.5523   -0.6061
       Comp3 |   0.2040    0.6200    0.7576
--------------------------------------------

 

SPSS:

--------------------------------------------
Component      1        2      3
--------------------------------------------
1          0.765   -0.476  0.434
2          0.644    0.567 -0.513
3         -0.001    0.672  0.741
--------------------------------------------

Stata - principal-component factors:

-----------------------------------------
             | Factor1  Factor2  Factor3
-------------+---------------------------
     Factor1 |  0.7650  -0.4761   0.4336
     Factor2 |  0.6440   0.5672  -0.5134
     Factor3 | -0.0016   0.6720   0.7406
-----------------------------------------

 

Conclusion: the second Stata solution appears to equal the SPSS solution.
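Before rotation the three solutions differ only in how the columns are scaled: the unrotated loadings of Stata's "principal-component factors" (and the SPSS component matrix, to 3 decimals) are the eigenvectors multiplied by the square roots of their eigenvalues. A minimal Python check on the posted numbers (all variable names are my own):

```python
import math

# Stata "principal components": first three eigenvectors (4 decimals, as posted)
EIGVECS = [
    [0.2700, 0.3901, -0.1477],
    [0.3298, 0.2303, -0.4027],
    [-0.3046, 0.3149, 0.1773],
    [0.3489, 0.1910, 0.0700],
    [0.3342, 0.2067, 0.2720],
    [-0.2001, 0.4561, -0.1587],
    [0.3057, 0.3128, 0.1531],
    [-0.3611, 0.2180, 0.2913],
    [0.2352, -0.2211, 0.3662],
    [-0.1556, 0.3894, 0.4578],
    [0.3239, 0.0525, 0.0754],
    [0.2091, -0.2445, 0.4720],
]
EIGVALS = [3.8723, 1.40682, 1.1791]

# Stata "principal-component factors": unrotated loadings (as posted)
PCF_LOADINGS = [
    [0.5314, 0.4627, -0.1603],
    [0.6490, 0.2732, -0.4373],
    [-0.5994, 0.3735, 0.1926],
    [0.6866, 0.2265, 0.0760],
    [0.6576, 0.2451, 0.2954],
    [-0.3938, 0.5409, -0.1723],
    [0.6015, 0.3710, 0.1663],
    [-0.7107, 0.2586, 0.3163],
    [0.4629, -0.2622, 0.3977],
    [-0.3062, 0.4619, 0.4971],
    [0.6373, 0.0623, 0.0818],
    [0.4116, -0.2900, 0.5125],
]

# loading = eigenvector entry * sqrt(eigenvalue of its column)
scaled = [[v * math.sqrt(lam) for v, lam in zip(row, EIGVALS)] for row in EIGVECS]

# largest absolute deviation between scaled eigenvectors and posted loadings
max_diff = max(
    abs(s - p)
    for srow, prow in zip(scaled, PCF_LOADINGS)
    for s, p in zip(srow, prow)
)
print(f"largest deviation: {max_diff:.5f}")
```

The deviations stay at the level of the 4-decimal rounding, so the extraction step is the same; the differences only arise in the rotation.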


Here is a script in my matrix-calculator language "MatMate" which tries to reproduce how the first Stata solution and the SPSS/second Stata solution are calculated.

Indeed, unlike the SPSS solution, the first Stata solution seems to apply the transformation to the eigenvectors (more specifically: the rotation is determined from the Kaiser-normalized rows of the first 3 eigenvectors) rather than to the PCA components as SPSS does.

 

 

    ;******  MatMate Version 0.1410 Beta *****************************
    //  first part of your posted data: eigenvectors according to Stata's computations (4 decimals; taken via clipboard)
    clp = csvdatei("clip")

    eig_lad = clp[*,1..3]   // the first 3 columns: these are
                            // obviously eigenvector values

           |   0.2700    0.3901   -0.1477 |
           |   0.3298    0.2303   -0.4027 |
           |  -0.3046    0.3149    0.1773 |
           |   0.3489    0.1910    0.0700 |
           |   0.3342    0.2067    0.2720 |
           |  -0.2001    0.4561   -0.1587 |
           |   0.3057    0.3128    0.1531 |
           |  -0.3611    0.2180    0.2913 |
           |   0.2352   -0.2211    0.3662 |
           |  -0.1556    0.3894    0.4578 |
           |   0.3239    0.0525    0.0754 |
           |   0.2091   -0.2445    0.4720 |

 

    uniq = clp[*,4]  // "not explained" = unique
             // (unexplained by the first 3 eigenvectors) variances
             //  (= squared values, not loadings)

           |   0.4779 |
           |   0.3129 |
           |   0.4642 |
           |   0.4715 |
           |   0.4202 |
           |   0.5227 |
           |   0.4728 |
           |   0.3280 |
           |   0.5588 |
           |   0.4457 |
           |   0.5832 |
           |   0.4839 |

 

 

   

    //  second part of posted data: first three eigenvalues (taken via clipboard)
    pca_ssl = csvdatei("clip")  // "ssl" means "sum of squares of loadings"

           |   3.8723    1.4068    1.1791 |

 

 

       // check whether the "unique" variance really is the variance not explained by the first 3 eigenvectors/PCA components:
    chk = sumzl(    eig_lad ^# 2 *#  pca_ssl   ) + uniq
       // the squared PCA loadings (eigenvectors^2 scaled by eigenvalues) plus the unique variance should sum to 1 (the unit variance of each item)

           |   1.0000 |
           |   0.9999 |
           |   1.0000 |
           |   1.0000 |
           |   1.0000 |
           |   1.0001 |
           |   1.0000 |
           |   0.9998 |
           |   0.9999 |
           |   0.9999 |
           |   1.0000 |
           |   1.0000 |

 

Important conclusion: in this implementation, "unique" obviously has nothing to do with the "item-specific unique variance" of common factor analysis; it is simply the (correlated) residual variance left after the first three eigenvectors (!)
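The same check as a small Python sketch on the posted values (names are my own): for each item, 1 minus the variance explained by the first three components, i.e. minus the sum over components of eigenvalue times squared eigenvector entry, reproduces Stata's "Unexplained" column up to rounding.

```python
# Stata's first three eigenvectors (4 decimals), eigenvalues and "Unexplained" column
EIGVECS = [
    [0.2700, 0.3901, -0.1477],
    [0.3298, 0.2303, -0.4027],
    [-0.3046, 0.3149, 0.1773],
    [0.3489, 0.1910, 0.0700],
    [0.3342, 0.2067, 0.2720],
    [-0.2001, 0.4561, -0.1587],
    [0.3057, 0.3128, 0.1531],
    [-0.3611, 0.2180, 0.2913],
    [0.2352, -0.2211, 0.3662],
    [-0.1556, 0.3894, 0.4578],
    [0.3239, 0.0525, 0.0754],
    [0.2091, -0.2445, 0.4720],
]
EIGVALS = [3.8723, 1.40682, 1.1791]
UNEXPLAINED = [0.4779, 0.3129, 0.4642, 0.4715, 0.4202, 0.5227,
               0.4728, 0.3280, 0.5588, 0.4457, 0.5832, 0.4839]

# variance of each standardized item explained by the first three components:
# sum over components of eigenvalue * (eigenvector entry)^2
explained = [sum(lam * v * v for v, lam in zip(row, EIGVALS)) for row in EIGVECS]
# what is left of the unit variance is Stata's "Unexplained"
residual = [1.0 - e for e in explained]

for r, u in zip(residual, UNEXPLAINED):
    print(f"{r:.4f}  (Stata: {u:.4f})")
```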

   

     // get the rotation matrix that brings the "Kaiser"-normalized loadings to Varimax
     // note that SPSS computes this based on the PCA loadings, not on eigenvector values
    t = gettrans(normzl(eig_lad ),"varimax") // "normzl(<loadings>)"
                                             // provides Kaiser normalization per row
      //       (here, as in Stata, based on the eigenvectors)

     // rotation/transformation matrix "t" from PCA to Varimax coordinates

           |   0.7942   -0.5573    0.2421 |
           |   0.5724    0.5523   -0.6062 |
           |   0.2041    0.6200    0.7576 |

   

    vmx_lad = eig_lad * t        // this computes the Stata Varimax coordinates

           |   0.4580   -0.0065   -0.1926 |
           |   0.4012   -0.2970   -0.2735 |
           |  -0.0305    0.4428   -0.1625 |
           |   0.3898   -0.0107    0.1051 |
           |   0.3884    0.1409    0.2402 |
           |   0.1409    0.2473   -0.4385 |
           |   0.4351    0.1348    0.0853 |
           |  -0.1363    0.4913   -0.0528 |
           |   0.0370    0.0087    0.4867 |
           |   0.1310    0.6026    0.0716 |
           |   0.2815   -0.0738    0.1693 |
           |   0.0018    0.0790    0.5657 |

   

    vmx_ssl = sqsumsp((eig_lad *# sqrt(pca_ssl#)) * t)
              // sums of squares of the Varimax-rotated principal components (not eigenvectors!)

           |   2.9522    2.0849    1.4207 |

    

The values found here are identical to those of the first documented Stata solution, so we may assume that this is also how Stata implements the concept.
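A short Python sketch of why Stata's "Variance" column for the rotated solution must come from the scaled loadings rather than from the rotated eigenvectors: applying the posted rotation matrix to the eigenvectors reproduces the printed rotated components, whose columns still have unit sums of squares, while applying it to the loadings (eigenvectors times square roots of eigenvalues) reproduces Stata's variances 2.95242, 2.08506, 1.42073 up to rounding. Helper names are my own.

```python
import math

# Stata's first three eigenvectors (4 decimals) and eigenvalues, as posted above
EIGVECS = [
    [0.2700, 0.3901, -0.1477],
    [0.3298, 0.2303, -0.4027],
    [-0.3046, 0.3149, 0.1773],
    [0.3489, 0.1910, 0.0700],
    [0.3342, 0.2067, 0.2720],
    [-0.2001, 0.4561, -0.1587],
    [0.3057, 0.3128, 0.1531],
    [-0.3611, 0.2180, 0.2913],
    [0.2352, -0.2211, 0.3662],
    [-0.1556, 0.3894, 0.4578],
    [0.3239, 0.0525, 0.0754],
    [0.2091, -0.2445, 0.4720],
]
EIGVALS = [3.8723, 1.40682, 1.1791]
# Stata's rotation matrix "PCA -> Varimax", as posted above
T = [
    [0.7942, -0.5573, 0.2422],
    [0.5724, 0.5523, -0.6061],
    [0.2040, 0.6200, 0.7576],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def col_sumsq(m):
    """Sum of squares of every column."""
    return [sum(v * v for v in col) for col in zip(*m)]


# rotating the eigenvectors gives the rotated "components" Stata prints
rotated_eigvecs = matmul(EIGVECS, T)
# rotating the loadings (eigenvectors scaled by sqrt(eigenvalue)) instead
loadings = [[v * math.sqrt(lam) for v, lam in zip(row, EIGVALS)] for row in EIGVECS]
rotated_loadings = matmul(loadings, T)

print("column SS of rotated eigenvectors:", [round(s, 4) for s in col_sumsq(rotated_eigvecs)])
print("column SS of rotated loadings:    ", [round(s, 4) for s in col_sumsq(rotated_loadings)])
```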

 

    // =============== The SPSS solution          =========================
    eig_lad = clp[*,1..3]   // the first 3 columns: these are obviously eigenvector values
    spss_pca_lad = eig_lad *# sqrt(pca_ssl#)            // compute PCA loadings from eigenvectors
    spss_t = gettrans(normzl(spss_pca_lad ),"varimax")  // "normzl(<PCA loadings>)"
    spss_vmx_lad = spss_pca_lad * spss_t                // this computes the SPSS Varimax coordinates

       |   0.7047   -0.0983   -0.1260 |
       |   0.6731   -0.4479   -0.1827 |
       |  -0.2184    0.6266   -0.3091 |
       |   0.6710   -0.1473    0.2377 |
       |   0.6606    0.0244    0.3780 |
       |   0.0474    0.3785   -0.5761 |
       |   0.6989    0.0358    0.1935 |
       |  -0.3776    0.6975   -0.2066 |
       |   0.1846   -0.1019    0.6298 |
       |   0.0624    0.7418   -0.0018 |
       |   0.5276   -0.2131    0.3050 |
       |   0.1273   -0.0160    0.7068 |

    spss_vmx_ssl = sqsumsp(spss_vmx_lad)        // the SPSS "variances" of the Varimax factors: sums of squares along the columns

       |   2.8498    1.8626    1.7455 |

 

Since all coordinates, and also the transformation/rotation matrix, seem to be reproduced correctly, this appears to be indeed the internal computation of SPSS.
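Correspondingly, the SPSS "variances" after rotation are the column sums of squares of the rotated loadings, while the row sums of squares (communalities) are left unchanged by the rotation. A Python sketch using the full rotated loadings of Stata's principal-component factors, which agree with the SPSS matrix to 3 decimals (names are my own):

```python
# Varimax-rotated loadings of Stata's "principal-component factors", as posted above
ROTATED = [
    [0.7047, -0.0983, -0.1258],
    [0.6732, -0.4479, -0.1827],
    [-0.2184, 0.6266, -0.3090],
    [0.6710, -0.1473, 0.2377],
    [0.6605, 0.0245, 0.3781],
    [0.0474, 0.3785, -0.5761],
    [0.6989, 0.0358, 0.1935],
    [-0.3776, 0.6976, -0.2067],
    [0.1847, -0.1019, 0.6298],
    [0.0624, 0.7419, -0.0018],
    [0.5276, -0.2131, 0.3050],
    [0.1273, -0.0160, 0.7069],
]
UNIQUENESS = [0.4779, 0.3129, 0.4642, 0.4715, 0.4202, 0.5227,
              0.4728, 0.3280, 0.5588, 0.4457, 0.5832, 0.4839]

# "variance" of each rotated factor: sum of squares down the column
col_var = [sum(v * v for v in col) for col in zip(*ROTATED)]
# communality of each item: sum of squares along the row (rotation does not change it)
communality = [sum(v * v for v in row) for row in ROTATED]

print("factor variances:", [round(v, 4) for v in col_var])
print("communality + uniqueness:", [round(h + u, 4) for h, u in zip(communality, UNIQUENESS)])
```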

==============================================================================

Condensed to the syntax, the difference between the two concepts is:

    // Stata
    eig_lad = clp[*,1..3]   // the first 3 columns: these are obviously eigenvector values
    t = gettrans(normzl(eig_lad ),"varimax") // "normzl(<on eigenvectors>)"
    vmx_lad = eig_lad * t        // this computes the Stata Varimax coordinates

    // SPSS
    eig_lad = clp[*,1..3]   // the first 3 columns: these are obviously eigenvector values
    spss_pca_lad = eig_lad *# sqrt(pca_ssl#) // compute PCA loadings from eigenvectors
    spss_t = gettrans(normzl(spss_pca_lad ),"varimax") // "normzl(<on PCA loadings>)"
    spss_vmx_lad = spss_pca_lad * spss_t        // this computes the SPSS Varimax coordinates

 

The rotation criterion for the Varimax concept seems to be applied differently in the two software packages. Stata computes the rotation angles from the rows of the eigenvectors rescaled to unit length ("Kaiser normalization"), whereas SPSS computes them from the Kaiser-normalized rows of the PCA components, i.e. of the eigenvectors scaled by the square roots of their associated eigenvalues.
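To see why the rotation angles differ, compare the Kaiser-normalized first row (bewert_sfu_a) in both cases. Normalizing the eigenvector row and normalizing the loading row give unit vectors that point in different directions, so the Varimax criterion is evaluated on a different configuration of points. A minimal sketch (the helper `kaiser_normalize` is a hypothetical name of my own):

```python
import math

# first item (bewert_sfu_a): eigenvector entries and eigenvalues, as posted
eigvec_row = [0.2700, 0.3901, -0.1477]
EIGVALS = [3.8723, 1.40682, 1.1791]
loading_row = [v * math.sqrt(lam) for v, lam in zip(eigvec_row, EIGVALS)]


def kaiser_normalize(row):
    """Rescale one row to unit length (Kaiser normalization)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]


stata_dir = kaiser_normalize(eigvec_row)   # input to the rotation in the Stata concept
spss_dir = kaiser_normalize(loading_row)   # input to the rotation in the SPSS concept
cos_angle = sum(x * y for x, y in zip(stata_dir, spss_dir))

print([round(x, 4) for x in stata_dir])
print([round(x, 4) for x in spss_dir])
print(f"angle between them: {math.degrees(math.acos(cos_angle)):.1f} degrees")
```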

This should, in most cases, result in different solutions.
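Finally, a self-contained Python sketch of both concepts. It implements Varimax with Kaiser normalization via Kaiser's pairwise planar rotations (my own implementation, not the code of Stata, SPSS or MatMate; `varimax_kaiser` is a hypothetical name) and applies it once to the eigenvectors (Stata concept) and once to the loadings (SPSS concept). With the posted data the sorted column variances should come out close to Stata's 2.952, 2.085, 1.421 and to SPSS's 2.850, 1.863, 1.746 respectively.

```python
import math

# Stata's first three eigenvectors (4 decimals) and eigenvalues, as posted above
EIGVECS = [
    [0.2700, 0.3901, -0.1477],
    [0.3298, 0.2303, -0.4027],
    [-0.3046, 0.3149, 0.1773],
    [0.3489, 0.1910, 0.0700],
    [0.3342, 0.2067, 0.2720],
    [-0.2001, 0.4561, -0.1587],
    [0.3057, 0.3128, 0.1531],
    [-0.3611, 0.2180, 0.2913],
    [0.2352, -0.2211, 0.3662],
    [-0.1556, 0.3894, 0.4578],
    [0.3239, 0.0525, 0.0754],
    [0.2091, -0.2445, 0.4720],
]
EIGVALS = [3.8723, 1.40682, 1.1791]


def kaiser_normalize(row):
    """Rescale one row to unit length (Kaiser normalization)."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


def varimax_kaiser(matrix, tol=1e-9, max_sweeps=500):
    """Orthogonal k x k matrix T maximizing the Varimax criterion of the
    Kaiser-normalized rows of `matrix`, found by pairwise planar rotations."""
    p, k = len(matrix), len(matrix[0])
    x = [kaiser_normalize(row) for row in matrix]
    t = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(max_sweeps):
        largest = 0.0
        for j in range(k - 1):
            for q in range(j + 1, k):
                u = [r[j] ** 2 - r[q] ** 2 for r in x]
                v = [2.0 * r[j] * r[q] for r in x]
                a, b = sum(u), sum(v)
                c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # optimal angle for this pair: tan(4 phi) = (d - 2ab/p) / (c - (a^2 - b^2)/p)
                phi = 0.25 * math.atan2(d - 2.0 * a * b / p, c - (a * a - b * b) / p)
                largest = max(largest, abs(phi))
                cs, sn = math.cos(phi), math.sin(phi)
                # rotate columns j and q of the normalized data and of T alike
                for m in (x, t):
                    for r in m:
                        r[j], r[q] = cs * r[j] + sn * r[q], -sn * r[j] + cs * r[q]
        if largest < tol:
            break
    return t


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_sumsq(m):
    """Sum of squares of every column."""
    return [sum(v * v for v in col) for col in zip(*m)]


loadings = [[v * math.sqrt(lam) for v, lam in zip(row, EIGVALS)] for row in EIGVECS]

t_stata = varimax_kaiser(EIGVECS)   # angles from normalized eigenvector rows
t_spss = varimax_kaiser(loadings)   # angles from normalized loading rows

# "variance" of the rotated components: column sums of squares of loadings * T
var_stata = sorted(column_sumsq(matmul(loadings, t_stata)), reverse=True)
var_spss = sorted(column_sumsq(matmul(loadings, t_spss)), reverse=True)
print("eigenvector-based:", [round(v, 4) for v in var_stata])
print("loading-based:    ", [round(v, 4) for v in var_spss])
```

Because a Varimax solution is only determined up to the order and the signs of its columns, the sketch compares sorted column variances rather than the rotation matrices themselves.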


Gottfried Helms, 3.6.2015