Gottfried Helms
5'2016
In this paper I show why the ANOVA procedures in R and in SPSS, when run with their defaults, give different results. When I first encountered this problem it was difficult for me to understand it from the software documentation: on one hand it was much too concise, and on the other hand I found discussions of the SS-type problem that involved a multitude of little-understood topics. So I decided to understand the matter by re-engineering the matrix mathematics of the ANOVA procedure myself. This is a third, extended version, motivated by some questions in the discussion forum "stats.stackexchange.com".
For the presentation here I have (re-)analyzed the outputs of both packages for some data and different models using my matrix software MatMate, found the key to the basic and technical understanding, and think I now have a concise, systematic scheme that is compatible with related concepts such as the matrix formulation and the variance decomposition of the linear model.
The methods available in MatMate made it possible to re-engineer the mathematical procedures and to find a reasonable conceptual background. The differences can nicely be seen in terms of triangular "loadings" matrices and their column rotations, similar to what is known from principal components analysis, where the analogue of the Cholesky factorization of covariance matrices occurs; even the computation of the regression coefficients B can be derived from these basic tools.
1. Data
From the book "Multivariate Analysemethoden" by K. Backhaus et al. [1990] I took the (metric) data example from the chapter on regression (pg. 6).
The setting is an analysis of the sales of some item in various branches, depending on the price, the practice of sending sales representatives to the shops, and the amount invested in marketing. The items are Absatzmenge (or Absatz) for the number of items sold per shop, Preis for the price, Vertreter for the number of representatives' visits, and VerkFoerd for the amount spent on sales promotion.
To have the option of putting the constant anywhere in the (hierarchical) ANOVA model I also added a variable const with the constant value "1"; in SPSS this then requires the "Unianova" procedure with "/intercept=exclude". In MatMate this can happily be configured arbitrarily; only in R is it not possible to move the constant from the first place in the (typically hierarchical) model to some other position in the list of items.
Of course the data are all metric, but the situation is in principle the same when a model with mixed scale types, or even with only nominally scaled factors and interactions, is tested. In the ANOVA procedures, nominally scaled factors can be coded by dummy variables, which again allows the metric procedures used here; so I think the reduction to the simple case of metric items is still meaningful for this small analysis of the basic structure of the procedure implementations. Interaction effects can easily be added: just compute the new interaction item as the product of the scores of the involved items; I have checked some examples based on these data using MatMate and SPSS.
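Such an interaction column can be sketched in a couple of lines; here in Python/numpy (the language choice and the variable names are mine, not from the original MatMate/SPSS sessions):

```python
import numpy as np

# scores of two items, taken from the first three rows of the data below
preis     = np.array([12.50, 10.00, 9.95])
vertreter = np.array([109.0, 107.0, 99.0])

# the interaction item is simply the elementwise product of the two score vectors
interaction = preis * vertreter

# append it to the design matrix as a new column
X = np.column_stack([preis, vertreter, interaction])
print(X[0])   # first row: 12.5, 109.0 and their product 1362.5
```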
The data used are as follows.
Readable for MatMate
labels = {"const", "preis","VerkFoerd","Vertreter","Absatz"}
data = { _
{ 1.00, 12.50, 2000.00, 109.00, 2298.00}, _
{ 1.00, 10.00, 550.00, 107.00, 1814.00}, _
{ 1.00, 9.95, 1000.00, 99.00, 1647.00}, _
{ 1.00, 11.50, 800.00, 70.00, 1496.00}, _
{ 1.00, 12.00, 0.00, 81.00, 969.00}, _
{ 1.00, 10.00, 1500.00, 102.00, 1918.00}, _
{ 1.00, 8.00, 800.00, 110.00, 1810.00}, _
{ 1.00, 9.00, 1200.00, 92.00, 1896.00}, _
{ 1.00, 9.50, 1100.00, 87.00, 1715.00}, _
{ 1.00, 12.50, 1300.00, 79.00, 1699.00} }
Readable as CSV-data
; "const", "preis", "VerkFoerd", "Vertreter", "Absatz"
1.00, 12.50, 2000.00, 109.00, 2298.00
1.00, 10.00, 550.00, 107.00, 1814.00
1.00, 9.95, 1000.00, 99.00, 1647.00
1.00, 11.50, 800.00, 70.00, 1496.00
1.00, 12.00, 0.00, 81.00, 969.00
1.00, 10.00, 1500.00, 102.00, 1918.00
1.00, 8.00, 800.00, 110.00, 1810.00
1.00, 9.00, 1200.00, 92.00, 1896.00
1.00, 9.50, 1100.00, 87.00, 1715.00
1.00, 12.50, 1300.00, 79.00, 1699.00
Because we want to analyse sums of squares we need not standardize the data, and because the data contain the constant item const we do not even need to center the items - we can use them with their original values.
2. SSqr- and CoProduct-matrix
First we compute the matrix CoProd, which is an analogue of the covariance matrix of the items and contains, for instance, the sum of squares of each item in its diagonal. It makes the following matrix formulae easier if the dependent item is at the end of the list, i.e. at the bottom of the CoProd matrix. The computation is simply done by the MatMate command "CoProd = data' * data" and gives the following matrix:
| CoProd    | const     | preis      | VerkFoerd    | Vertreter   | Absatz       |
|-----------|-----------|------------|--------------|-------------|--------------|
| const     | 10.000    | 104.950    | 10250.000    | 936.000     | 17262.000    |
| preis     | 104.950   | 1123.003   | 108550.000   | 9736.550    | 180338.650   |
| VerkFoerd | 10250.000 | 108550.000 | 13172500.000 | 981650.000  | 19132900.000 |
| Vertreter | 936.000   | 9736.550   | 981650.000   | 89370.000   | 1643436.000  |
| Absatz    | 17262.000 | 180338.650 | 19132900.000 | 1643436.000 | 30838452.000 |

(MatMate:)
CoProd = data' * data
The diagonal holds the SSq ("sum of squares") of each item; the off-diagonal entries are the sums of cross-products of the data. For Absatz (when dependent) SPSS documents, with Sum-of-Squares type SSType(1), the value "Gesamt: 30838452.000", which is exactly the number in the diagonal for Absatz in the above table. (Unfortunately the Unianova procedure does not print the sums of squares of the other items.)
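As a cross-check, CoProd = data' * data can be reproduced in a few lines of Python/numpy (a sketch of my own, not part of the original MatMate session):

```python
import numpy as np

# data from Backhaus et al. (1990): columns const, Preis, VerkFoerd, Vertreter, Absatz
data = np.array([
    [1.0, 12.50, 2000.0, 109.0, 2298.0], [1.0, 10.00,  550.0, 107.0, 1814.0],
    [1.0,  9.95, 1000.0,  99.0, 1647.0], [1.0, 11.50,  800.0,  70.0, 1496.0],
    [1.0, 12.00,    0.0,  81.0,  969.0], [1.0, 10.00, 1500.0, 102.0, 1918.0],
    [1.0,  8.00,  800.0, 110.0, 1810.0], [1.0,  9.00, 1200.0,  92.0, 1896.0],
    [1.0,  9.50, 1100.0,  87.0, 1715.0], [1.0, 12.50, 1300.0,  79.0, 1699.0],
])

CoProd = data.T @ data      # matrix of sums of squares / cross-products
print(np.diag(CoProd))      # SSq of each item; last entry 30838452.0 for Absatz
```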
3.1. Partial Sums-of-Squares for model with item-order: Const, Preis, VerkFoerd, Vertreter
Interestingly, the mechanisms of Cholesky decomposition and of rotation as in PCA can be applied to find the explaining partial sums of squares. A Cholesky decomposition of the CoProd matrix gives:
"Loadings" for (re-) ordered Anova; Model: Absatz <- (const) Preis, Verkfoerd, Vertreter
| PL        | [const]  | [preis]  | [VerkFoerd] | [Vertreter] | [Absatz] |
|-----------|----------|----------|-------------|-------------|----------|
| const     | 3.162    | .        | .           | .           | .        |
| preis     | 33.188   | 4.642    | .           | .           | .        |
| VerkFoerd | 3241.335 | 210.288  | 1619.268    | .           | .        |
| Vertreter | 295.989  | -18.691  | 16.168      | 33.907      | .        |
| Absatz    | 5458.724 | -177.932 | 911.997     | 284.368     | 310.685  |

(MatMate:)
PL = cholesky(CoProd)
In PCA these "loadings" would be the coordinates in the generated euclidean (orthogonal) factor space. The brackets in the column headers indicate that the columns are "partialled" up to that specific item - they are not names of coordinate axes!
Now that we have the "loadings" in table PL, we need sums of squares. Just as in PCA the partial covariances are simply the squared loadings, the "partial sums of squares" are the squares of the "loadings":
Partial Sums-of-Squares

| PSSq      | [const]      | [preis]   | [VerkFoerd] | [Vertreter] | [Absatz]  |
|-----------|--------------|-----------|-------------|-------------|-----------|
| const     | 10.000       | .         | .           | .           | .         |
| preis     | 1101.450     | 21.552    | .           | .           | .         |
| VerkFoerd | 10506250.000 | 44221.094 | 2622028.906 | .           | .         |
| Vertreter | 87609.600    | 349.339   | 261.406     | 1149.655    | .         |
| Absatz    | 29797664.400 | 31659.900 | 831737.936  | 80864.894   | 96524.870 |

(In the Absatz row, the entries for [preis], [VerkFoerd] and [Vertreter] are the ones documented by R; the entry for [Vertreter] is also the one documented by SPSS with SS III.)
(MatMate:)
PSSq = PL ^# 2 // compute squares elementwise
The row for the item Absatz already contains the ANOVA SSq for the model with hierarchical list "absatz <- const preis VerkFoerd Vertreter".
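The same decomposition can be sketched in Python/numpy (a sketch of my own; numpy's cholesky, like MatMate's, returns the lower-triangular factor):

```python
import numpy as np

# data from Backhaus et al. (1990): columns const, Preis, VerkFoerd, Vertreter, Absatz
data = np.array([
    [1.0, 12.50, 2000.0, 109.0, 2298.0], [1.0, 10.00,  550.0, 107.0, 1814.0],
    [1.0,  9.95, 1000.0,  99.0, 1647.0], [1.0, 11.50,  800.0,  70.0, 1496.0],
    [1.0, 12.00,    0.0,  81.0,  969.0], [1.0, 10.00, 1500.0, 102.0, 1918.0],
    [1.0,  8.00,  800.0, 110.0, 1810.0], [1.0,  9.00, 1200.0,  92.0, 1896.0],
    [1.0,  9.50, 1100.0,  87.0, 1715.0], [1.0, 12.50, 1300.0,  79.0, 1699.0],
])

CoProd = data.T @ data
PL   = np.linalg.cholesky(CoProd)   # lower-triangular "loadings" matrix
PSSq = PL ** 2                      # partial sums of squares, squared elementwise

# the Absatz row holds the hierarchical (SS type I) decomposition:
# explained SSq for [const], [preis], [VerkFoerd], [Vertreter] and the residual
print(PSSq[4])
```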
Software R:
In the output of the anova procedure of R we find the same entries as in the row for the item Absatz; they represent the so-to-say "partially explained sums of squares". Note that the R procedure does not display the entry for [const]! The commands were
RegModel.7 <- lm(absatzmenge~1+preis+verkfoerd+vertreter, data=Backhaus_Regression)
anova(RegModel.7)
getting
Response: absatzmenge

|           | Df | Sum Sq | Mean Sq | F value | Pr(>F)   |
|-----------|----|--------|---------|---------|----------|
| preis     | 1  | 31660  | 31660   | 1.9680  | 0.210232 |
| verkfoerd | 1  | 831738 | 831738  | 51.7010 | 0.000366 |
| vertreter | 1  | 80865  | 80865   | 5.0266  | 0.066165 |
| Residuals | 6  | 96525  | 16087   |         |          |
Unfortunately, the contribution of the constant is not displayed.
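That these entries really are the sequential (type I) sums of squares can also be verified by comparing them with the drop in residual sum of squares of nested least-squares fits; a sketch of my own in Python/numpy:

```python
import numpy as np

# data from Backhaus et al. (1990): columns const, Preis, VerkFoerd, Vertreter, Absatz
data = np.array([
    [1.0, 12.50, 2000.0, 109.0, 2298.0], [1.0, 10.00,  550.0, 107.0, 1814.0],
    [1.0,  9.95, 1000.0,  99.0, 1647.0], [1.0, 11.50,  800.0,  70.0, 1496.0],
    [1.0, 12.00,    0.0,  81.0,  969.0], [1.0, 10.00, 1500.0, 102.0, 1918.0],
    [1.0,  8.00,  800.0, 110.0, 1810.0], [1.0,  9.00, 1200.0,  92.0, 1896.0],
    [1.0,  9.50, 1100.0,  87.0, 1715.0], [1.0, 12.50, 1300.0,  79.0, 1699.0],
])
X, y = data[:, :4], data[:, 4]

rss = [y @ y]                        # "residual" SSq of the empty model is y'y
for k in range(1, 5):                # add const, Preis, VerkFoerd, Vertreter in turn
    b, *_ = np.linalg.lstsq(X[:, :k], y, rcond=None)
    resid = y - X[:, :k] @ b
    rss.append(resid @ resid)

seq_ss = -np.diff(rss)               # sequential SSq = drop in RSS at each step
print(seq_ss, rss[-1])               # matches the Absatz row of PSSq above
```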
Software SPSS:
Using SSType(1), the UniAnova procedure gives us the same values:
Tests of between-subjects effects ("Tests der Zwischensubjekteffekte"); dependent variable: Absatz

| Source (Quelle)             | Sum of Squares (Type I) | df | Mean Square  | F        | Sig. |
|-----------------------------|-------------------------|----|--------------|----------|------|
| Corrected model             | 944262.730              | 3  | 314754.243   | 19.565   | .002 |
| Intercept (Konstanter Term) | 29797664.400            | 1  | 29797664.400 | 1852.227 | .000 |
| Preis                       | 31659.900               | 1  | 31659.900    | 1.968    | .210 |
| VerkFoerd                   | 831737.936              | 1  | 831737.936   | 51.701   | .000 |
| Vertreter                   | 80864.894               | 1  | 80864.894    | 5.027    | .066 |
| Error (Fehler)              | 96524.870               | 6  | 16087.478    |          |      |
| Total (Gesamt)              | 30838452.000            | 10 |              |          |      |
| Corrected total             | 1040787.600             | 9  |              |          |      |
UNIANOVA Absatz WITH Preis VerkFoerd Vertreter
/METHOD=SSTYPE(1) /INTERCEPT=INCLUDE
/CRITERIA=ALPHA(0.05) /DESIGN=Preis VerkFoerd Vertreter .
On my first encounter with the ANOVA procedure it was quite irritating that SPSS with SSType(3) gave different output for the leading three items, as documented in the following table. It seemed as if the method of computation, i.e. the concept for the sums of squares selected by SSType(), were different - and not merely, as is actually the case, the selection and presentation of values from a larger set of possible partial coefficients; I'll show this below.
Tests of between-subjects effects ("Tests der Zwischensubjekteffekte"); dependent variable: Absatz

| Source (Quelle)             | Sum of Squares (Type III) | df | Mean Square | F      | Sig. |
|-----------------------------|---------------------------|----|-------------|--------|------|
| Corrected model             | 944262.730                | 3  | 314754.243  | 19.565 | .002 |
| Intercept (Konstanter Term) | 26178.826                 | 1  | 26178.826   | 1.627  | .249 |
| Preis                       | 10687.148                 | 1  | 10687.148   | .664   | .446 |
| VerkFoerd                   | 491123.992                | 1  | 491123.992  | 30.528 | .001 |
| Vertreter                   | 80864.894                 | 1  | 80864.894   | 5.027  | .066 |
| Error (Fehler)              | 96524.870                 | 6  | 16087.478   |        |      |
| Total (Gesamt)              | 30838452.000              | 10 |             |        |      |
| Corrected total             | 1040787.600               | 9  |             |        |      |

UNIANOVA Absatzmenge WITH Preis VerkFoerd Vertreter
/METHOD=SSTYPE(3) /INTERCEPT=INCLUDE
/CRITERIA=ALPHA(0.05) /DESIGN=Preis VerkFoerd Vertreter .
The key here is that this table can be understood as a collection of the relevant partial sums of squares of four different SS I procedures. Consider four models, in each of which a different item is at the end of the list:
"Unianova (...) /Design = const, Preis, VerkFoerd, Vertreter (...)",
"Unianova (...) /Design = Vertreter, const, Preis, VerkFoerd (...)",
"Unianova (...) /Design = VerkFoerd, Vertreter, const, Preis (...)",
"Unianova (...) /Design = Preis, VerkFoerd, Vertreter, const (...)",
For each of these analyses using SSType(1), the partial sums of squares documented in the above table occur as the partial sum of squares of the last item in the respective list: the SSType(1) procedure uses a hierarchical model for the decomposition, while the SSType(3) procedure documents, for each item, the value it gets when it is the last one in the list.
Remark: these coefficients are also analogous to the concept of "usefulness" in regression; see the remark in "3.5 Overview" for a couple of references.
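This "each item last" reading of SSType(3) can be verified directly: reorder the items so that the item of interest comes last among the predictors, take the Cholesky factor of the reordered CoProd, and square the corresponding entry of the Absatz row. A sketch of my own in Python/numpy:

```python
import numpy as np

# data from Backhaus et al. (1990): columns const, Preis, VerkFoerd, Vertreter, Absatz
data = np.array([
    [1.0, 12.50, 2000.0, 109.0, 2298.0], [1.0, 10.00,  550.0, 107.0, 1814.0],
    [1.0,  9.95, 1000.0,  99.0, 1647.0], [1.0, 11.50,  800.0,  70.0, 1496.0],
    [1.0, 12.00,    0.0,  81.0,  969.0], [1.0, 10.00, 1500.0, 102.0, 1918.0],
    [1.0,  8.00,  800.0, 110.0, 1810.0], [1.0,  9.00, 1200.0,  92.0, 1896.0],
    [1.0,  9.50, 1100.0,  87.0, 1715.0], [1.0, 12.50, 1300.0,  79.0, 1699.0],
])
names = ["const", "Preis", "VerkFoerd", "Vertreter"]

ss3 = {}
for j, name in enumerate(names):
    # put predictor j last among the predictors; Absatz (column 4) stays at the end
    order = [i for i in range(4) if i != j] + [j, 4]
    PL = np.linalg.cholesky(data[:, order].T @ data[:, order])
    ss3[name] = PL[4, 3] ** 2    # partial SSq of the item when entered last
print(ss3)                       # reproduces the SSType(3) column of the table
```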
The entry Fehler with value 96524.870 is the "unexplained sum of squares" of the dependent item, which we also find - unsurprisingly - in the above table PSSq, in the column [Absatz], as the residual of the regression-like sums-of-squares decomposition.
In the following I document the differently ordered models in order to obtain all partial sums of squares of the various models (which are collected in a single output of the SPSS procedure for SSType(3)). The "loadings" matrices PL are merely rotated versions of each other and serve only as the source for the matrices PSSq of partial sums of squares; the latter simply contain the squares of the "loadings" and provide the sets of coefficients we are interested in.
3.2. Partial Sums-of-Squares for model with item-order: Vertreter, Const, Preis, VerkFoerd
First we rotate the PL matrix such that the previously last variable VerkFoerd gets an item-specific "loading" on the 4th axis. (In R this would mean redefining the formula for the ANOVA model.) The rotated "loadings" and the resulting partial sums of squares (their elementwise squares) are not reproduced here; the Absatz row of PSSq for this model appears in the overview in section 3.5.
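The Absatz row of PSSq for this ordering can be recomputed by a Cholesky decomposition of the reordered CoProd; a sketch of my own in Python/numpy:

```python
import numpy as np

# data from Backhaus et al. (1990): columns const, Preis, VerkFoerd, Vertreter, Absatz
data = np.array([
    [1.0, 12.50, 2000.0, 109.0, 2298.0], [1.0, 10.00,  550.0, 107.0, 1814.0],
    [1.0,  9.95, 1000.0,  99.0, 1647.0], [1.0, 11.50,  800.0,  70.0, 1496.0],
    [1.0, 12.00,    0.0,  81.0,  969.0], [1.0, 10.00, 1500.0, 102.0, 1918.0],
    [1.0,  8.00,  800.0, 110.0, 1810.0], [1.0,  9.00, 1200.0,  92.0, 1896.0],
    [1.0,  9.50, 1100.0,  87.0, 1715.0], [1.0, 12.50, 1300.0,  79.0, 1699.0],
])

# order: Vertreter, const, Preis, VerkFoerd, Absatz (columns 3, 0, 1, 2, 4)
d2  = data[:, [3, 0, 1, 2, 4]]
PL2 = np.linalg.cholesky(d2.T @ d2)
print(PL2[4] ** 2)   # ~ 30221348.172, 12580.307, 16874.660, 491123.992, 96524.870
```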
3.3. Partial Sums-of-Squares for model with item-order: VerkFoerd, Vertreter, Const, Preis
Again we rotate the PL matrix, now such that the previously last variable Preis gets an item-specific "loading" on the 4th axis. The resulting tables are not reproduced here; the Absatz row of PSSq for this model appears in the overview in section 3.5.
3.4. Partial Sums-of-Squares for model with item-order: Preis, VerkFoerd, Vertreter, Const
The resulting tables are not reproduced here; the Absatz row of PSSq for this model appears in the overview in section 3.5.
3.5. Partial Sums-of-Squares -overview-
To collect the sets of coefficients of R and SPSS, we copy all rows of the PSSq matrices which contain the partial sums of squares for Absatz; the models were defined with hierarchies according to the item order, always with Absatz as dependent.
| Model 1                | [const]      | [preis]   | [VerkFoerd] | [Vertreter] | [Absatz]  |
|------------------------|--------------|-----------|-------------|-------------|-----------|
| Absatz (SPSS SS(1), R) | 29797664.400 | 31659.900 | 831737.936  | 80864.894   | 96524.870 |

| Model 2             | [Vertreter]  | [const]   | [preis]   | [VerkFoerd] | [Absatz]  |
|---------------------|--------------|-----------|-----------|-------------|-----------|
| Absatz (SPSS SS(1)) | 30221348.172 | 12580.307 | 16874.660 | 491123.992  | 96524.870 |

| Model 3             | [VerkFoerd]  | [Vertreter] | [const]   | [preis]   | [Absatz]  |
|---------------------|--------------|-------------|-----------|-----------|-----------|
| Absatz (SPSS SS(1)) | 27790310.299 | 2920182.204 | 20747.480 | 10687.148 | 96524.870 |

| Model 4             | [preis]      | [VerkFoerd] | [Vertreter] | [const]   | [Absatz]  |
|---------------------|--------------|-------------|-------------|-----------|-----------|
| Absatz (SPSS SS(1)) | 28959889.834 | 1079973.646 | 675884.824  | 26178.826 | 96524.870 |
(Remark: for models 2 to 4 I was unable to configure the R command accordingly; the results were always as if const were in the first place of the list, and it was also not displayed. I cross-checked the results with MatMate anyway and found that this was the only problem.)
| All models | [const]   | [preis]   | [VerkFoerd] | [Vertreter] | [Absatz]  |
|------------|-----------|-----------|-------------|-------------|-----------|
| SPSS SS(3) | 26178.826 | 10687.148 | 491123.992  | 80864.894   | 96524.870 |
(Remark: with option SSType(3) SPSS documents this list of coefficients as a collection of the hierarchical SSType(1) results: for each item, the value from the model in which that item comes last.)
The solution of R (model 1) is the first row of coefficients without [const]; that of SPSS SSType(1) is the whole first row; and SPSS SSType(3) collects the fourth column, i.e. the last-predictor entry of each model:
Sums of squares as documented by R (anova) and SPSS (unianova):

| Model 1   | R          | SPSS SS(1)   | SPSS SS(3) |
|-----------|------------|--------------|------------|
| const     | ?          | 29797664.400 | 26178.826  |
| preis     | 31659.900  | 31659.900    | 10687.148  |
| VerkFoerd | 831737.936 | 831737.936   | 491123.992 |
| Vertreter | 80864.894  | 80864.894    | 80864.894  |
| Residual  | 96524.870  | 96524.870    | 96524.870  |
Conclusion: the "anova" procedure in R and SSType(1) in SPSS give us the set of explained partial sums of squares (of the dependent item) organized in hierarchical order. That order is implicitly defined by the textual order of the items in the command for the procedure.
In SPSS the position of the constant in the hierarchy can be modified if a constant data item (called, for instance, "const") is included and, in the unianova command, the implicit computation of the coefficient for the constant is deactivated by the option /intercept=exclude. In R the model seems to always contain the constant in the first position of the hierarchy, and there seems to be no similar workaround.
The construct with the "loadings" matrices PL has the interesting aspect that the direction of the influence of an item on the dependent can be seen. In the first model in 3.1 the item Preis has a partial "loading" with negative value, which shows a negative relation of Absatz to Preis when const is partialled out. The corresponding partial sum of squares in PSSq is of course positive, and from that coefficient alone one would not see this information.
A further interesting aspect is that the SPSS SSType(3) default gives us the set of coefficients analogous to the "usefulness" coefficients known from the regression procedure (together with their F- and p-values). The usefulness coefficient seems to be rarely discussed - it is not even in Wikipedia; I found it mentioned, for instance, in the 1999 book "Statistik für Sozialwissenschaftler" by J. Bortz, pg. 442 ("Nützlichkeit"), referring to an idea of R. B. Darlington (1968). It is also described in a more recent lecture script by M. Persike (2008, pg. 6).
4. Regression
Finally we compute the regression coefficients B for the items, using the inverse of the upper-left submatrix of PL. This gives us columns in the metric of the predictors (see the 1.000-coordinates in their columns). The order of the items becomes irrelevant here, because each item gets its own axis attached, in which the dependent item can be measured. Note that the vector space then has non-orthogonal axes.
To check this using SPSS we compare the entries in the row "Absatz" with the coefficients ("Nicht standardisierte Koeffizienten", i.e. unstandardized coefficients) given by SPSS. Remark: const was given as a variable in order to have additional options for the output; of course, with a "constant" included, the option /ORIGIN must then be applied:
(MatMate:)
PLInv = inv(PL[1..4,1..4])
PLInv = insert(PLInv, {sqrt(N)})
B = PL * PLInv
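The equivalence of this construction with an ordinary least-squares fit can be checked with a triangular solve on the Cholesky factor; a sketch of my own in Python/numpy:

```python
import numpy as np

# data from Backhaus et al. (1990): columns const, Preis, VerkFoerd, Vertreter, Absatz
data = np.array([
    [1.0, 12.50, 2000.0, 109.0, 2298.0], [1.0, 10.00,  550.0, 107.0, 1814.0],
    [1.0,  9.95, 1000.0,  99.0, 1647.0], [1.0, 11.50,  800.0,  70.0, 1496.0],
    [1.0, 12.00,    0.0,  81.0,  969.0], [1.0, 10.00, 1500.0, 102.0, 1918.0],
    [1.0,  8.00,  800.0, 110.0, 1810.0], [1.0,  9.00, 1200.0,  92.0, 1896.0],
    [1.0,  9.50, 1100.0,  87.0, 1715.0], [1.0, 12.50, 1300.0,  79.0, 1699.0],
])
X, y = data[:, :4], data[:, 4]

PL  = np.linalg.cholesky(data.T @ data)
L11 = PL[:4, :4]    # Cholesky factor of X'X (upper-left submatrix of PL)
l21 = PL[4, :4]     # Absatz row over the predictor columns

# X'X = L11 L11' and X'y = L11 l21, hence the coefficients solve L11' b = l21
b_chol = np.linalg.solve(L11.T, l21)

# cross-check with a direct least-squares fit
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b_chol, b_ols))   # True
```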
REGRESSION /ORIGIN /DEPENDENT Absatzmenge /METHOD=ENTER const Preis VerkFoerd Vertreter .
5. References
[Backhaus] K. Backhaus, B. Erichson, W. Plinke, R. Weiber: "Multivariate Analysemethoden". Springer, Berlin; 1990, 6. Auflage.
[Bortz] J. Bortz: "Statistik für Sozialwissenschaftler". Springer, Berlin; 1999, 5. Auflage.
[Darlington] R. B. Darlington: "Multiple regression in psychological research and practice". Psychol. Bull. 69, 1968, pp. 161-182. (Referred to by J. Bortz, pg. 442.)
[Persike] M. Persike: "Forschungsstatistik I". Skript zur Vorlesung, 2008.
http://methodenlehre.sowi.uni-mainz.de/download/Lehre/SS2009/StatistikII/VL_2009_05_12.pdf
[Wikipedia] "Regression analysis" (multiple authors).
https://en.wikipedia.org/wiki/Regression_analysis
[SPSS] IBM SPSS, V. 21, German.
[R] The R project.
[MatMate] G. Helms: "MatMate - a matrix calculator for statistical education and self-study". 1996, last update 2016.
http://go.helms-net.de/sw/matmate/index.htm
[SSE_Q&A] Related questions and discussions on stats.stackexchange.com:
http://stats.stackexchange.com/questions/13241/the-order-of-variables-in-anova-matters-doesnt-it
http://stats.stackexchange.com/questions/11209/the-effect-of-the-number-of-replicates-in-different-cells-on-the-results-of-anova
http://stats.stackexchange.com/questions/20452/how-to-interpret-type-i-type-ii-and-type-iii-anova-and-manova
and from there various more links:
http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Type1-3.pdf
http://www.uni-kiel.de/psychologie/dwoll/r/ssTypes.php
http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
http://r.789695.n4.nabble.com/Type-I-v-s-Type-III-Sum-Of-Squares-in-ANOVA-td1573657.html
https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/
(c) Gottfried Helms, Univ. Kassel, 5'2016, Version 3.5