Q: How can one create simulated data (to be added to a given dataset) such that the combined variables jointly satisfy a given correlation matrix? Abstract: the problem is solved using the matrix program MatMate (© with the author). The listing of the output is documented to make the process transparent. Gottfried Helms, 16.5.2007

;===========================================

;MatMate listing of: 16.05.2007 14:09:28

;============================================

##### First: show the data

X holds the empirical data: 9 variables with n = 100 observations each; the observations of a variable run along a row.

X :

 x1   0.417  -0.088  -1.065   0.574   2.137   0.093  ...
 x2   0.427  -1.157  -1.409   0.857   1.29   -0.51   ...
 x3   0.826   0.54   -0.886   0.982   1.596   0.633  ...
 x4   0.579  -0.228  -1.5     0.205   1.219   0.331  ...
 x5   0.808   0.151  -1.647   0.409   1.23    0.197  ...
 x6   0.432   0.381  -1.995   1.92    0.877  -0.547  ...
 x7   0.054   0.052  -1.01    0.646   1.656   0.408  ...
 x8   0.546   1.089  -1.335   1.599   1.694   0.154  ...
 x9   0.611  -0.269  -1.813   0.752   1.238   0.607  ...

R is a 12x12 correlation matrix. The top-left 9x9 block holds the empirical correlations of X; the remaining entries are the required correlations for the simulated variables x10..x12.

R :

 1     0.77  0.76  0.73  0.75  0.59  0.86  0.83  0.7   0.8   0.66  0.67
 0.77  1     0.57  0.87  0.68  0.72  0.81  0.71  0.67  0.87  0.83  0.82
 0.76  0.57  1     0.54  0.81  0.63  0.82  0.86  0.85  0.77  0.57  0.56
 0.73  0.87  0.54  1     0.77  0.66  0.83  0.71  0.72  0.77  0.76  0.86
 0.75  0.68  0.81  0.77  1     0.64  0.88  0.81  0.88  0.87  0.77  0.75
 0.59  0.72  0.63  0.66  0.64  1     0.58  0.86  0.74  0.62  0.79  0.73
 0.86  0.81  0.82  0.83  0.88  0.58  1     0.82  0.83  0.91  0.7   0.82
 0.83  0.71  0.86  0.71  0.81  0.86  0.82  1     0.82  0.77  0.77  0.74
 0.7   0.67  0.85  0.72  0.88  0.74  0.83  0.82  1     0.78  0.74  0.8
 0.8   0.87  0.77  0.77  0.87  0.62  0.91  0.77  0.78  1     0.84  0.77
 0.66  0.83  0.57  0.76  0.77  0.79  0.7   0.77  0.74  0.84  1     0.86
 0.67  0.82  0.56  0.86  0.75  0.73  0.82  0.74  0.8   0.77  0.86  1

##### Second: Computations.

LAD is the 12x12 Cholesky loadings matrix computed from R. The top-left 9x9 block contains the loadings of the empirical data on the orthogonal components (whose scores will be computed later); the remaining entries are the required loadings for the simulated variables x10..x12.

LAD :

 x1   1     .     .     .     .     .     .     .     .     .     .     .
 x2   0.77  0.63  .     .     .     .     .     .     .     .     .     .
 x3   0.76 -0.03  0.65  .     .     .     .     .     .     .     .     .
 x4   0.73  0.48 -0.00  0.49  .     .     .     .     .     .     .     .
 x5   0.75  0.15  0.37  0.31  0.42  .     .     .     .     .     .     .
 x6   0.59  0.42  0.30  0.05  0.00  0.62  .     .     .     .     .     .
 x7   0.86  0.22  0.26  0.21  0.09 -0.18  0.25  .     .     .     .     .
 x8   0.83  0.10  0.35  0.13  0.00  0.35  0.10  0.17  .     .     .     .
 x9   0.7   0.20  0.50  0.23  0.15  0.12  0.08 -0.19  0.30  .     .     .
 x10  0.8   0.40  0.25 -0.00  0.27 -0.16  0.06  0.13  0.04  0.10  .     .
 x11  0.66  0.51  0.12  0.07  0.31  0.23 -0.03  0.22  0.23 -0.00  0.19  .
 x12  0.67  0.48  0.09  0.29  0.11  0.14  0.27 -0.04  0.20 -0.09  0.26  0.07
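This step can be sketched in numpy (a hedged illustration, not MatMate code: since the target R must be positive definite, the sketch builds R and stand-in data X from random 12-variate data rather than from the listing's values; all variable names here are assumptions):

```python
import numpy as np

# Synthetic stand-in for the listing's data: 12 standardized rows whose joint
# correlation matrix serves as the 12x12 target R, with the first 9 rows in
# the role of the empirical variables x1..x9.
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))  # rows: mean 0, variance 1 (1/n convention)
R = Z @ Z.T / n          # 12x12 target correlation matrix, positive definite by construction
X = Z[:9]                # "empirical" data; X @ X.T / n equals R[:9, :9] exactly

LAD = np.linalg.cholesky(R)                  # lower-triangular loadings, R = LAD @ LAD.T
print(np.allclose(LAD @ LAD.T, R))           # True: the loadings reproduce R
print(np.allclose(np.triu(LAD, 1), 0.0))     # True: LAD is lower-triangular
```

Because the Cholesky factorization is nested, the top-left 9x9 block of LAD is itself the Cholesky factor of R[:9, :9], which is what makes the later score computation work.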

The factor/component scores of the 9 orthogonal components, which define the 9 empirical data rows x1..x9, are now computed by inverting the top-left 9x9 loadings matrix.

The first 6 columns of FAC, holding the factor scores for f1..f9:

FAC :

 fac1   0.417  -0.088  -1.065   0.574   2.137   0.093  ...
 fac2   0.165  -1.721  -0.923   0.653  -0.575  -0.92   ...
 fac3   0.789   0.864  -0.155   0.867  -0.063   0.827  ...
 fac4   0.406   1.358  -0.577  -1.066  -0.122   1.445  ...
 fac5   0.115  -0.652  -1.121  -0.262  -0.546  -1.188  ...
 fac6  -0.226   1.342  -1.45    1.79   -0.193  -0.862  ...
 fac7  -2.745   1.184   0.473   1.39   -0.004  -0.11   ...
 fac8   1.109   1.558   1.33    0.753   0.495   0.017  ...
 fac9   0.818  -1.543  -0.325  -0.374   0.417   0.974  ...
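In the synthetic numpy sketch (stand-in data, names assumed), the score computation amounts to applying the inverse of the 9x9 loadings block to X; a triangular solve avoids forming the explicit inverse:

```python
import numpy as np

# Synthetic stand-in data, standardized rows (1/n variance convention):
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))
R = Z @ Z.T / n
X = Z[:9]

LAD = np.linalg.cholesky(R)
FAC = np.linalg.solve(LAD[:9, :9], X)   # scores = inverse(9x9 loadings block) @ X

# Since LAD[:9,:9] is the Cholesky factor of corr(X), the score rows come out
# uncorrelated with unit variance, and loadings * scores reproduce X:
print(np.allclose(FAC @ FAC.T / n, np.eye(9)))   # True
print(np.allclose(LAD[:9, :9] @ FAC, X))         # True
```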

To be able to define 3 new components uncorrelated with the existing ones, we now rotate the matrix of factor scores into diagonal position.

[8] // get the transformation-matrix, which would rotate

[9] // fac into triangular/diagonal position

[10] t = gettrans(fac,"drei")

[11] // actually perform the rotation; fac1 then has a diagonal in its top-left

[12] fac1 = fac*t

Since the 9 orthogonal components are uncorrelated, we get a diagonal matrix in the top-left 9x9 segment:

FAC1 :

 fac1 1   10   0   0   0   0   0   0   0   0   0   0   0  ...
 fac1 2    0  10   0   0   0   0   0   0   0   0   0   0  ...
 fac1 3    0   0  10   0   0   0   0   0   0   0   0   0  ...
 fac1 4    0   0   0  10   0   0   0   0   0   0   0   0  ...
 fac1 5    0   0   0   0  10   0   0   0   0   0   0   0  ...
 fac1 6    0   0   0   0   0  10   0   0   0   0   0   0  ...
 fac1 7    0   0   0   0   0   0  10   0   0   0   0   0  ...
 fac1 8    0   0   0   0   0   0   0  10   0   0   0   0  ...
 fac1 9    0   0   0   0   0   0   0   0  10   0   0   0  ...
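One way to obtain such a transformation outside MatMate is a complete QR decomposition of FAC' (an assumption about what gettrans(fac,"drei") computes internally): its orthogonal factor rotates FAC into triangular position, which here degenerates to a diagonal because the score rows are orthogonal with squared length n. Continuing the synthetic sketch:

```python
import numpy as np

# Synthetic stand-in data, as before (names are assumptions):
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))
R = Z @ Z.T / n
X = Z[:9]
LAD = np.linalg.cholesky(R)
FAC = np.linalg.solve(LAD[:9, :9], X)

# Orthogonal n x n rotation T with FAC @ T triangular (role of t = gettrans(fac,"drei")):
T, _ = np.linalg.qr(FAC.T, mode="complete")   # FAC.T = T @ (upper triangular)
fac1 = FAC @ T                                # role of fac1 = fac * t

# The 9x9 block is diagonal with entries +-sqrt(n) = +-10 (numpy's QR does not
# fix the signs), and all remaining columns are zero:
print(np.allclose(np.abs(fac1[:, :9]), 10 * np.eye(9)))   # True
print(np.allclose(fac1[:, 9:], 0.0))                      # True
```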

Generating three more uncorrelated components then simply means adding three more rows and inserting the value sqrt(n) = 10 at the appropriate diagonal positions:

[16] fac1 = {fac1,null(3,n)}              // first simply 3 null-rows

[17] fac1[10..12,10..12]=fac1[1..3,1..3]  // insert diagonal values (simply copied from the top-left)

FAC1 :

 fac1 1    10   0   0   0   0   0   0   0   0   0   0   0  ...
 fac1 2     0  10   0   0   0   0   0   0   0   0   0   0  ...
 fac1 3     0   0  10   0   0   0   0   0   0   0   0   0  ...
 fac1 4     0   0   0  10   0   0   0   0   0   0   0   0  ...
 fac1 5     0   0   0   0  10   0   0   0   0   0   0   0  ...
 fac1 6     0   0   0   0   0  10   0   0   0   0   0   0  ...
 fac1 7     0   0   0   0   0   0  10   0   0   0   0   0  ...
 fac1 8     0   0   0   0   0   0   0  10   0   0   0   0  ...
 fac1 9     0   0   0   0   0   0   0   0  10   0   0   0  ...
 fac1 10    0   0   0   0   0   0   0   0   0  10   0   0  ...
 fac1 11    0   0   0   0   0   0   0   0   0   0  10   0  ...
 fac1 12    0   0   0   0   0   0   0   0   0   0   0  10  ...

Since we saved the rotation matrix T that rotated FAC into this diagonal position, we use its transpose to rotate FAC1, which now includes the three new uncorrelated components, back:

[24] // rotate fac+newdata back to original position (recreating fac)

[25] newfac = fac1 * t'

[26] // newfac 1..9 are identical to the original values

[27] // the newdata newfac10..12 are still uncorrelated to newfac1..9

NEWFAC :

 newfac1    0.417  -0.088  -1.065   0.574   2.137   0.093  ...
 newfac2    0.165  -1.721  -0.923   0.653  -0.575  -0.92   ...
 newfac3    0.789   0.864  -0.155   0.867  -0.063   0.827  ...
 newfac4    0.406   1.358  -0.577  -1.066  -0.122   1.445  ...
 newfac5    0.115  -0.652  -1.121  -0.262  -0.546  -1.188  ...
 newfac6   -0.226   1.342  -1.45    1.79   -0.193  -0.862  ...
 newfac7   -2.745   1.184   0.473   1.39   -0.004  -0.11   ...
 newfac8    1.109   1.558   1.33    0.753   0.495   0.017  ...
 newfac9    0.818  -1.543  -0.325  -0.374   0.417   0.974  ...
 newfac10   0.141  -0.097  -0.041  -0.106  -0.01    0.05   ...
 newfac11   0.421  -0.022   0.408   0.084   0.118   0.087  ...
 newfac12   0.284  -0.309   0.093  -0.367   0.117   0.177  ...

Note that the data of newfac1..9 are identical to the original fac1..9.
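In the synthetic numpy sketch, steps [16], [17] and [25] append three zero rows, put sqrt(n) on the new diagonal positions, and rotate back with the transpose of T. (The sketch writes sqrt(n) directly instead of copying from the top-left as the listing does; a possible sign flip from the QR convention does not affect the correlations.)

```python
import numpy as np

# Synthetic stand-in data and the steps so far (names are assumptions):
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))
R = Z @ Z.T / n
X = Z[:9]
LAD = np.linalg.cholesky(R)
FAC = np.linalg.solve(LAD[:9, :9], X)
T, _ = np.linalg.qr(FAC.T, mode="complete")
fac1 = FAC @ T

fac1 = np.vstack([fac1, np.zeros((3, n))])   # [16]: three null rows
for i in range(9, 12):
    fac1[i, i] = np.sqrt(n)                  # [17]: diagonal values for the new components
newfac = fac1 @ T.T                          # [25]: rotate back to the original position

print(np.allclose(newfac[:9], FAC))                    # True: rows 1..9 unchanged
print(np.allclose(newfac @ newfac.T / n, np.eye(12)))  # True: 12 uncorrelated unit-variance rows
```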

Now we have 12 uncorrelated components and the 12x12 Cholesky loadings matrix, which defines how the variables are composed according to their loadings on the components. We (re)compute the dataset:

[29] // create correlated data, based on the cholesky matrix and new factors

[30] y = lad * newfac

Y :

 y1   0.417  -0.088  -1.065   0.574   2.137   0.093  ...
 y2   0.427  -1.157  -1.409   0.857   1.29   -0.51   ...
 y3   0.826   0.54   -0.886   0.982   1.596   0.633  ...
 y4   0.579  -0.228  -1.5     0.205   1.219   0.331  ...
 y5   0.808   0.151  -1.647   0.409   1.23    0.197  ...
 y6   0.432   0.381  -1.995   1.92    0.877  -0.547  ...
 y7   0.054   0.052  -1.01    0.646   1.656   0.408  ...
 y8   0.546   1.089  -1.335   1.599   1.694   0.154  ...
 y9   0.611  -0.269  -1.813   0.752   1.238   0.607  ...
 y10  0.703  -0.718  -1.134   0.738   1.443  -0.218  ...
 y11  1.067  -0.683  -1.646   1.123   1.117  -0.523  ...
 y12  0.023  -0.352  -1.531   0.971   1.141   0.064  ...

##### Last step:

Check the correlations of the new dataset against the required correlations in R; the differences should all be zero.

[32] chk = y *'/n  - r

corr(Y) - R :

 chkr1   0  0  0  0  0  0  ...
 chkr2   0  0  0  0  0  0  ...
 chkr3   0  0  0  0  0  0  ...
 chkr4   0  0  0  0  0  0  ...
 chkr5   0  0  0  0  0  0  ...
 chkr6   0  0  0  0  0  0  ...
 chkr7   0  0  0  0  0  0  ...
 chkr8   0  0  0  0  0  0  ...
 chkr9   0  0  0  0  0  0  ...
 chkr10  0  0  0  0  0  0  ...
 chkr11  0  0  0  0  0  0  ...
 chkr12  0  0  0  0  0  0  ...
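For reference, the whole procedure condensed into one runnable numpy sketch (synthetic stand-in data; a complete QR decomposition plays the role of gettrans(fac,"drei"), which is an assumption about that routine):

```python
import numpy as np

# Stand-in for the listing's data: 12 standardized random rows; their joint
# correlation matrix is the 12x12 target R, the first 9 rows are "empirical".
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))
R = Z @ Z.T / n                                  # 12x12 target correlations
X = Z[:9]                                        # empirical data

LAD = np.linalg.cholesky(R)                      # loadings, R = LAD @ LAD.T
FAC = np.linalg.solve(LAD[:9, :9], X)            # scores of the 9 components
T, _ = np.linalg.qr(FAC.T, mode="complete")      # rotation into diagonal position
fac1 = np.vstack([FAC @ T, np.zeros((3, n))])    # diagonal scores plus 3 null rows
for i in range(9, 12):
    fac1[i, i] = np.sqrt(n)                      # three new uncorrelated components
Y = LAD @ (fac1 @ T.T)                           # recompose all 12 variables

chk = Y @ Y.T / n - R
print(np.max(np.abs(chk)) < 1e-8)        # True: correlations match R to rounding error
print(np.allclose(Y[:9], X))             # True: the empirical rows are reproduced
```

The final two checks mirror the listing's chk matrix: the simulated rows y10..y12 carry exactly the required correlations while the original nine variables come back unchanged.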

(MatMate (beta) is available free of charge for academic educational institutions;
see http://go.helms-net.de/sw/matmate)

Gottfried Helms, 16.5.2007