Q: How to create simulated data Abstract: a problem solved using the matrix-program MatMate (© with the author). The listing of the output is documented to make the process transparent. Gottfried Helms, 16.5.2007 |
|
;===========================================
;MatMate-Listing vom:16.05.2007 14:09:28
;============================================
X is empirical data in 9 variables, n=100 observations, observations of a variable run along a row
X :
x1 |
0.417 |
-0.088 |
-1.065 |
0.574 |
2.137 |
0.093 |
... |
x2 |
0.427 |
-1.157 |
-1.409 |
0.857 |
1.290 |
-0.510 |
... |
x3 |
0.826 |
0.540 |
-0.886 |
0.982 |
1.596 |
0.633 |
... |
x4 |
0.579 |
-0.228 |
-1.500 |
0.205 |
1.219 |
0.331 |
... |
x5 |
0.808 |
0.151 |
-1.647 |
0.409 |
1.230 |
0.197 |
... |
x6 |
0.432 |
0.381 |
-1.995 |
1.920 |
0.877 |
-0.547 |
... |
x7 |
0.054 |
0.052 |
-1.010 |
0.646 |
1.656 |
0.408 |
... |
x8 |
0.546 |
1.089 |
-1.335 |
1.599 |
1.694 |
0.154 |
... |
x9 |
0.611 |
-0.269 |
-1.813 |
0.752 |
1.238 |
0.607 |
... |
R is correlation-matrix 12x12. The top-left 9x9-matrix are the empirical correlations for X, the remaining entries the required correlations for simulated data x10..x12.
R :
1.00 |
0.77 |
0.76 |
0.73 |
0.75 |
0.59 |
0.86 |
0.83 |
0.70 |
0.80 |
0.66 |
0.67 |
0.77 |
1.00 |
0.57 |
0.87 |
0.68 |
0.72 |
0.81 |
0.71 |
0.67 |
0.87 |
0.83 |
0.82 |
0.76 |
0.57 |
1.00 |
0.54 |
0.81 |
0.63 |
0.82 |
0.86 |
0.85 |
0.77 |
0.57 |
0.56 |
0.73 |
0.87 |
0.54 |
1.00 |
0.77 |
0.66 |
0.83 |
0.71 |
0.72 |
0.77 |
0.76 |
0.86 |
0.75 |
0.68 |
0.81 |
0.77 |
1.00 |
0.64 |
0.88 |
0.81 |
0.88 |
0.87 |
0.77 |
0.75 |
0.59 |
0.72 |
0.63 |
0.66 |
0.64 |
1.00 |
0.58 |
0.86 |
0.74 |
0.62 |
0.79 |
0.73 |
0.86 |
0.81 |
0.82 |
0.83 |
0.88 |
0.58 |
1.00 |
0.82 |
0.83 |
0.91 |
0.70 |
0.82 |
0.83 |
0.71 |
0.86 |
0.71 |
0.81 |
0.86 |
0.82 |
1.00 |
0.82 |
0.77 |
0.77 |
0.74 |
0.70 |
0.67 |
0.85 |
0.72 |
0.88 |
0.74 |
0.83 |
0.82 |
1.00 |
0.78 |
0.74 |
0.80 |
0.80 |
0.87 |
0.77 |
0.77 |
0.87 |
0.62 |
0.91 |
0.77 |
0.78 |
1.00 |
0.84 |
0.77 |
0.66 |
0.83 |
0.57 |
0.76 |
0.77 |
0.79 |
0.70 |
0.77 |
0.74 |
0.84 |
1.00 |
0.86 |
0.67 |
0.82 |
0.56 |
0.86 |
0.75 |
0.73 |
0.82 |
0.74 |
0.80 |
0.77 |
0.86 |
1.00 |
Compute cholesky-loadingsmatrix from R.
[3] lad = cholesky(r)
LAD is the cholesky-loadingsmatrix 12x12 computed from R. The top-left 9x9-matrix are the loadings of empirical data on the orthogonal components (whose scores will have to be computed later), the remaining entries the required loadings for simulated data x10..x12.
lad :
x1 |
1.00 |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
x2 |
0.77 |
0.63 |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
x3 |
0.76 |
-0.03 |
0.65 |
. |
. |
. |
. |
. |
. |
. |
. |
. |
x4 |
0.73 |
0.48 |
-0.00 |
0.49 |
. |
. |
. |
. |
. |
. |
. |
. |
x5 |
0.75 |
0.15 |
0.37 |
0.31 |
0.42 |
. |
. |
. |
. |
. |
. |
. |
x6 |
0.59 |
0.42 |
0.30 |
0.05 |
0.00 |
0.62 |
. |
. |
. |
. |
. |
. |
x7 |
0.86 |
0.22 |
0.26 |
0.21 |
0.09 |
-0.18 |
0.25 |
. |
. |
. |
. |
. |
x8 |
0.83 |
0.10 |
0.35 |
0.13 |
0.00 |
0.35 |
0.10 |
0.17 |
. |
. |
. |
. |
x9 |
0.70 |
0.20 |
0.50 |
0.23 |
0.15 |
0.12 |
0.08 |
-0.19 |
0.30 |
. |
. |
. |
x10 |
0.80 |
0.40 |
0.25 |
-0.00 |
0.27 |
-0.16 |
0.06 |
0.13 |
0.04 |
0.10 |
. |
. |
x11 |
0.66 |
0.51 |
0.12 |
0.07 |
0.31 |
0.23 |
-0.03 |
0.22 |
0.23 |
-0.00 |
0.19 |
. |
x12 |
0.67 |
0.48 |
0.09 |
0.29 |
0.11 |
0.14 |
0.27 |
-0.04 |
0.20 |
-0.09 |
0.26 |
0.07 |
Here the factor-/component-scores for the 9 orthogonal components, which define the 9 empirical datarows x1..x9, are computed by inversion of the top-left 9x9 loadingsmatrix.
[5] fac= inv(lad[1..9,1..9])*x
The first 6 columns in FAC of the factorscores for f1..f9:
FAC :
fac1 |
0.417 |
-0.088 |
-1.065 |
0.574 |
2.137 |
0.093 |
... |
fac2 |
0.165 |
-1.721 |
-0.923 |
0.653 |
-0.575 |
-0.920 |
... |
fac3 |
0.789 |
0.864 |
-0.155 |
0.867 |
-0.063 |
0.827 |
... |
fac4 |
0.406 |
1.358 |
-0.577 |
-1.066 |
-0.122 |
1.445 |
... |
fac5 |
0.115 |
-0.652 |
-1.121 |
-0.262 |
-0.546 |
-1.188 |
... |
fac6 |
-0.226 |
1.342 |
-1.45 |
1.790 |
-0.193 |
-0.862 |
... |
fac7 |
-2.745 |
1.184 |
0.473 |
1.390 |
-0.004 |
-0.110 |
... |
fac8 |
1.109 |
1.558 |
1.330 |
0.753 |
0.495 |
0.017 |
... |
fac9 |
0.818 |
-1.543 |
-0.325 |
-0.374 |
0.417 |
0.974 |
... |
Now to be able to define 3 new components uncorrelated to the existing ones, we rotate the matrix of factor-scores to diagonal position.
[8] // get the transformation-matrix, which would
rotate
[9] // fac into triangular/diagonal position
[10] t = gettrans(fac,"drei")
[11] // do the rotation actually; fac1 has then a
diagonal in the top-left
[12] fac1 = fac*t
Since the 9 orthogonal components are uncorrelated, we get a diagonal matrix in the top-left 9x9-segment
FAC1:
fac1 1 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 2 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 3 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 4 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 5 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 6 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 7 |
0 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 8 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
... |
fac1 9 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
... |
To generate three more uncorrelated components means, to add three more rows and insert the value sqrt(n) = 10 in the appropriate diagonal position:
[16] fac1 = {fac1,null(3,n)} // first simply 3 null-rows
[17] fac1[10..12,10..12]=fac1[1..3,1..3] // insert diagonal-values(simply copy from
top-left
Fac1 :
fac1 1 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 2 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 3 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 4 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 5 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 6 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 7 |
0 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
0 |
... |
fac1 8 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
0 |
... |
fac1 9 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
0 |
... |
fac1 10 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
0 |
... |
fac1 11 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
10 |
0 |
... |
fac1 12 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
10 |
... |
Since we have the rotation-matrix (T) saved, which performed the rotation of FAC into this diagonal-position, we use its transpose, to rerotate FAC1 , which now includes three new and uncorrelated components:
[24] // rotate fac+newdata back to original position
(recreating fac)
[25] newfac = fac1 * t'
[26] // newfac 1..9 are identical to the original
values
[27] // the newdata newfac10..12 are still
uncorrelated to newfac1..9
NEWFAC :
newfac1 |
0.417 |
-0.088 |
-1.065 |
0.574 |
2.137 |
0.093 |
... |
newfac2 |
0.165 |
-1.721 |
-0.923 |
0.653 |
-0.575 |
-0.920 |
... |
newfac3 |
0.789 |
0.864 |
-0.155 |
0.867 |
-0.063 |
0.827 |
... |
newfac4 |
0.406 |
1.358 |
-0.577 |
-1.066 |
-0.122 |
1.445 |
... |
newfac5 |
0.115 |
-0.652 |
-1.121 |
-0.262 |
-0.546 |
-1.188 |
... |
newfac6 |
-0.226 |
1.342 |
-1.450 |
1.790 |
-0.193 |
-0.862 |
... |
newfac7 |
-2.745 |
1.184 |
0.473 |
1.390 |
-0.004 |
-0.110 |
... |
newfac8 |
1.109 |
1.558 |
1.330 |
0.753 |
0.495 |
0.017 |
... |
newfac9 |
0.818 |
-1.543 |
-0.325 |
-0.374 |
0.417 |
0.974 |
... |
newfac10 |
0.141 |
-0.097 |
-0.041 |
-0.106 |
-0.010 |
0.050 |
... |
newfac11 |
0.421 |
-0.022 |
0.408 |
0.084 |
0.118 |
0.087 |
... |
newfac12 |
0.284 |
-0.309 |
0.093 |
-0.367 |
0.117 |
0.177 |
... |
Note, that the data of newfac1..9 are identical to the original fac1..9.
Now we have 12 uncorrelated components and the 12x12 cholesky-loadingsmatrix, which defines the composition of variables according to their loadings on the components. We (re-)compute the dataset
[29] // create correlated data, based on the cholesky
matrix and new factors
[30] y = lad * newfac
Y disp :
y1 |
0.417 |
-0.088 |
-1.065 |
0.574 |
2.137 |
0.093 |
... |
y2 |
0.427 |
-1.157 |
-1.409 |
0.857 |
1.290 |
-0.510 |
... |
y3 |
0.826 |
0.54 |
-0.886 |
0.982 |
1.596 |
0.633 |
... |
y4 |
0.579 |
-0.228 |
-1.500 |
0.205 |
1.219 |
0.331 |
... |
y5 |
0.808 |
0.151 |
-1.647 |
0.409 |
1.230 |
0.197 |
... |
y6 |
0.432 |
0.381 |
-1.995 |
1.920 |
0.877 |
-0.547 |
... |
y7 |
0.054 |
0.052 |
-1.01 |
0.646 |
1.656 |
0.408 |
... |
y8 |
0.546 |
1.089 |
-1.335 |
1.599 |
1.694 |
0.154 |
... |
y9 |
0.611 |
-0.269 |
-1.813 |
0.752 |
1.238 |
0.607 |
... |
y10 |
0.703 |
-0.718 |
-1.134 |
0.738 |
1.443 |
-0.218 |
... |
y11 |
1.067 |
-0.683 |
-1.646 |
1.123 |
1.117 |
-0.523 |
... |
y12 |
0.023 |
-0.352 |
-1.531 |
0.971 |
1.141 |
0.064 |
... |
Check the correlations of the new dataset and the differences to the original correlations in R. The differences should all be zero
[32] chk = y *'/n - r
corr(Y) - R :
chkr1 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr2 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr3 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr4 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr5 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr6 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr7 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr8 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr9 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr10 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr11 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
chkr12 |
0 |
0 |
0 |
0 |
0 |
0 |
... |
(MatMate
(beta) can be obtained for academic educational institutions without charge,
see http://go.helms-net.de/sw/matmate)
Gottfried Helms, 16.5.2007