Q: How can one create simulated data (to be added to a given dataset) such that the combined variables jointly satisfy a given correlation matrix? Abstract: the problem is solved using the matrix program MatMate (© with the author). The listing of the output is documented to make the process transparent. Gottfried Helms, 16.5.2007

;===========================================

;MatMate listing of: 16.05.2007 14:09:28

;============================================

##### First: show the data

X holds the empirical data: 9 variables with n = 100 observations each; the observations of a variable run along a row.

X :

 x1   0.417  -0.088  -1.065   0.574   2.137   0.093  ...
 x2   0.427  -1.157  -1.409   0.857   1.29   -0.51   ...
 x3   0.826   0.54   -0.886   0.982   1.596   0.633  ...
 x4   0.579  -0.228  -1.5     0.205   1.219   0.331  ...
 x5   0.808   0.151  -1.647   0.409   1.23    0.197  ...
 x6   0.432   0.381  -1.995   1.92    0.877  -0.547  ...
 x7   0.054   0.052  -1.01    0.646   1.656   0.408  ...
 x8   0.546   1.089  -1.335   1.599   1.694   0.154  ...
 x9   0.611  -0.269  -1.813   0.752   1.238   0.607  ...

R is a 12x12 correlation matrix. The top-left 9x9 block holds the empirical correlations of X; the remaining entries are the required correlations for the simulated variables x10..x12.

R :

 1     0.77  0.76  0.73  0.75  0.59  0.86  0.83  0.7   0.8   0.66  0.67
 0.77  1     0.57  0.87  0.68  0.72  0.81  0.71  0.67  0.87  0.83  0.82
 0.76  0.57  1     0.54  0.81  0.63  0.82  0.86  0.85  0.77  0.57  0.56
 0.73  0.87  0.54  1     0.77  0.66  0.83  0.71  0.72  0.77  0.76  0.86
 0.75  0.68  0.81  0.77  1     0.64  0.88  0.81  0.88  0.87  0.77  0.75
 0.59  0.72  0.63  0.66  0.64  1     0.58  0.86  0.74  0.62  0.79  0.73
 0.86  0.81  0.82  0.83  0.88  0.58  1     0.82  0.83  0.91  0.7   0.82
 0.83  0.71  0.86  0.71  0.81  0.86  0.82  1     0.82  0.77  0.77  0.74
 0.7   0.67  0.85  0.72  0.88  0.74  0.83  0.82  1     0.78  0.74  0.8
 0.8   0.87  0.77  0.77  0.87  0.62  0.91  0.77  0.78  1     0.84  0.77
 0.66  0.83  0.57  0.76  0.77  0.79  0.7   0.77  0.74  0.84  1     0.86
 0.67  0.82  0.56  0.86  0.75  0.73  0.82  0.74  0.8   0.77  0.86  1

##### Second: Computations.

LAD is the 12x12 Cholesky loadings matrix computed from R. The top-left 9x9 block contains the loadings of the empirical data on the orthogonal components (whose scores will be computed later); the remaining entries are the required loadings for the simulated variables x10..x12.

LAD :

 x1   1     .     .     .     .     .     .     .     .     .     .     .
 x2   0.77  0.63  .     .     .     .     .     .     .     .     .     .
 x3   0.76 -0.03  0.65  .     .     .     .     .     .     .     .     .
 x4   0.73  0.48 -0.00  0.49  .     .     .     .     .     .     .     .
 x5   0.75  0.15  0.37  0.31  0.42  .     .     .     .     .     .     .
 x6   0.59  0.42  0.30  0.05  0.00  0.62  .     .     .     .     .     .
 x7   0.86  0.22  0.26  0.21  0.09 -0.18  0.25  .     .     .     .     .
 x8   0.83  0.10  0.35  0.13  0.00  0.35  0.10  0.17  .     .     .     .
 x9   0.7   0.20  0.50  0.23  0.15  0.12  0.08 -0.19  0.30  .     .     .
 x10  0.8   0.40  0.25 -0.00  0.27 -0.16  0.06  0.13  0.04  0.10  .     .
 x11  0.66  0.51  0.12  0.07  0.31  0.23 -0.03  0.22  0.23 -0.00  0.19  .
 x12  0.67  0.48  0.09  0.29  0.11  0.14  0.27 -0.04  0.20 -0.09  0.26  0.07
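This step can be sketched in numpy (a hedged illustration, not MatMate code: since the target R must be positive definite, the sketch builds R and stand-in data X from random 12-variate data rather than from the listing's values; all variable names here are assumptions):

```python
import numpy as np

# Synthetic stand-in for the listing's data: 12 standardized rows whose joint
# correlation matrix serves as the 12x12 target R, with the first 9 rows in
# the role of the empirical variables x1..x9.
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))  # rows: mean 0, variance 1 (1/n convention)
R = Z @ Z.T / n          # 12x12 target correlation matrix, positive definite by construction
X = Z[:9]                # "empirical" data; X @ X.T / n equals R[:9, :9] exactly

LAD = np.linalg.cholesky(R)                  # lower-triangular loadings, R = LAD @ LAD.T
print(np.allclose(LAD @ LAD.T, R))           # True: the loadings reproduce R
print(np.allclose(np.triu(LAD, 1), 0.0))     # True: LAD is lower-triangular
```

Because the Cholesky factorization is nested, the top-left 9x9 block of LAD is itself the Cholesky factor of R[:9, :9], which is what makes the later score computation work.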

The factor/component scores of the 9 orthogonal components, which define the 9 empirical data rows x1..x9, are now computed by inverting the top-left 9x9 loadings matrix.

The first 6 columns of FAC, holding the factor scores for f1..f9:

FAC :

 fac1   0.417  -0.088  -1.065   0.574   2.137   0.093  ...
 fac2   0.165  -1.721  -0.923   0.653  -0.575  -0.92   ...
 fac3   0.789   0.864  -0.155   0.867  -0.063   0.827  ...
 fac4   0.406   1.358  -0.577  -1.066  -0.122   1.445  ...
 fac5   0.115  -0.652  -1.121  -0.262  -0.546  -1.188  ...
 fac6  -0.226   1.342  -1.45    1.79   -0.193  -0.862  ...
 fac7  -2.745   1.184   0.473   1.39   -0.004  -0.11   ...
 fac8   1.109   1.558   1.33    0.753   0.495   0.017  ...
 fac9   0.818  -1.543  -0.325  -0.374   0.417   0.974  ...
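In the synthetic numpy sketch (stand-in data, names assumed), the score computation amounts to applying the inverse of the 9x9 loadings block to X; a triangular solve avoids forming the explicit inverse:

```python
import numpy as np

# Synthetic stand-in data, standardized rows (1/n variance convention):
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))
R = Z @ Z.T / n
X = Z[:9]

LAD = np.linalg.cholesky(R)
FAC = np.linalg.solve(LAD[:9, :9], X)   # scores = inverse(9x9 loadings block) @ X

# Since LAD[:9,:9] is the Cholesky factor of corr(X), the score rows come out
# uncorrelated with unit variance, and loadings * scores reproduce X:
print(np.allclose(FAC @ FAC.T / n, np.eye(9)))   # True
print(np.allclose(LAD[:9, :9] @ FAC, X))         # True
```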

To be able to define 3 new components uncorrelated with the existing ones, we now rotate the matrix of factor scores into diagonal position.

[8] // get the transformation-matrix, which would rotate

[9] // fac into triangular/diagonal position

[10] t = gettrans(fac,"drei")

[11] // actually perform the rotation; fac1 then has a diagonal in its top-left

[12] fac1 = fac*t

Since the 9 orthogonal components are uncorrelated, we get a diagonal matrix in the top-left 9x9 segment:

FAC1 :

 fac1 1   10   0   0   0   0   0   0   0   0   0   0   0  ...
 fac1 2    0  10   0   0   0   0   0   0   0   0   0   0  ...
 fac1 3    0   0  10   0   0   0   0   0   0   0   0   0  ...
 fac1 4    0   0   0  10   0   0   0   0   0   0   0   0  ...
 fac1 5    0   0   0   0  10   0   0   0   0   0   0   0  ...
 fac1 6    0   0   0   0   0  10   0   0   0   0   0   0  ...
 fac1 7    0   0   0   0   0   0  10   0   0   0   0   0  ...
 fac1 8    0   0   0   0   0   0   0  10   0   0   0   0  ...
 fac1 9    0   0   0   0   0   0   0   0  10   0   0   0  ...
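One way to obtain such a transformation outside MatMate is a complete QR decomposition of FAC' (an assumption about what gettrans(fac,"drei") computes internally): its orthogonal factor rotates FAC into triangular position, which here degenerates to a diagonal because the score rows are orthogonal with squared length n. Continuing the synthetic sketch:

```python
import numpy as np

# Synthetic stand-in data, as before (names are assumptions):
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))
R = Z @ Z.T / n
X = Z[:9]
LAD = np.linalg.cholesky(R)
FAC = np.linalg.solve(LAD[:9, :9], X)

# Orthogonal n x n rotation T with FAC @ T triangular (role of t = gettrans(fac,"drei")):
T, _ = np.linalg.qr(FAC.T, mode="complete")   # FAC.T = T @ (upper triangular)
fac1 = FAC @ T                                # role of fac1 = fac * t

# The 9x9 block is diagonal with entries +-sqrt(n) = +-10 (numpy's QR does not
# fix the signs), and all remaining columns are zero:
print(np.allclose(np.abs(fac1[:, :9]), 10 * np.eye(9)))   # True
print(np.allclose(fac1[:, 9:], 0.0))                      # True
```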

Generating three more uncorrelated components then simply means adding three more rows and inserting the value sqrt(n) = 10 at the appropriate diagonal positions:

[16] fac1 = {fac1,null(3,n)}              // first simply 3 null-rows

[17] fac1[10..12,10..12]=fac1[1..3,1..3]  // insert diagonal values (simply copied from the top-left)

FAC1 :

 fac1 1    10   0   0   0   0   0   0   0   0   0   0   0  ...
 fac1 2     0  10   0   0   0   0   0   0   0   0   0   0  ...
 fac1 3     0   0  10   0   0   0   0   0   0   0   0   0  ...
 fac1 4     0   0   0  10   0   0   0   0   0   0   0   0  ...
 fac1 5     0   0   0   0  10   0   0   0   0   0   0   0  ...
 fac1 6     0   0   0   0   0  10   0   0   0   0   0   0  ...
 fac1 7     0   0   0   0   0   0  10   0   0   0   0   0  ...
 fac1 8     0   0   0   0   0   0   0  10   0   0   0   0  ...
 fac1 9     0   0   0   0   0   0   0   0  10   0   0   0  ...
 fac1 10    0   0   0   0   0   0   0   0   0  10   0   0  ...
 fac1 11    0   0   0   0   0   0   0   0   0   0  10   0  ...
 fac1 12    0   0   0   0   0   0   0   0   0   0   0  10  ...

Since we saved the rotation matrix T that rotated FAC into this diagonal position, we use its transpose to rotate FAC1, which now includes the three new uncorrelated components, back:

[24] // rotate fac+newdata back to original position (recreating fac)

[25] newfac = fac1 * t'

[26] // newfac 1..9 are identical to the original values

[27] // the newdata newfac10..12 are still uncorrelated to newfac1..9

NEWFAC :

 newfac1    0.417  -0.088  -1.065   0.574   2.137   0.093  ...
 newfac2    0.165  -1.721  -0.923   0.653  -0.575  -0.92   ...
 newfac3    0.789   0.864  -0.155   0.867  -0.063   0.827  ...
 newfac4    0.406   1.358  -0.577  -1.066  -0.122   1.445  ...
 newfac5    0.115  -0.652  -1.121  -0.262  -0.546  -1.188  ...
 newfac6   -0.226   1.342  -1.45    1.79   -0.193  -0.862  ...
 newfac7   -2.745   1.184   0.473   1.39   -0.004  -0.11   ...
 newfac8    1.109   1.558   1.33    0.753   0.495   0.017  ...
 newfac9    0.818  -1.543  -0.325  -0.374   0.417   0.974  ...
 newfac10   0.141  -0.097  -0.041  -0.106  -0.01    0.05   ...
 newfac11   0.421  -0.022   0.408   0.084   0.118   0.087  ...
 newfac12   0.284  -0.309   0.093  -0.367   0.117   0.177  ...

Note that the data of newfac1..9 are identical to the original fac1..9.
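In the synthetic numpy sketch, steps [16], [17] and [25] append three zero rows, put sqrt(n) on the new diagonal positions, and rotate back with the transpose of T. (The sketch writes sqrt(n) directly instead of copying from the top-left as the listing does; a possible sign flip from the QR convention does not affect the correlations.)

```python
import numpy as np

# Synthetic stand-in data and the steps so far (names are assumptions):
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))
R = Z @ Z.T / n
X = Z[:9]
LAD = np.linalg.cholesky(R)
FAC = np.linalg.solve(LAD[:9, :9], X)
T, _ = np.linalg.qr(FAC.T, mode="complete")
fac1 = FAC @ T

fac1 = np.vstack([fac1, np.zeros((3, n))])   # [16]: three null rows
for i in range(9, 12):
    fac1[i, i] = np.sqrt(n)                  # [17]: diagonal values for the new components
newfac = fac1 @ T.T                          # [25]: rotate back to the original position

print(np.allclose(newfac[:9], FAC))                    # True: rows 1..9 unchanged
print(np.allclose(newfac @ newfac.T / n, np.eye(12)))  # True: 12 uncorrelated unit-variance rows
```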

Now we have 12 uncorrelated components and the 12x12 Cholesky loadings matrix, which defines how the variables are composed according to their loadings on the components. We (re)compute the dataset:

[29] // create correlated data, based on the cholesky matrix and new factors

[30] y = lad * newfac

Y :

 y1   0.417  -0.088  -1.065   0.574   2.137   0.093  ...
 y2   0.427  -1.157  -1.409   0.857   1.29   -0.51   ...
 y3   0.826   0.54   -0.886   0.982   1.596   0.633  ...
 y4   0.579  -0.228  -1.5     0.205   1.219   0.331  ...
 y5   0.808   0.151  -1.647   0.409   1.23    0.197  ...
 y6   0.432   0.381  -1.995   1.92    0.877  -0.547  ...
 y7   0.054   0.052  -1.01    0.646   1.656   0.408  ...
 y8   0.546   1.089  -1.335   1.599   1.694   0.154  ...
 y9   0.611  -0.269  -1.813   0.752   1.238   0.607  ...
 y10  0.703  -0.718  -1.134   0.738   1.443  -0.218  ...
 y11  1.067  -0.683  -1.646   1.123   1.117  -0.523  ...
 y12  0.023  -0.352  -1.531   0.971   1.141   0.064  ...

##### Last step:

Check the correlations of the new dataset against the required correlations in R; the differences should all be zero.

[32] chk = y *'/n  - r

corr(Y) - R :

 chkr1   0  0  0  0  0  0  ...
 chkr2   0  0  0  0  0  0  ...
 chkr3   0  0  0  0  0  0  ...
 chkr4   0  0  0  0  0  0  ...
 chkr5   0  0  0  0  0  0  ...
 chkr6   0  0  0  0  0  0  ...
 chkr7   0  0  0  0  0  0  ...
 chkr8   0  0  0  0  0  0  ...
 chkr9   0  0  0  0  0  0  ...
 chkr10  0  0  0  0  0  0  ...
 chkr11  0  0  0  0  0  0  ...
 chkr12  0  0  0  0  0  0  ...
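For reference, the whole procedure condensed into one runnable numpy sketch (synthetic stand-in data; a complete QR decomposition plays the role of gettrans(fac,"drei"), which is an assumption about that routine):

```python
import numpy as np

# Stand-in for the listing's data: 12 standardized random rows; their joint
# correlation matrix is the 12x12 target R, the first 9 rows are "empirical".
rng = np.random.default_rng(7)
n = 100
Z = rng.standard_normal((12, n))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.sqrt((Z ** 2).mean(axis=1, keepdims=True))
R = Z @ Z.T / n                                  # 12x12 target correlations
X = Z[:9]                                        # empirical data

LAD = np.linalg.cholesky(R)                      # loadings, R = LAD @ LAD.T
FAC = np.linalg.solve(LAD[:9, :9], X)            # scores of the 9 components
T, _ = np.linalg.qr(FAC.T, mode="complete")      # rotation into diagonal position
fac1 = np.vstack([FAC @ T, np.zeros((3, n))])    # diagonal scores plus 3 null rows
for i in range(9, 12):
    fac1[i, i] = np.sqrt(n)                      # three new uncorrelated components
Y = LAD @ (fac1 @ T.T)                           # recompose all 12 variables

chk = Y @ Y.T / n - R
print(np.max(np.abs(chk)) < 1e-8)        # True: correlations match R to rounding error
print(np.allclose(Y[:9], X))             # True: the empirical rows are reproduced
```

The final two checks mirror the listing's chk matrix: the simulated rows y10..y12 carry exactly the required correlations while the original nine variables come back unchanged.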

(MatMate (beta) is available free of charge for academic educational institutions;
see http://go.helms-net.de/sw/matmate)

Gottfried Helms, 16.5.2007