Data Simulation

Data Simulation

Ryan
Dear SPSS-L,
 
I'm trying to generate data that meet the following specifications:
 
1. N = one million
2. Three log-normally distributed variables with skew=.80
3. corr(var1,var2)=.30, corr(var1,var3)=.20, corr(var2,var3)=.15
 
I would have no difficulty generating k normally distributed variables and applying a Cholesky decomposition to obtain the desired bivariate correlations. But my desire to maintain log-normally distributed variables with a skew of .80 after the Cholesky decomposition has stumped me.
 
Any ideas?
 
Ryan
Re: Data Simulation

Jon K Peck
The simulation feature introduced in Statistics 21 can do this.  In 21, you have to give simulation a model, but you can just discard that.  In V22 you can just request that data be generated.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Ryan Black <[hidden email]>
To:        [hidden email],
Date:        10/29/2013 04:37 PM
Subject:        [SPSSX-L] Data Simulation
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Re: Data Simulation

David Marso
Administrator
In reply to this post by Ryan
What if you generate the log-normals,
orthonormalize them via FACTOR, then run CHOL on the desired R matrix and multiply the factor scores by the result?
That shouldn't (AFAIK) affect the shape of the distributions.
Or maybe I am OTL on this?
What code are you running?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: Data Simulation

Jon K Peck
A linear combination of log-normally distributed random variables is not lognormally distributed, so the procedure proposed below does not preserve the distribution property.  The simulation procedure I proposed does.
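
A small sketch of why the Cholesky-mixing route loses the skew target. It assumes the three source variables are independent, standardized, and share a skewness of .80; under those assumptions third cumulants add, so the skewness of each mixed variable can be computed exactly from the Cholesky rows. The function names are illustrative and not part of SPSS.

```python
import math

TARGET = [[1.00, 0.30, 0.20],
          [0.30, 1.00, 0.15],
          [0.20, 0.15, 1.00]]

def cholesky(a):
    """Lower-triangular L with L * L' = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def mixed_skewness(target_corr, source_skew):
    """Skewness of each component of L * x when the components of x are
    independent, have unit variance, and share the same skewness.
    Each row of L has unit length (the target has a unit diagonal), so the
    mixed variable has variance 1 and skewness source_skew * sum(l ** 3)."""
    low = cholesky(target_corr)
    return [source_skew * sum(c ** 3 for c in row) for row in low]

print([round(s, 3) for s in mixed_skewness(TARGET, 0.80)])
# roughly [0.8, 0.716, 0.749]
```

Only the first variable keeps its skewness of .80. The second and third are weighted sums of several sources, and their skewness drops, so at minimum the skew specification is not preserved by this procedure.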


From:        David Marso <[hidden email]>
To:        [hidden email],
Date:        10/29/2013 11:28 PM
Subject:        Re: [SPSSX-L] Data Simulation
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Re: Data Simulation

David Marso
Administrator
AH, I must review my distribution theory ;-(
How exactly would one set up the simulation?
I tried to hack my way through it and
1.  First off the Simulation dialog insists on having data prior to getting to first base.
     Why?  If one is creating data ex nihilo, then why complicate it?
     SO, I created X1, X2, X3 (1 case, Scale).
     Still the Simulation dialog pleaded for me to open a data file.
     OK, screw it:  Build a QAD data array of 3 variables and assign Scale measurement level.

/** <CODE> **/.
MATRIX.
SAVE (UNIFORM(1000,3)) / OUTFILE * / VARIABLES X1 X2 X3.
END MATRIX.
VARIABLE LEVEL X1 X2 X3 (SCALE).
DESCRIPTIVES ALL.
/** </CODE> **/.


2.  When I did populate the dialog with my random variables (X1, X2, X3) and
    specified specific correlations along with the desired LogNormal distributions, the
    correlations did not remotely resemble the values specified for the simulation.
    It took a bit of finagling to get the PASTE button to work.
    Ended up with the following code from PASTE (including the initial data constructor as well).

    So, I am obviously missing some important factoids.

    Setting LOCK=YES or NO has no effect on the results.
    Of course I have not done much RTFM in this regard just yet.
    Have other fires to put out today!

-------------------------------------------------------------
MATRIX.
SAVE (UNIFORM(1000,3)) / OUTFILE * / VARIABLES X1 X2 X3.
END MATRIX.
VARIABLE LEVEL X1 X2 X3 (SCALE).
DESCRIPTIVES ALL.
DO REPEAT X=X1 X2 X3.
COMPUTE X=RV.LNORMAL(1,.5).
END REPEAT.

DATASET NAME DataSet0.
DATASET ACTIVATE DataSet0.

*Create simulation plan.
FILE HANDLE simplan_261249 /NAME='C:\Users\david2\Documents\SimulationPlan_2.splan'.
SIMPLAN CREATE
 /CONTINGENCY MULTIWAY=NO
 /SIMINPUT INPUT=x1(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO) DISTRIBUTION=LNORMAL(A=1 B=.5 )
 /SIMINPUT INPUT=x2(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO) DISTRIBUTION=LNORMAL(A=1 B=.5 )
 /SIMINPUT INPUT=x3(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO) DISTRIBUTION=LNORMAL(A=1 B=.5 )
 /CORRELATIONS VARORDER=x1 x2 x3  CORRMATRIX=1.0; 0.2, 1.0; 0.15, 0.1, 1.0 LOCK=YES
 /AUTOFIT NCASES=ALL FIT=AD BINS=100
 /STOPCRITERIA MAXCASES=100000
 /MISSING CLASSMISSING=EXCLUDE
 /PLAN FILE=simplan_261249 DISPLAY=YES.

*Run simulation plan.
DATASET DECLARE DataSet5.
SIMRUN
 /PLAN FILE=simplan_261249
 /CRITERIA  REPRESULTS=TRUE  SEED=629111597
 /PRINT ASSOCIATIONS=YES DESCRIPTIVES=YES PERCENTILES=NO
 /OUTFILE FILE=DataSet5.

----------------------
Correlations
        x1      x2      x3
x1      1.000   .170    .173
x2      .170    1.000   .082
x3      .173    .082    1.000


Correlations between simulated inputs may differ
from correlations specified for those inputs in the simulation plan.







Re: Data Simulation

Ryan
Thanks to David and Jon for responding.
 
I have been trying to figure out how to use SIMULATION in v.21 and I am stuck. I went through the tutorial--still confused. :-(
 
If I figure out how to solve my problem using SIMULATION, I will be certain to share the code with SPSS-L. Of course, if anyone has experience using SIMULATION, I would appreciate it if you would share code.
 
Thanks,
 
Ryan


On Wed, Oct 30, 2013 at 10:02 AM, David Marso <[hidden email]> wrote:

Re: Data Simulation

Jon K Peck
This is more straightforward in V22, but here is what you would do in 21.

Open any dataset.  It will be ignored in this usage, but since simulation is a procedure, SPSS architecture requires that there be a dataset open.  It was too difficult to change the architecture for this one exception.

1.  Choose to type in the equation.
2.  Select new equation and enter something like z = x1+x2+x3.
3.  Add the x variables to the Defined Inputs list by using the New button.  (Otherwise it would expect these to be in the input dataset.)
4.  Go to the Simulation tab, Simulated Fields, and for each variable set the type to Lognormal and enter the desired parameter values.
5.  On the Correlations tab enter the desired correlations.
6.  On the Advanced Options set the number of cases and click the "Continue until maximum is reached" button.
7.  On the Save tab, specify an output dataset.
8.  Click Run or Paste.

Validate by activating that dataset and running CORRELATIONS.
Run simulation again, specify the equation with x1,x2,x3, which now exist, and choose Fit All.
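
For comparison, the same generate-then-validate loop can be sketched outside the Simulation dialog. This is not what SIMPLAN/SIMRUN does internally. It is a generic construction that correlates standard normals first and exponentiates afterwards, so each marginal stays log-normal. The value SIGMA = 0.2565 (the log-scale standard deviation giving skewness of about .80), the seed, and all function names are assumptions made for the illustration.

```python
import math
import random

SIGMA = 0.2565  # assumed sd of log(X); gives log-normal skewness of about .80
TARGET = [[1.00, 0.30, 0.20],
          [0.30, 1.00, 0.15],
          [0.20, 0.15, 1.00]]

def normal_corr(r_x, sigma=SIGMA):
    """Correlation needed between the underlying normals so that the
    exponentiated variables (equal sigma) have Pearson correlation r_x."""
    w = math.exp(sigma ** 2)
    return math.log(1.0 + r_x * (w - 1.0)) / sigma ** 2

def cholesky(a):
    """Lower-triangular L with L * L' = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate(n, seed=12345):
    """Return three lists of n correlated log-normal values (median 1)."""
    rng = random.Random(seed)
    k = len(TARGET)
    r_z = [[1.0 if i == j else normal_corr(TARGET[i][j]) for j in range(k)]
           for i in range(k)]
    low = cholesky(r_z)
    cols = [[] for _ in range(k)]
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        for i in range(k):
            z = sum(low[i][m] * e[m] for m in range(i + 1))  # correlated normal
            cols[i].append(math.exp(SIGMA * z))               # log-normal
    return cols

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def skewness(x):
    """Moment-based sample skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

if __name__ == "__main__":
    x1, x2, x3 = simulate(100_000)
    print(round(pearson(x1, x2), 2), round(pearson(x1, x3), 2),
          round(pearson(x2, x3), 2))
    print([round(skewness(v), 2) for v in (x1, x2, x3)])
```

The normal-scale correlations are set slightly above the targets (for example about .307 for a target of .30) because exponentiating shrinks a Pearson correlation a little. Raising n to one million only changes the run time.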

From:        Ryan Black <[hidden email]>
To:        [hidden email],
Date:        10/30/2013 08:23 AM
Subject:        Re: [SPSSX-L] Data Simulation
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





Re: Data Simulation

David Marso
Administrator
In reply to this post by David Marso
While I'm still wincing, I can't resist taking a stab at that garish, poorly designed UI!
Have the UI designers forgotten about that handy thing called a sub-dialog?
Radio buttons which enable 1 of 3 things, yet the disabled junk is hogging screen real estate?
Talk about trying to do too much on a single tab?  No wonder Ryan is confused by this beast.
FURTHERMORE:  Obviously *NO* consideration for those with crappy eyesight on small monitors or laptops (my laptop is 17.3").
I normally run at a high resolution (1920  x 1080 ).
As an experiment I modified my screen resolution to 1024 x 768 and the main dialog drips off the bottom of the  screen, rendering it completely useless (run, paste, reset, cancel and help are OFF the screen) ;-(
I could go on a few more pages, but that would take a lot of my valuable time and after all, who am I to criticize the handiwork of professional UI designers (chuckle ;-)
I can totally groove on the idea of everything but the kitchen sink functionality, but maybe break it up into digestible pieces.
HARSH?  Hmmm, I'm in a GOOD mood today rather than Mr. Kranky Kilt ;-)

Re: Data Simulation

Mark Miller
Ryan,

I am puzzled why SKEW is viewed as a key objective.
SKEW is totally dependent on the variance.
Do you mean Variance rather than Skew?

... Mark Miller
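
To make the link between skew and spread concrete: for a log-normal variable the skewness depends only on sigma, the standard deviation of the logged values, through the standard formula (exp(sigma^2) + 2) * sqrt(exp(sigma^2) - 1). The sketch below inverts that formula by bisection to find the sigma that gives a skew of .80. The function names are illustrative.

```python
import math

def lognormal_skew(sigma):
    """Skewness of a log-normal variable whose log has standard deviation sigma."""
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)

def sigma_for_skew(target, lo=1e-6, hi=5.0, tol=1e-12):
    """Bisection search; valid because lognormal_skew increases with sigma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lognormal_skew(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sigma = sigma_for_skew(0.80)
print(round(sigma, 4))                 # about 0.2565
print(round(lognormal_skew(0.5), 2))   # about 1.75
```

So a skew of .80 pins down sigma at roughly .2565, and the location parameter is free. If the B parameter in LNORMAL(A=1 B=.5) used earlier in the thread is the standard deviation of the logs (an assumption about the SPSS parameterization that should be checked against the documentation), then B=.5 implies a skew near 1.75 rather than .80.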




 




On Wed, Oct 30, 2013 at 9:43 AM, David Marso <[hidden email]> wrote:
While I'm still wincing, I can't resist taking a stab at that garish, poorly
designed UI!
Have the UI designers forgotten about that handy thing called a sub-dialog?
Radial buttons which enable 1 of 3 things yet the disabled junk is hogging
screen real estate?
Talk about trying to do too much on a single tab?  No wonder Ryan is
confused by this beast.
FURTHERMORE:  Obviously *NO* consideration for those with crappy eyesight on
small monitors or laptops (my laptop is 17.3").
I normally run at a high resolution (1920 x 1080).
As an experiment I modified my screen resolution to 1024 x 768 and the main
dialog drips off the bottom of the screen, rendering it completely useless
(run, paste, reset, cancel and help are OFF the screen) ;-(
I could go on a few more pages, but that would take a lot of my valuable
time and after all, who am I to criticize the handiwork of professional UI
designers (chuckle ;-)
I can totally groove on the idea of everything but the kitchen sink
functionality, but maybe break it up into digestible pieces.
HARSH?  Hmmm, I'm in a GOOD mood today rather than Mr. Kranky Kilt ;-)
-------


David Marso wrote
> AH, I must review my distribution theory ;-(
> How exactly would one set up the simulation?
> I tried to hack my way through it and
> 1.  First off, the Simulation dialog insists on having data prior to
>     getting to first base.
>     Why?  If one is creating data ex nihilo, then why complicate it?
>     SO, I created X1, X2, X3 (1 case, Scale).
>     Still the Simulation dialog pleaded for me to open a data file.
>     OK, screw it:  Build a QAD data array of 3 variables and assign Scale
>     measurement level.
>
> /**
> <CODE>
>  **/.
> MATRIX.
> SAVE (UNIFORM(1000,3)) / OUTFILE * / VARIABLES X1 X2 X3.
> END MATRIX.
> VARIABLE LEVEL X1 X2 X3 (SCALE).
> DESCRIPTIVES ALL.
> /**
> </CODE>
>  **/.
>
>
> 2.  When I did populate the dialog with my random variables (X1, X2, X3)
>     and specified specific correlations along with the desired LogNormal
>     distributions, the correlations did not remotely resemble the values
>     specified for the simulation.
>     It took a bit of finagling to get the PASTE button to work.
>     Ended up with the following code from PASTE (including the initial
>     data constructor as well).
>
>     So, I am obviously missing some important factoids.
>
>     Setting LOCK=YES or NO has no effect on the results.
>    Of course I have not done much RTFM in this regard just yet.
>    Have other fires to put out today!
>
> -------------------------------------------------------------
> MATRIX.
> SAVE (UNIFORM(1000,3)) / OUTFILE * / VARIABLES X1 X2 X3.
> END MATRIX.
> VARIABLE LEVEL X1 X2 X3 (SCALE).
> DESCRIPTIVES ALL.
> DO REPEAT X=X1 X2 X3.
> COMPUTE X=RV.LNORMAL(1,.5).
> END REPEAT.
>
> DATASET NAME DataSet0.
> DATASET ACTIVATE DataSet0.
>
> *Create simulation plan.
> FILE HANDLE simplan_261249
> /NAME='C:\Users\david2\Documents\SimulationPlan_2.splan'.
> SIMPLAN CREATE
>  /CONTINGENCY MULTIWAY=NO
>  /SIMINPUT INPUT=x1(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO)
> DISTRIBUTION=LNORMAL(A=1 B=.5 )
>  /SIMINPUT INPUT=x2(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO)
> DISTRIBUTION=LNORMAL(A=1 B=.5 )
>  /SIMINPUT INPUT=x3(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO)
> DISTRIBUTION=LNORMAL(A=1 B=.5 )
>  /CORRELATIONS VARORDER=x1 x2 x3  CORRMATRIX=1.0; 0.2, 1.0; 0.15, 0.1, 1.0
> LOCK=YES
>  /AUTOFIT NCASES=ALL FIT=AD BINS=100
>  /STOPCRITERIA MAXCASES=100000
>  /MISSING CLASSMISSING=EXCLUDE
>  /PLAN FILE=simplan_261249 DISPLAY=YES.
>
> *Run simulation plan.
> DATASET DECLARE DataSet5.
> SIMRUN
>  /PLAN FILE=simplan_261249
>  /CRITERIA  REPRESULTS=TRUE  SEED=629111597
>  /PRINT ASSOCIATIONS=YES DESCRIPTIVES=YES PERCENTILES=NO
>  /OUTFILE FILE=DataSet5.
>
> ----------------------
> Correlations
>       x1      x2      x3
> x1    1.000   .170    .173
> x2    .170    1.000   .082
> x3    .173    .082    1.000
>
> Correlations between simulated inputs may differ
> from correlations specified for those inputs in the simulation plan.
>
>
>
>
>
>
> Jon K Peck wrote
>> A linear combination of log-normally distributed random variables is not
>> lognormally distributed, so the procedure proposed below does not preserve
>> the distribution property.  The simulation procedure I proposed does.
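
One way to see how the lognormal shape can be kept: do the Cholesky step on the underlying normal variables and exponentiate afterwards, so each margin is exactly lognormal. The sketch below is a generic construction and not necessarily what the Statistics simulation procedure does internally. It assumes equal log-scale SDs of about 0.2565 (roughly skew .80) and uses the standard relation corr(exp(Z1), exp(Z2)) = (exp(rho*s1*s2) - 1) / sqrt((exp(s1^2) - 1)(exp(s2^2) - 1)) to find the normal-scale correlations. All names here are illustrative.

```python
import math
import random

SIGMA = 0.2565  # assumed log-scale SD (gives a skew of roughly .80)
TARGETS = {(0, 1): 0.30, (0, 2): 0.20, (1, 2): 0.15}  # wanted on the lognormal scale

def normal_rho(r, s1, s2):
    """Normal-scale correlation that yields correlation r after exp()."""
    spread = math.sqrt((math.exp(s1 ** 2) - 1.0) * (math.exp(s2 ** 2) - 1.0))
    return math.log(1.0 + r * spread) / (s1 * s2)

def cholesky(a):
    """Lower-triangular factor low with low * low' = a (a positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Correlation matrix on the normal scale (unit diagonal).
R = [[1.0] * 3 for _ in range(3)]
for (i, j), r in TARGETS.items():
    R[i][j] = R[j][i] = normal_rho(r, SIGMA, SIGMA)
L = cholesky(R)

random.seed(20131030)
N = 20000  # kept small for the demo; the thread asks for one million
cols = [[], [], []]
for _ in range(N):
    z = [random.gauss(0.0, 1.0) for _ in range(3)]
    for i in range(3):
        y = sum(L[i][k] * z[k] for k in range(i + 1))  # correlated N(0,1)
        cols[i].append(math.exp(SIGMA * y))            # lognormal margin

sample = {pair: pearson(cols[pair[0]], cols[pair[1]]) for pair in TARGETS}
for pair, r in sample.items():
    print(pair, round(r, 3))
```

The normal-scale correlations come out slightly larger than the targets, because exponentiating weakens the linear correlation a little.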
>>
>>
>> Jon Peck (no "h") aka Kim
>> Senior Software Engineer, IBM
>> peck@.ibm
>> phone: 720-342-5621
>>
>>
>> From:   David Marso <david.marso@>
>> To:     SPSSX-L@.uga,
>> Date:   10/29/2013 11:28 PM
>> Subject:        Re: [SPSSX-L] Data Simulation
>> Sent by:        "SPSSX(r) Discussion" <SPSSX-L@.uga>
>>
>>
>>
>> What if you generate the log normals.
>> Orthonormalize them via FACTOR, then run CHOL on the desired R matrix and
>> multiply the factor scores?
>> That shouldn't (AFAIK) affect the shape of the distributions.
>> Or maybe I am OTL on this?
>> What code are you running?





--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Data-Simulation-tp5722793p5722809.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
