Dear SPSS-L,

I'm trying to generate data that meet the following specifications:

1. N = one million
2. Three log-normally distributed variables with skew = .80
3. corr(var1,var2)=.30, corr(var1,var3)=.20, corr(var2,var3)=.15

I would have no difficulty generating k normally distributed variables and applying a Cholesky decomposition to obtain the desired bivariate correlations. But my desire to keep the variables log-normally distributed with a skew of .80 after the Cholesky decomposition has stumped me.

Any ideas?

Ryan |
The simulation feature introduced in Statistics
21 can do this. In 21, you have to give simulation a model, but you
can just discard that. In V22 you can just request that data be generated.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621 |
Administrator
|
In reply to this post by Ryan
What if you generate the lognormals, orthonormalize them via FACTOR, then run CHOL on the desired R matrix and multiply the factor scores by the result? That shouldn't (AFAIK) affect the shape of the distributions. Or maybe I am OTL on this? What code are you running?
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
A linear combination of log-normally distributed random variables is not itself log-normally distributed, so the procedure proposed above (orthonormalize via FACTOR, then multiply by the Cholesky factor) does not preserve the distribution property. The simulation procedure I proposed does.

Jon Peck |
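The point about linear combinations can be checked numerically. What follows is only a minimal sketch in Python (standard library only), not SPSS syntax and not what the SIMULATION procedure does internally. It assumes a location parameter of 0 (the thread does not specify one), uses independent standardized draws as a stand-in for FACTOR's orthonormalized scores, and the helper names (`sigma_for_skew`, `latent_rho`, `cholesky3`, `run`) are hypothetical. Route A mixes lognormal draws with the Cholesky factor of the target matrix; Route B is a swapped-in alternative that correlates normal variables first and exponentiates afterwards, so each margin stays lognormal.

```python
import math
import random

TARGET_SKEW = 0.80
TARGET_R = (0.30, 0.20, 0.15)  # corr(1,2), corr(1,3), corr(2,3) from the original post


def lognormal_skew(sigma):
    """Skewness of exp(sigma * Z) for standard normal Z."""
    w = math.exp(sigma * sigma)
    return (w + 2.0) * math.sqrt(w - 1.0)


def sigma_for_skew(target, lo=1e-6, hi=3.0):
    """Bisection for sigma; lognormal_skew is increasing in sigma."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if lognormal_skew(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def latent_rho(r, sigma):
    """Normal-scale correlation that yields Pearson r between two exp(sigma * Z)."""
    s2 = sigma * sigma
    return math.log(1.0 + r * (math.exp(s2) - 1.0)) / s2


def cholesky3(r12, r13, r23):
    """Lower-triangular Cholesky factor of a 3x3 correlation matrix."""
    l22 = math.sqrt(1.0 - r12 * r12)
    l32 = (r23 - r13 * r12) / l22
    l33 = math.sqrt(1.0 - r13 * r13 - l32 * l32)
    return ((1.0, 0.0, 0.0), (r12, l22, 0.0), (r13, l32, l33))


def combine(L, cols):
    """Apply L to every case: output column i is sum_j L[i][j] * cols[j]."""
    return [[row[0] * a + row[1] * b + row[2] * c for a, b, c in zip(*cols)]
            for row in L]


def standardize(col):
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def skewness(col):
    z = standardize(col)
    return sum(v ** 3 for v in z) / len(z)


def pearson(a, b):
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


def corr_triplet(cols):
    return (pearson(cols[0], cols[1]), pearson(cols[0], cols[2]),
            pearson(cols[1], cols[2]))


def run(n=200_000, seed=12345):
    rng = random.Random(seed)
    sigma = sigma_for_skew(TARGET_SKEW)
    z = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]

    # Route A (assumption: independent standardized lognormals stand in for
    # FACTOR scores): mix the lognormal draws with the Cholesky factor.
    x = [standardize([math.exp(sigma * v) for v in col]) for col in z]
    mixed = combine(cholesky3(*TARGET_R), x)

    # Route B: correlate the normals (with adjusted correlations), then
    # exponentiate, so every margin is exactly lognormal by construction.
    latent = combine(cholesky3(*(latent_rho(r, sigma) for r in TARGET_R)), z)
    copula = [[math.exp(sigma * v) for v in col] for col in latent]

    return {
        "sigma": sigma,
        "mixed_skew": [skewness(c) for c in mixed],
        "mixed_corr": corr_triplet(mixed),
        "copula_skew": [skewness(c) for c in copula],
        "copula_corr": corr_triplet(copula),
    }


if __name__ == "__main__":
    for key, value in run().items():
        print(key, value)
```

Under these assumptions both routes should land near the target correlations, but only Route B should keep all three skews near .80. In Route A the second and third variables are weighted sums of independent pieces, and a cumulant argument suggests their skew falls to about .72 and .75.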
Administrator
|
AH, I must review my distribution theory ;-(
How exactly would one set up the simulation? I tried to hack my way through it:

1. First off, the Simulation dialog insists on having data before you can get to first base. Why? If one is creating data ex nihilo, why complicate it? So I created X1, X2, X3 (1 case, Scale). Still the Simulation dialog pleaded for me to open a data file. OK, screw it: build a QAD data array of 3 variables and assign Scale measurement level.

/** <CODE> **/.
MATRIX.
SAVE (UNIFORM(1000,3)) / OUTFILE * / VARIABLES X1 X2 X3.
END MATRIX.
VARIABLE LEVEL X1 X2 X3 (SCALE).
DESCRIPTIVES ALL.
/** </CODE> **/.

2. When I did populate the dialog with my random variables (X1, X2, X3) and specified correlations along with the desired lognormal distributions, the resulting correlations did not remotely resemble the values specified for the simulation. It took a bit of finagling to get the PASTE button to work. I ended up with the following code from PASTE (including the initial data constructor as well). So I am obviously missing some important factoids. Setting LOCK=YES or NO has no effect on the results. Of course, I have not done much RTFM in this regard just yet. Have other fires to put out today!

-------------------------------------------------------------
MATRIX.
SAVE (UNIFORM(1000,3)) / OUTFILE * / VARIABLES X1 X2 X3.
END MATRIX.
VARIABLE LEVEL X1 X2 X3 (SCALE).
DESCRIPTIVES ALL.

DO REPEAT X=X1 X2 X3.
COMPUTE X=RV.LNORMAL(1,.5).
END REPEAT.
DATASET NAME DataSet0.
DATASET ACTIVATE DataSet0.

*Create simulation plan.
FILE HANDLE simplan_261249 /NAME='C:\Users\david2\Documents\SimulationPlan_2.splan'.
SIMPLAN CREATE
  /CONTINGENCY MULTIWAY=NO
  /SIMINPUT INPUT=x1(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO) DISTRIBUTION=LNORMAL(A=1 B=.5 )
  /SIMINPUT INPUT=x2(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO) DISTRIBUTION=LNORMAL(A=1 B=.5 )
  /SIMINPUT INPUT=x3(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO) DISTRIBUTION=LNORMAL(A=1 B=.5 )
  /CORRELATIONS VARORDER=x1 x2 x3 CORRMATRIX=1.0; 0.2, 1.0; 0.15, 0.1, 1.0 LOCK=YES
  /AUTOFIT NCASES=ALL FIT=AD BINS=100
  /STOPCRITERIA MAXCASES=100000
  /MISSING CLASSMISSING=EXCLUDE
  /PLAN FILE=simplan_261249 DISPLAY=YES.

*Run simulation plan.
DATASET DECLARE DataSet5.
SIMRUN
  /PLAN FILE=simplan_261249
  /CRITERIA REPRESULTS=TRUE SEED=629111597
  /PRINT ASSOCIATIONS=YES DESCRIPTIVES=YES PERCENTILES=NO
  /OUTFILE FILE=DataSet5.
----------------------
Correlations
        x1      x2      x3
x1   1.000    .170    .173
x2    .170   1.000    .082
x3    .173    .082   1.000

Correlations between simulated inputs may differ from correlations specified for those inputs in the simulation plan.
|
Thanks to David and Jon for responding. I have been trying to figure out how to use SIMULATION in v.21 and I am stuck. I went through the tutorial--still confused. :-(

If I figure out how to solve my problem using SIMULATION, I will be certain to share the code with SPSS-L. Of course, if anyone has experience using SIMULATION, I would appreciate it if you would share code.

Thanks,
Ryan |
This is more straightforward in V22, but
here is what you would do in 21.
1. Open any dataset. It will be ignored in this usage, but since simulation is a procedure, the SPSS architecture requires that a dataset be open. It was too difficult to change the architecture for this one exception.
2. Choose to type in the equation. Select new equation and enter something like z = x1+x2+x3.
3. Add the x variables to the Defined Inputs list by using the New button. (Otherwise it would expect these to be in the input dataset.)
4. Go to the Simulation tab, Simulated Fields, and for each variable set the type to Lognormal and enter the desired parameter values.
5. On the Correlations tab, enter the desired correlations.
6. On the Advanced Options, set the number of cases and click the "Continue until maximum is reached" button.
7. On the Save tab, specify an output dataset.
8. Click Run or Paste.
9. Validate by activating that dataset and running CORRELATIONS.
10. Run simulation again, specify the equation with x1, x2, x3, which now exist, and choose Fit All.

Jon Peck |
Administrator
|
In reply to this post by David Marso
While I'm still wincing, I can't resist taking a stab at that garish, poorly designed UI!

Have the UI designers forgotten about that handy thing called a sub-dialog? Radio buttons that enable 1 of 3 things, yet the disabled junk is hogging screen real estate? Talk about trying to do too much on a single tab! No wonder Ryan is confused by this beast.

FURTHERMORE: Obviously *NO* consideration for those with crappy eyesight on small monitors or laptops (my laptop is 17.3"). I normally run at a high resolution (1920 x 1080). As an experiment I changed my screen resolution to 1024 x 768, and the main dialog drips off the bottom of the screen, rendering it completely useless (Run, Paste, Reset, Cancel and Help are OFF the screen) ;-(

I could go on for a few more pages, but that would take a lot of my valuable time and, after all, who am I to criticize the handiwork of professional UI designers (chuckle ;-) I can totally groove on the idea of everything-but-the-kitchen-sink functionality, but maybe break it up into digestible pieces.

HARSH? Hmmm, I'm in a GOOD mood today rather than Mr. Kranky Kilt ;-)
-------
|
Ryan, I am puzzled why SKEW is viewed as a key objective. SKEW is totally dependent on the variance. Do you mean Variance rather than Skew? ... Mark Miller
|
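On the question of skew versus variance, the standard lognormal moment formulas may help. This is a reference sketch, assuming the usual parameterization X = exp(mu + sigma Z) with Z standard normal; the symbols mu and sigma are introduced here for illustration and do not appear in the thread.

```latex
\[
\operatorname{skew}(X) = \left(e^{\sigma^{2}} + 2\right)\sqrt{e^{\sigma^{2}} - 1},
\qquad
\operatorname{Var}(X) = \left(e^{\sigma^{2}} - 1\right) e^{2\mu + \sigma^{2}},
\qquad
\operatorname{CV}(X) = \sqrt{e^{\sigma^{2}} - 1}.
\]
\[
\operatorname{skew}(X) = 0.80
\;\Longrightarrow\;
\sigma \approx 0.256 .
\]
```

Read this way, the skew of a lognormal variable is fixed by sigma alone (equivalently by the coefficient of variation), while the variance also depends on mu. Under this parameterization, asking for skew = .80 pins down the log-scale spread but leaves the raw-scale variance free until a location is chosen.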