Manufactured dataset?


Manufactured dataset?

drmmmcmillian@gmail.com
Hi,
Is there a way to manufacture a small or medium-sized dataset in SPSS such that you can create variables with a particular correlation coefficient? If so, would you please share the syntax? This is for instructional purposes.


Best,
Monique 


--
Sent from Gmail Mobile
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Manufactured dataset?

Bruce Weaver
Administrator
Hello Monique.  This has come up in the past, but finding the old threads is tricky if you don't use the right search terms.  Here's one of those old threads I was able to find.

http://spssx-discussion.1045642.n5.nabble.com/Data-Generation-for-predetermined-correlations-tp5718638.html

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Manufactured dataset?

Jon Peck
Besides the MakeDatasetWithCases custom dialog that is referred to in Bruce's post, consider the simulation procedure:
Analyze > Simulation, then Create Simulated Data, then Create simulated data without a model.  On the next panel, specify the univariate properties of the variables, then click Correlations and enter the desired correlations.

The "without a model" option was added a few releases ago, but I don't recall exactly when.

--
Jon K Peck
[hidden email]


Re: Manufactured dataset?

drmmmcmillian@gmail.com
Thanks Bruce and Jon for these suggestions!

Re: Manufactured dataset?

David Marso
Administrator
In reply to this post by drmmmcmillian@gmail.com
Since this is for instructional purposes, you might do better to use code that gets under the hood, namely the subthread on the Cholesky decomposition in the thread Bruce linked.
IMNSHO: A prepackaged simulation black box offers little or no pedagogical value and hence defeats your instructional intent.
YMMV.
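
For a two-variable classroom demo, the idea behind the Cholesky approach reduces to one formula: if x and e are independent standard normal variables, then y = r*x + sqrt(1 - r^2)*e has population correlation r with x. The sketch below shows this in Python rather than SPSS syntax, purely so it can be run anywhere. The function names `make_pair` and `pearson`, the seed, and the sample size are illustrative assumptions, not something taken from the thread.

```python
import math
import random


def make_pair(n, r, seed=12345):
    """Return two lists of length n whose population correlation is r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)  # noise drawn independently of x
        xs.append(x)
        ys.append(r * x + math.sqrt(1.0 - r * r) * e)
    return xs, ys


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


xs, ys = make_pair(20000, 0.55)
# The sample value lands near 0.55 but is not exactly 0.55.
print(round(pearson(xs, ys), 3))
```

The sample correlation only approximates r, because x and e are never exactly uncorrelated in a finite sample. The macro posted later in this thread avoids that sampling error by orthogonalizing the random variables before mixing them.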

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Manufactured dataset?

David Marso
Administrator
*****************************************************************.
**Uses Cholesky decomposition (essentially a square root of a matrix) and
**orthogonal vectors from FACTOR to generate desired simulated data.
**Updates Howell (removes need for fiddling around with the internal organs).
**<RANT> People should NOT publish code which requires major surgery!! </RANT>
**I wrote code like this code about 20 years ago and the theory is fairly trivial.
*****************************************************************.
**Note: Does not require python or python extensions.

DEFINE CorrSim (CorrMat !CHAREND ("/") / N !TOKENS(1) / P !TOKENS(1)).

!LET !VLIST1 = !CONCAT('X1 TO X',!P ).
!LET !VLIST2 = !CONCAT('FS1 TO FS',!P ).
!LET !VLIST3 = !CONCAT('Sim1 TO Sim',!P ).

* Simulate raw data of desired dimensionality *.
INPUT PROGRAM.
+  LOOP #=1 TO !N.
+    DO REPEAT V=!VLIST1.
+      COMPUTE V=RV.NORMAL(0,1).
+    END REPEAT.
+    END CASE.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.

* Orthogonalize Vectors * .
**OOPS (Criteria  MINEIGEN(0) not respected on SAVE) **.
**OLD CODE ** FACTOR VAR !VLIST1 / CRITERIA MINEIGEN(0)/ SAVE REG (!P,FS).
FACTOR VAR !VLIST1 / CRITERIA FACTORS(!P)/ SAVE REG (!P,FS).

DATASET DECLARE Sim .

* Generate Vectors with desired dimensionality *.
MATRIX.
GET RAW   / VAR !VLIST1.
GET FS    / VAR !VLIST2.
COMPUTE CORR={!CorrMat}.
SAVE {RAW,FS,FS*CHOL(CORR)} / OUTFILE Sim /VARIABLES !VLIST1 !VLIST2 !VLIST3.
END MATRIX.

DATASET ACTIVATE Sim.
** Check results (Optional) **.
CORRELATION !VLIST3 .
!ENDDEFINE.
SET MPRINT ON PRINTBACK ON.

* Self explanatory!*.
CorrSim CorrMat 1.00, -0.08, 0.08;-0.08, 1.00,  0.55;0.08, 0.55,  1.00
            / N=200  P=3.
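
For readers who want to see the arithmetic outside SPSS, here is a minimal Python sketch of the same two steps: first build columns that are exactly uncorrelated with mean 0 and SD 1, then mix them with a Cholesky factor of the target correlation matrix. Two things are assumptions on my part and not taken from the macro: Gram-Schmidt stands in for the FACTOR step, and the sketch uses the lower-triangular factor L (with L times its transpose equal to the target) and multiplies by its transpose, which should match FS*CHOL(CORR) if CHOL returns the upper-triangular factor. All function names are hypothetical.

```python
import math
import random


def cholesky_lower(r):
    """Lower-triangular L with L * L^T == r (r must be positive definite)."""
    p = len(r)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(r[i][i] - s)
            else:
                low[i][j] = (r[i][j] - s) / low[j][j]
    return low


def orthonormal_columns(n, p, seed=2017):
    """p columns of length n: mean 0, sample SD 1, pairwise correlation 0."""
    rng = random.Random(seed)
    cols = []
    for _ in range(p):
        c = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(c) / n
        c = [v - m for v in c]  # centre the column
        for q in cols:  # Gram-Schmidt: remove the part explained by q
            b = sum(u * v for u, v in zip(c, q)) / sum(v * v for v in q)
            c = [u - b * v for u, v in zip(c, q)]
        sd = math.sqrt(sum(v * v for v in c) / (n - 1))
        cols.append([v / sd for v in c])
    return cols


def simulate(corr, n, seed=2017):
    """Columns whose sample correlation matrix equals corr (Z times L^T)."""
    p = len(corr)
    low = cholesky_lower(corr)
    z = orthonormal_columns(n, p, seed)
    return [
        [sum(low[j][k] * z[k][i] for k in range(j + 1)) for i in range(n)]
        for j in range(p)
    ]


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


# Same target matrix and N as the macro call above.
target = [[1.00, -0.08, 0.08], [-0.08, 1.00, 0.55], [0.08, 0.55, 1.00]]
sim = simulate(target, n=200)
for i in range(3):
    # Each printed row should agree with the target up to rounding error.
    print([round(pearson(sim[i], sim[j]), 2) for j in range(3)])
```

Because the starting columns are orthogonalized within the sample, the simulated columns hit the target correlations up to floating-point error and not just on average. Skipping the orthogonalization step would give correlations that only approximate the target.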

Re: Manufactured dataset?

Bruce Weaver
Administrator
Good morning David.  I assume that code runs error-free for you, does it?  When I run it (using SPSS 24 for Windoze, 64-bit), the following line is causing problems:

SAVE {RAW,FS,FS*CHOL(CORR)} / OUTFILE Sim /VARIABLES !VLIST1 !VLIST2 !VLIST3.

It generates this error message:

Run MATRIX procedure:
 
>Error # 34 in column 39.  Text: Sim
>SPSS Statistics cannot access a file with the given file specification.  The
>file specification is either syntactically invalid, specifies an invalid
>drive, specifies a protected directory, specifies a protected file, or
>specifies a non-sharable file.
>Execution of this command stops.
 
------ END MATRIX -----


It seems to me that we've run into this problem before:  The SAVE command in MATRIX does not work well (if at all) with declared datasets.  

If I edit the code to save to a file in a temporary folder on the hard disk, it all works just fine.  (I have to also alter some subsequent code to read the temporary file, of course.)  

Anyway, I'm curious about what you have done to make MATRIX SAVE work with a declared dataset, if indeed it is working for you!  

Bruce



Re: Manufactured dataset?

Anthony Babinec
Bruce, confirming that I get the same error. Running SPSS Statistics 24 on
Windows 10.

Anthony J. Babinec
[hidden email]
ASA Council of Chapters Chair,
Joint Statistical Meetings 2017 Program Committee





Re: Manufactured dataset?

David Marso
Administrator
In reply to this post by Bruce Weaver
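Workaround for using DATASETS with MATRIX.
It was originally submitted by Progman in the 10th posting of this thread:
http://spssx-discussion.1045642.n5.nabble.com/SERIOUS-problem-with-Multiple-datasets-in-MATRIX-procedure-tc5722765.html#a5725723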

PRESERVE.
/* The following 2 lines are a work around for the Administrative user issue with MATRIX saving to datasets */.
FILE HANDLE fpFolder_With_ReadWriteRights /NAME='%userprofile%\Desktop' . 
CD fpFolder_With_ReadWriteRights . 

/* Do MATRIX STUFF*/.

RESTORE







Re: Manufactured dataset?

David Marso
Administrator
Teensy demo:

FILE HANDLE fpFolder_With_ReadWriteRights /NAME='%userprofile%\Desktop' .
* Save current default folder, and change to one with R/W permission *.
PRESERVE.
CD fpFolder_With_ReadWriteRights .

DATA LIST FREE / a b c.
BEGIN DATA
5 6 5
END DATA.

DATASET NAME abc.
DATASET DECLARE Tabc.
MATRIX.
GET DATA /FILE * .
PRINT data.
SAVE T(data)/OUTFILE Tabc.
END MATRIX.

* Return to the original working directory *.
RESTORE.


Re: Manufactured dataset?

Bruce Weaver
Administrator
In reply to this post by David Marso
Right, I had forgotten about Progman's solution.  That fixes it.  Thanks David.

For anyone else trying it, add a command terminator to the RESTORE line.  ;-)  



Re: Manufactured dataset?

Bruce Weaver
Administrator
In reply to this post by David Marso
It's Friday afternoon, and I should have gone home already.  But it occurred to me that some users might like to specify a pattern of means and SDs for the simulated data (in addition to the pattern of correlations).  So I made a few tweaks to David's macro.  See below.

David, you'll notice that I replaced your X1 to XP variables with the re-scaled simulated variables.  I don't think they were required for anything at that point, so it seemed like a sensible thing to do.  Of course, it would be easy enough to generate another set of variable names instead.    


* ============================================================= .

* The original version of the macro in this file was written by David Marso.
* I tweaked it by adding MeanVect and SDvect macro arguments, which allow
* the user to specify the desired means and SDs for the simulated variables.
* I also expanded on some of the comments.

* References:
* http://spssx-discussion.1045642.n5.nabble.com/Manufactured-dataset-tp5734381p5734387.html
* http://spssx-discussion.1045642.n5.nabble.com/Manufactured-dataset-tp5734381p5734392.html
* http://spssx-discussion.1045642.n5.nabble.com/SERIOUS-problem-with-Multiple-datasets-in-MATRIX-procedure-tp5722765p5725723.html
.
* As noted in the 3rd reference above, the MATRIX SAVE command
* can only save to a declared dataset if you are running SPSS
* as an Administrator.  Not all users are able to do that.  
* Therefore, this syntax includes a work-around, which is to
* temporarily change the working directory to the user's desktop.
* This is a folder for which they have read-write privileges,
* and that allows MATRIX SAVE to work properly.  The use of
* PRESERVE and RESTORE returns everything to the state it was
* in before the syntax file was executed.

PRESERVE.
* The following 2 commands are a work around for the
  Administrative user issue with MATRIX saving to datasets.
FILE HANDLE fpFolder_With_ReadWriteRights /NAME='%userprofile%\Desktop' .
CD fpFolder_With_ReadWriteRights .

NEW FILE.
DATASET CLOSE all.

*****************************************************************.
**Uses Cholesky decomposition (essentially a square root of a matrix) and
**orthogonal vectors from FACTOR to generate desired simulated data.
**Updates Howell (removes need for fiddling around with the internal organs).
**<RANT> People should NOT publish code which requires major surgery!! </RANT>
**I wrote code like this code about 20 years ago and the theory is fairly trivial.
*****************************************************************.
**Note: Does not require python or python extensions.

DEFINE CorrSim
 (CorrMat !CHAREND ("/") /
  MeanVect !CHAREND ("/") /
  SDvect !CHAREND ("/") /
  N !TOKENS(1) / P !TOKENS(1)).

!LET !VLIST1 = !CONCAT('X1 TO X',!P ).
!LET !VLIST2 = !CONCAT('FS1 TO FS',!P ).
!LET !VLIST3 = !CONCAT('Sim1 TO Sim',!P ).

* Simulate raw data of desired dimensionality *.
INPUT PROGRAM.
+  LOOP #=1 TO !N.
+    DO REPEAT V=!VLIST1.
+      COMPUTE V=RV.NORMAL(0,1).
+    END REPEAT.
+    END CASE.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.

* Orthogonalize Vectors * .
**OOPS (Criteria  MINEIGEN(0) not respected on SAVE) **.
**OLD CODE ** FACTOR VAR !VLIST1 / CRITERIA MINEIGEN(0)/ SAVE REG (!P,FS).
FACTOR VAR !VLIST1 / CRITERIA FACTORS(!P)/ SAVE REG (!P,FS).

DATASET DECLARE Sim .

* Generate Vectors with desired dimensionality *.
MATRIX.
GET RAW   / VAR !VLIST1.
GET FS    / VAR !VLIST2.
COMPUTE CORR={!CorrMat}.
SAVE {RAW,FS,FS*CHOL(CORR)} / OUTFILE Sim /VARIABLES !VLIST1 !VLIST2 !VLIST3.
END MATRIX.

* The variables X1 to XP are no longer needed, so replace them
* with the re-scaled simulated variables.
DATASET ACTIVATE Sim.
DO REPEAT x = !VLIST1 / S = !VLIST3 / M = !MeanVect / StDev = !SDvect.
- COMPUTE x = S*StDev+M.
END REPEAT.
* Check that it worked as intended (OPTIONAL) .
DESCRIPTIVES !VLIST3 !VLIST1.
* Correlations for variables with Mean=0, SD=1.
CORRELATIONS !VLIST3 /MISSING=LISTWISE.
* Correlations for re-scaled variables.
CORRELATIONS !VLIST1 /MISSING=LISTWISE.
!ENDDEFINE.

* Comment out the SET command below if you like
* when it is no longer needed for debugging.
SET MPRINT ON PRINTBACK ON.

* When calling the macro,
*  CorrMat = the desired correlation matrix, with commas separating
*            elements and semicolons separating rows.
*  MeanVect = vector of desired means for simulated variables
*  SDvect = vector of desired SDs for simulated variables.
*  N = the number of cases in the simulated dataset;
*  P = the number of variables in the simulated dataset.
* MeanVect and SDvect will be used in a DO-REPEAT structure,
* so just list the values with no commas.

CorrSim
 CorrMat 1.00, -0.08, 0.08; -0.08, 1.00, 0.55; 0.08, 0.55,  1.00 /
 MeanVect 45 50 55 /
 SDvect 11 10 9 /
 N=200  P=3.

SET MPRINT OFF.
RESTORE.  /* Restore the current working directory.

* ============================================================= .
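
The re-scaling step in the macro (COMPUTE x = S*StDev+M) works because multiplying by a positive constant and adding another constant changes the mean and SD of a variable but leaves its correlations untouched. Here is a small Python sketch of that claim, assuming two standardized columns as input. The helper names and the seed are illustrative only.

```python
import math
import random


def mean(col):
    return sum(col) / len(col)


def sd(col):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(col)
    return math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / ((len(a) - 1) * sd(a) * sd(b))


def standardize(col):
    """Shift and scale a column to mean 0 and sample SD 1."""
    m, s = mean(col), sd(col)
    return [(v - m) / s for v in col]


def rescale(col, new_mean, new_sd):
    """Same arithmetic as COMPUTE x = S*StDev+M in the macro."""
    return [v * new_sd + new_mean for v in col]


rng = random.Random(42)
s1 = standardize([rng.gauss(0, 1) for _ in range(200)])
s2 = standardize([0.5 * a + rng.gauss(0, 1) for a in s1])

x1 = rescale(s1, 45, 11)  # first mean and SD from the example call
x2 = rescale(s2, 50, 10)  # second mean and SD from the example call

print(round(mean(x1), 6), round(sd(x1), 6))
# True when the correlation is unchanged by the re-scaling.
print(abs(pearson(s1, s2) - pearson(x1, x2)) < 1e-12)
```

This relies on the SDs being positive. A negative multiplier would flip the sign of every correlation involving that variable.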
 

David Marso wrote
*****************************************************************.
**Uses Cholesky decomposition (essentially a square root of a matrix) and
**orthogonal vectors from FACTOR to generate desired simulated data.
**Updates Howell (removes need for fiddling around with the internal organs).
**<RANT> People should NOT publish code which requires major surgery!! </RANT>
**I wrote code like this about 20 years ago and the theory is fairly trivial.
*****************************************************************.
**Note: Does not require python or python extensions.

DEFINE CorrSim (CorrMat !CHAREND ("/") / N !TOKENS(1) / P !TOKENS(1)).

!LET !VLIST1 = !CONCAT('X1 TO X',!P ).
!LET !VLIST2 = !CONCAT('FS1 TO FS',!P ).
!LET !VLIST3 = !CONCAT('Sim1 TO Sim',!P ).

* Simulate raw data of desired dimensionality *.
INPUT PROGRAM.
+  LOOP #=1 TO !N.
+    DO REPEAT V=!VLIST1.
+      COMPUTE V=RV.NORMAL(0,1).
+    END REPEAT.
+    END CASE.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.

* Orthogonalize Vectors * .
**OOPS (Criteria  MINEIGEN(0) not respected on SAVE) **.
**OLD CODE ** FACTOR VAR !VLIST1 / CRITERIA MINEIGEN(0)/ SAVE REG (!P,FS).
FACTOR VAR !VLIST1 / CRITERIA FACTORS(!P)/ SAVE REG (!P,FS).

DATASET DECLARE Sim .

* Generate Vectors with desired dimensionality *.
MATRIX.
GET RAW   / VAR !VLIST1.
GET FS    / VAR !VLIST2.
COMPUTE CORR={!CorrMat}.
SAVE {RAW,FS,FS*CHOL(CORR)} / OUTFILE Sim /VARIABLES !VLIST1 !VLIST2 !VLIST3.
END MATRIX.

DATASET ACTIVATE Sim.
** Check results (Optional) **.
CORRELATION !VLIST3 .
!ENDDEFINE.
SET MPRINT ON PRINTBACK ON.

* Self explanatory!*.
CorrSim CorrMat 1.00, -0.08, 0.08;-0.08, 1.00,  0.55;0.08, 0.55,  1.00
            / N=200  P=3.
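For anyone who wants to see why the FS*CHOL(CORR) step gives exactly the requested correlations: per the MATRIX documentation, U = CHOL(R) satisfies T(U)*U = R. If the saved factor scores are uncorrelated with unit variance (which is what the macro relies on), then T(FS)*FS/(N-1) is the identity matrix, so T(FS*U)*(FS*U)/(N-1) = T(U)*U = R. Below is a minimal sketch that checks only the first part, using the same correlation matrix as the macro call above. The names R, U and Rcheck are made up for illustration.

```spss
* Sketch only: confirm that CHOL returns U such that T(U)*U equals R.
MATRIX.
COMPUTE R = {1.00, -0.08, 0.08; -0.08, 1.00, 0.55; 0.08, 0.55, 1.00}.
COMPUTE U = CHOL(R).
COMPUTE Rcheck = T(U)*U.
PRINT U /FORMAT = F8.4 /TITLE = "Cholesky factor U".
PRINT Rcheck /FORMAT = F8.4 /TITLE = "T(U)*U, which should reproduce R".
END MATRIX.
```

If the requested matrix is not positive definite (for example, a set of correlations that cannot all hold at once), CHOL should fail, which is a useful early warning that the target matrix is not a valid correlation matrix.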

Re: Manufactured dataset?

Jon Peck
The simulation approach I mentioned earlier is a lot less trouble and covers all these options (plus choice of probability distribution).


Re: Manufactured dataset?

Bruce Weaver
Administrator
Hi Jon.  I think it would be helpful if you showed us an example of the Analyze > Simulation approach.  (I'm assuming the GUI generates syntax.)  I just took a quick look at the GUI dialogs, and got the impression it would take me more than 5 or 10 minutes to figure it out.  Therefore, I'm questioning whether a novice user would really find that approach any easier than running David's macro definition and then calling the macro.  (Both of those steps are pretty straightforward, I think.)  

Also, as David mentioned earlier in the thread, the OP in this thread was doing this as an educational exercise.  So being able to inspect the inner workings (if one wants to) is useful.  I suppose to achieve the same thing with the simulation approach, one would have to look up the algorithms.  

Cheers,
Bruce



Re: Manufactured dataset?

Jon Peck
I would not use the syntax.  That is very complicated, because the simulation engine is pretty elaborate with lots of options.  But the dialog (actually a wizard) is pretty easy, taking it step by step.

Analyze > Simulation
Click "Create Simulated Data".  Continue.
Click "Create simulated data without a model".  Click New and specify a name for each variable, say x, y, z.
Click the Simulation tab.
Specify a distribution and its parameters for each variable.  There are many choices, and for each one you get a picture showing both the generic shape and the shape for the parameter values you specify.
Click Correlations.  You see a grid where you can specify correlations.
Click Save and specify an output data file or dataset name.  

Click Run.

This can be pretty educational when you want to explore probability distributions and what correlated data look like.

As an educational exercise, my guess is that the education is really about the distributions and generated data rather than things like a Cholesky factorization, but I haven't seen that stated explicitly.



Re: Manufactured dataset?

Bruce Weaver
Administrator
Thanks Jon.  I had taken another look at the GUI dialog before I had to go home yesterday (and before seeing your post below), and realized that I was not seeing a live PASTE button because I had not visited the Simulation tab.  I did get far enough to grasp that SIMPREP, SIMPLAN and SIMRUN are the relevant command names.

https://www.ibm.com/support/knowledgecenter/en/SSLVMB_21.0.0/com.ibm.spss.statistics.help/syn_simplan.htm

https://www.ibm.com/support/knowledgecenter/en/SSLVMB_21.0.0/com.ibm.spss.statistics.help/syn_simrun.htm

https://www.ibm.com/support/knowledgecenter/en/SSLVMB_21.0.0/com.ibm.spss.statistics.help/syn_simprep.htm

I have no SPSS at home.  If I have time, I might have a go at it next week.

Regarding types of distributions, the RV.NORMAL(0,1) used in David's macro could easily be replaced with any other RV function one might want to use--and that could be passed as another macro argument.
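A minimal sketch of that idea, covering only the data-generation step of the macro. The macro name !GenRaw and the argument name RVfunc are hypothetical (they do not appear in David's macro), and the sketch has not been run against the full CorrSim macro. Keep in mind that the later FACTOR and CHOL steps form linear combinations of the raw variables, so the final simulated variables would not necessarily keep the shape of the input distribution.

```spss
* Hypothetical sketch: pass the random-number function as a macro argument.
* RVfunc = any RV function call, e.g. RV.UNIFORM(0,1) or RV.NORMAL(0,1).
* N = number of cases.  P = number of variables.

DEFINE !GenRaw (RVfunc = !CHAREND('/') / N = !TOKENS(1) / P = !TOKENS(1)).
!LET !VLIST1 = !CONCAT('X1 TO X',!P).
INPUT PROGRAM.
+  LOOP #i=1 TO !N.
+    DO REPEAT V=!VLIST1.
+      COMPUTE V=!RVfunc.
+    END REPEAT.
+    END CASE.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.
DESCRIPTIVES VARIABLES=!VLIST1.
!ENDDEFINE.

* Example call: 200 cases, 3 variables, uniform on (0,1).
!GenRaw RVfunc = RV.UNIFORM(0,1) / N = 200 P = 3.
```

The same extra argument could be added to the CorrSim definition, with RV.NORMAL(0,1) in the COMPUTE replaced by the macro argument.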

Bruce

Jon Peck wrote
I would not use the syntax.  That is very complicated, because the
simulation engine is pretty elaborate with lots of options.  But the dialog
(actually a wizard) is pretty easy, taking it step by step.

Analyze > Simulation
Click "Create Simulated Data".  Continue.
Click Create simulated data without a model. Click New and specify a name
for each variable - say x, y, z
Click the Simulation tab.
Specify a distribution and its parameters for each variable.  There are
many choices, and for each one you get a picture of both the generic shape
and the shape for the parameter values you specify.
Click Correlations.  You see a grid where you can specify correlations.
Click Save and specify an output data file or dataset name.

Click Run.

This can be pretty educational when you want to explore probability
distributions and what correlated data look like.

As an educational exercise, my guess is that the education is really about
the distributions and generated data rather than things like a Cholesky
factorization, but I haven't seen that stated explicitly.



Re: Manufactured dataset?

Jon Peck
All true, although the simulation procedure includes at least one random number generator that is not available as a transformation function: triangular.  It was added because it is quite popular in the simulation world.  You can, however, get it using SPSSINC TRANS and the extendedTransforms.py module included with Statistics. :-)




--
Jon K Peck
[hidden email]


Re: Manufactured dataset?

Bruce Weaver
Administrator
In reply to this post by Bruce Weaver
In case anyone is still interested, I had a go at using the Analyze > Simulation approach that Jon has been advocating.  It is shown at the end of the syntax that is pasted below.  I leave it to each individual to judge which approach is easier to use, more educational, etc.

David, given that your macro is the key bit of code, I listed you as first author.  ;-)  
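
For anyone who wants to inspect the inner workings, here is a minimal sketch (separate from the file pasted below) of the identity behind the macro's FS*CHOL(CORR) step. It assumes the documented behaviour of the MATRIX function CHOL, which returns an upper-triangular matrix U with T(U)*U equal to its argument. The names R, U and Check are illustrative only. If the factor scores FS are exactly uncorrelated with unit variance, then FS*U should have correlation matrix T(U)*U = R, which appears to be why the simulated variables reproduce the requested correlations.

```spss
* Hypothetical check: the Cholesky factor reproduces the target matrix.
* R is the correlation matrix used in the macro call in this thread.

MATRIX.
COMPUTE R = {1.00, -0.08, 0.08; -0.08, 1.00, 0.55; 0.08, 0.55, 1.00}.
COMPUTE U = CHOL(R).
COMPUTE Check = T(U)*U.
PRINT U /TITLE "Upper-triangular Cholesky factor U".
PRINT Check /TITLE "T(U)*U, which should reproduce R".
END MATRIX.
```

CHOL requires a symmetric positive-definite matrix, so a set of requested correlations that is not positive definite should be expected to fail at this step.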

* ===============================================================
*  File:    Marso_CorrSim_macro.SPS
*  Date:    14-Jun-2017
*  Authors: David Marso, Bruce Weaver
*  Notes:   David Marso's macro to generate a data set consistent
*           with a given correlation matrix.
* =============================================================== .

* The original version of the macro in this file was written by David Marso.
* I tweaked it by adding MeanVect and SDvect macro arguments, which allow
* the user to specify the desired means and SDs for the simulated variables.
* I also expanded on some of the comments.

* References:
* http://spssx-discussion.1045642.n5.nabble.com/Manufactured-dataset-tp5734381p5734387.html
* http://spssx-discussion.1045642.n5.nabble.com/Manufactured-dataset-tp5734381p5734392.html
* http://spssx-discussion.1045642.n5.nabble.com/SERIOUS-problem-with-Multiple-datasets-in-MATRIX-procedure-tp5722765p5725723.html
.
* As noted in the 3rd reference above, the MATRIX SAVE command
* can only save to a declared dataset if you are running SPSS
* as an Administrator.  Not all users are able to do that.  
* Therefore, this syntax includes a work-around, which is to
* temporarily change the working directory to the user's desktop.
* This is a folder for which they have read-write privileges,
* and that allows MATRIX SAVE to work properly.  The use of
* PRESERVE and RESTORE returns everything to the state it was
* in before the syntax file was executed.

PRESERVE.
* The following 2 commands are a work around for the
  Administrative user issue with MATRIX saving to datasets.
FILE HANDLE fpFolder_With_ReadWriteRights /NAME='%userprofile%\Desktop' .
CD fpFolder_With_ReadWriteRights .

NEW FILE.
DATASET CLOSE all.

* START OF MACRO DEFINITION.
*****************************************************************.
**Uses Cholesky decomposition (essentially a square root of a matrix) and
**orthogonal vectors from FACTOR to generate desired simulated data.
**Updates Howell (removes need for fiddling around with the internal organs).
**<RANT> People should NOT publish code which requires major surgery!! </RANT>
**I wrote code like this about 20 years ago and the theory is fairly trivial.
*****************************************************************.
**Note: Does not require python or python extensions.

DEFINE CorrSim
 (CorrMat !CHAREND ("/") /
  MeanVect !CHAREND ("/") /
  SDvect !CHAREND ("/") /
  N !TOKENS(1) / P !TOKENS(1)).

!LET !VLIST1 = !CONCAT('X1 TO X',!P ).
!LET !VLIST2 = !CONCAT('FS1 TO FS',!P ).
!LET !VLIST3 = !CONCAT('Sim1 TO Sim',!P ).

* Simulate raw data of desired dimensionality *.
INPUT PROGRAM.
+  LOOP #=1 TO !N.
+    DO REPEAT V=!VLIST1.
+      COMPUTE V=RV.NORMAL(0,1).
+    END REPEAT.
+    END CASE.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.

* Orthogonalize Vectors * .
**OOPS (Criteria  MINEIGEN(0) not respected on SAVE) **.
**OLD CODE ** FACTOR VAR !VLIST1 / CRITERIA MINEIGEN(0)/ SAVE REG (!P,FS).
FACTOR VAR !VLIST1 / CRITERIA FACTORS(!P)/ SAVE REG (!P,FS).

DATASET DECLARE Sim .

* Generate Vectors with desired dimensionality *.
MATRIX.
GET RAW   / VAR !VLIST1.
GET FS    / VAR !VLIST2.
COMPUTE CORR={!CorrMat}.
SAVE {RAW,FS,FS*CHOL(CORR)} / OUTFILE Sim /VARIABLES !VLIST1 !VLIST2 !VLIST3.
END MATRIX.

* The variables X1 to XP are no longer needed, so replace them
* with the re-scaled simulated variables.
DATASET ACTIVATE Sim.
DO REPEAT x = !VLIST1 / S = !VLIST3 / M = !MeanVect / StDev = !SDvect.
- COMPUTE x = S*StDev+M.
END REPEAT.
* Check that it worked as intended (OPTIONAL) .
DESCRIPTIVES !VLIST3 !VLIST1.
* Correlations for variables with Mean=0, SD=1.
CORRELATIONS !VLIST3 /MISSING=LISTWISE.
* Correlations for re-scaled variables.
CORRELATIONS !VLIST1 /MISSING=LISTWISE.
!ENDDEFINE.
* END OF MACRO DEFINITION.

* -------------------------------------.
* Now demonstrate how to use the macro.
* -------------------------------------.

* Once debugging is finished, the SET command below can be
* commented out.
SET MPRINT ON PRINTBACK ON.

* When calling the macro,
*  CorrMat = the desired correlation matrix, with commas separating
*            elements and semicolons separating rows
*  MeanVect = vector of desired means for simulated variables
*  SDvect = vector of desired SDs for simulated variables
*  N = the number of cases in the simulated dataset
*  P = the number of variables in the simulated dataset.
* P must match the order of CorrMat and the number of values
* in MeanVect and SDvect.
* MeanVect and SDvect will be used in a DO-REPEAT structure,
* so just list the values with no commas.

CorrSim
 CorrMat 1.00, -0.08, 0.08; -0.08, 1.00, 0.55; 0.08, 0.55,  1.00 /
 MeanVect 45 50 55 /
 SDvect 11 10 9 /
 N=200  P=3.

SET MPRINT OFF.
RESTORE.  /* Restore the current working directory.

* Meanwhile, Jon Peck is advocating for Analyze > Simulation.
* The GUI for that does not look straightforward.  But I got
* far enough to figure out that the relevant commands are
* SIMPLAN and SIMRUN.
* https://www.ibm.com/support/knowledgecenter/en/SSLVMB_21.0.0/com.ibm.spss.statistics.help/syn_simplan.htm
* https://www.ibm.com/support/knowledgecenter/en/SSLVMB_21.0.0/com.ibm.spss.statistics.help/syn_simrun.htm
* https://www.ibm.com/support/knowledgecenter/en/SSLVMB_21.0.0/com.ibm.spss.statistics.help/syn_simprep.htm
.

*Create simulation plan.
* Change file locations below as needed.
FILE HANDLE simplan_295639 /NAME='C:\Temp\SimulationPlan_1.splan'.
SIMPLAN CREATE
 /CONTINGENCY MULTIWAY=NO
 /SIMINPUT INPUT=Y1(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO) DISTRIBUTION=NORMAL(MEAN=45
    STDDEV=11 )
 /SIMINPUT INPUT=Y2(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO) DISTRIBUTION=NORMAL(MEAN=50
    STDDEV=10 )
 /SIMINPUT INPUT=Y3(FORMAT=F,2) OUTPUT=YES TYPE=MANUAL(LOCK=NO) DISTRIBUTION=NORMAL(MEAN=55
    STDDEV=9 )
 /CORRELATIONS VARORDER=Y1 Y2 Y3  CORRMATRIX=1.0; -0.08, 1.0; 0.08, 0.55, 1.0 LOCK=NO
 /AUTOFIT NCASES=ALL FIT=AD BINS=100
 /STOPCRITERIA MAXCASES= 1000 /* Default = 100000, min allowed = 1000 */
 /MISSING CLASSMISSING=EXCLUDE
 /PLAN FILE=simplan_295639 DISPLAY=YES.
*Run simulation plan.
DATASET DECLARE SimData.
SIMRUN
 /PLAN FILE=simplan_295639
 /CRITERIA  REPRESULTS=TRUE  SEED=629111597
 /PRINT ASSOCIATIONS=YES DESCRIPTIVES=YES PERCENTILES=NO
 /OUTFILE FILE=SimData.

DATASET ACTIVATE SimData.
DESCRIPTIVES Y1 to Y3.
CORRELATIONS Y1 to Y3 /MISSING=LISTWISE.

* =============================================================== .
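
For anyone wondering why the SAVE line in the CorrSim macro reproduces the target correlations, here is a short sketch of the algebra. The symbols F, U, R and S are shorthand introduced here, not names used in the syntax: F is the N x P matrix of saved scores FS1 to FSP, R is the matrix passed as CorrMat, U = CHOL(R), and S is the product FS*CHOL(CORR). The sketch assumes the saved scores are standardized and mutually uncorrelated in the sample, which is what the FACTOR step is there to provide.

```latex
\begin{align*}
\tfrac{1}{N-1}\,F^{\top}F &= I_P
  && \text{(assumed: standardized, uncorrelated scores)}\\
U^{\top}U &= R
  && \text{(CHOL returns the upper-triangular factor)}\\
S &= F\,U\\
\tfrac{1}{N-1}\,S^{\top}S
  &= U^{\top}\Bigl(\tfrac{1}{N-1}\,F^{\top}F\Bigr)U
   = U^{\top}U = R
\end{align*}
```

Because the columns of F have mean 0, so do the columns of S, and because R has 1s on its diagonal, each Sim variable has SD 1. Under the stated assumption, the sample correlation matrix of Sim1 to SimP is therefore R itself (up to rounding) rather than an approximation to it. The later step x = S*StDev + M is a linear rescaling with positive SDs, so it changes the means and SDs but leaves the correlations alone.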

