I've been playing with the SET SEED command and would like to share some thoughts on it. If you want to be able to reproduce random numbers, you need the SET SEED command. But there are pitfalls, as these syntax examples show:

* Method 1: Wrong.
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar1 = RV.NORMAL(100,10).
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar2 = RV.NORMAL(100,10).
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar3 = RV.NORMAL(100,10).
EXE.

* Method 2: Correct.
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar4 = RV.NORMAL(100,10).
EXE.
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar5 = RV.NORMAL(100,10).
EXE.
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar6 = RV.NORMAL(100,10).
EXE.

* Method 3: Wrong.
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar7 = RV.NORMAL(100,10).
EXE.
COMPUTE TestVar8 = RV.NORMAL(100,10).
EXE.
COMPUTE TestVar9 = RV.NORMAL(100,10).
EXE.

This is one of the rare cases where the EXECUTE command should be used immediately. Unfortunately, the syntax reference guide (SET MTINDEX) does not mention this at all. I'd propose inserting a hint in upcoming versions.

Mario Giesel
Munich, Germany
|
What bothers you? In what way is it "wrong"?
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Reproducing the same numbers is what goes wrong.

Mario Giesel
Munich, Germany

On Wednesday, 18 December 2019, 12:25:32 CET, Kirill Orlov <[hidden email]> wrote:
What bothers you? In what way is it "wrong"? |
In reply to this post by Kirill Orlov
Okay, see here:

TestVar1  TestVar2  TestVar3  TestVar4  TestVar5  TestVar6  TestVar7  TestVar8  TestVar9
  102,46    113,34    100,90    102,46    102,46    102,46    102,46    118,33     94,82
   97,62     95,10    101,63    113,34    113,34    113,34    113,34     99,16     87,27
  103,38     90,07    102,52    100,90    100,90    100,90    100,90     97,41    101,09
  103,53    103,89     93,36     97,62     97,62     97,62     97,62     91,97     89,67
   85,84    101,07     96,22     95,10     95,10     95,10     95,10     74,09    107,57

TestVar4, 5 and 6 produce the same random numbers, as intended.
TestVar1, 2 and 3 produce different numbers.
TestVar7, 8 and 9 produce different numbers.

I'm not sure if this makes it clear?!

Mario Giesel
Munich, Germany
On Wednesday, 18 December 2019, 14:50:39 CET, Kirill Orlov <[hidden email]> wrote:
What bothers you? In what way is it "wrong"? |
In reply to this post by spss.giesel@yahoo.de
There is no error; the seeding behaviour is correct in all your examples.
When you set a specific seed (MTINDEX= or SEED=), it is the starting value of a series of subsequent seeds. The series is completely determined by the starting value. For example, SET MTINDEX=2019 means seed1=2019, and it determines all other members of the infinite series: seed2 seed3 seed4.... You know seed1 (because you specified it), but YOU don't know seed2 seed3 seed4....; however, seed1 "knows" seed2 seed3 seed4.... via the random number algorithm used.

If you specify SET MTINDEX=RANDOM, nothing changes in principle, except that now you don't know even seed1: it will itself be selected, randomly, by the algorithm; yet it will determine the whole series precisely, as before.

Now, LET us have THREE cases in our dataset. Run your pieces of code and see what happens. Every time a single computation (using random numbers) is done, it uses the next seed in the series as its internal argument for the random number function used.

* Method 2.
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar4 = RV.NORMAL(100,10). /*seed1 (case1), seed2 (case2), seed3 (case3)
EXE. /*This executes three times, because three cases
SET RNG=MT MTINDEX=2019. /*You again set seed1 to 2019
COMPUTE TestVar5 = RV.NORMAL(100,10). /*seed1 (case1), seed2 (case2), seed3 (case3)
EXE. /*This executes three times, because three cases
SET RNG=MT MTINDEX=2019. /*You again set seed1 to 2019
COMPUTE TestVar6 = RV.NORMAL(100,10). /*seed1 (case1), seed2 (case2), seed3 (case3)
EXE. /*This executes three times, because three cases

* Method 3.
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar7 = RV.NORMAL(100,10). /*seed1 (case1), seed2 (case2), seed3 (case3)
EXE. /*This executes three times, because three cases
COMPUTE TestVar8 = RV.NORMAL(100,10). /*seed4 (case1), seed5 (case2), seed6 (case3)
EXE. /*This executes three times, because three cases; since you did not reset seed1, series continued: seed4 seed5 seed6
COMPUTE TestVar9 = RV.NORMAL(100,10). /*seed7 (case1), seed8 (case2), seed9 (case3)
EXE. /*Likewise, series continued: seed7 seed8 seed9

* Method 1A. Here, generation goes within a case first (all three COMPUTEs serve one case)
* then the focus shifts to the next case.
SET RNG=MT MTINDEX=2019.
COMPUTE TestVar1 = RV.NORMAL(100,10). /*seed1 (case1); /*seed4 (case2); /*seed7 (case3)
COMPUTE TestVar2 = RV.NORMAL(100,10). /*seed2 (case1); /*seed5 (case2); /*seed8 (case3)
COMPUTE TestVar3 = RV.NORMAL(100,10). /*seed3 (case1); /*seed6 (case2); /*seed9 (case3)
EXE. /*As said, this executes all three COMPUTEs for each case in a batch

* Method 1. This is a bit tricky to understand. This code gives the same result as Method 1A.
* Why so?
* My interpretation is as follows. A seed (the next seed in the series chain) is actually
* utilized, as an argument to a function, AT EXECUTION time. And you cannot meddle
* in the execution flow while it runs. In particular, you cannot change the seed (i.e., seed1,
* which defines the subsequent series of seeds) during execution.
* You have only one execution command, which runs all three COMPUTEs immediately one after another,
* and EXECUTE is therefore blind to your inadequate "attempts" to redefine the seed:
* it follows set command 1 and has no opportunity to "hear" set command 2 or set command 3.
* These last two set commands are idle; they are syntactically misplaced, and you should switch them off.
SET RNG=MT MTINDEX=2019. /*set command 1
COMPUTE TestVar1 = RV.NORMAL(100,10). /*seed1 (case1); /*seed4 (case2); /*seed7 (case3)
SET RNG=MT MTINDEX=2019. /*set command 2 (idle)
COMPUTE TestVar2 = RV.NORMAL(100,10). /*seed2 (case1); /*seed5 (case2); /*seed8 (case3)
SET RNG=MT MTINDEX=2019. /*set command 3 (idle)
COMPUTE TestVar3 = RV.NORMAL(100,10). /*seed3 (case1); /*seed6 (case2); /*seed9 (case3)
EXE. /*As said, this executes all three COMPUTEs for each case in a batch
|
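The draw order described for Methods 1/1A and 2 can be mimicked outside SPSS. What follows is only a rough sketch in Python, not SPSS syntax: it assumes Python's `random.Random` as a stand-in generator (its seeding and output differ from SPSS, so the numbers will not match SPSS output), and `run_pass` is a hypothetical helper that plays the role of one pass over the data. Method 3 is deliberately left out, since how the seed behaves between EXECUTEs is discussed elsewhere in the thread.

```python
import random


def run_pass(n_cases, n_pending, rng):
    """Hypothetical stand-in for one data pass.

    For each case, every pending COMPUTE draws one number, in order,
    before the pass moves on to the next case.
    """
    columns = [[] for _ in range(n_pending)]
    for _case in range(n_cases):
        for column in columns:
            column.append(round(rng.gauss(100, 10), 2))
    return columns


N_CASES = 3

# Methods 1 / 1A: one seeding, three pending COMPUTEs, a single pass.
rng = random.Random(2019)
v1, v2, v3 = run_pass(N_CASES, 3, rng)

# Method 2: reseed, queue one COMPUTE, run one pass; repeated three times.
method2 = []
for _ in range(3):
    rng = random.Random(2019)
    (column,) = run_pass(N_CASES, 1, rng)
    method2.append(column)
v4, v5, v6 = method2

# Reseeding before each separate pass reproduces the same column.
print(v4 == v5 == v6)                   # True
# In the single pass, case 1 consumes draws 1-3, which is the whole of v4.
print([v1[0], v2[0], v3[0]] == v4)      # True
```

In this model the three columns of the single pass are filled across each case before moving down, which is why the first case of v1..v3 carries the same draws that fill v4 from top to bottom.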
In reply to this post by spss.giesel@yahoo.de
Good observation, Mario, and good suggestion about editing the documentation.
Cheers,
Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Mario,
Thanks for this. Jon sent out a link last week to the IBM site for making suggestions for changes. It's https://ibm-data-and-ai.ideas.aha.io/?project=STATS .
You can add your suggestion in or, if it's already been noted, add your vote. As Jon mentioned, it's more effective than leaving it on a stats user group site.
Brian
From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Wednesday, December 18, 2019 9:41 AM
To: [hidden email] <[hidden email]>
Subject: Re: SET SEED pitfalls

Good observation, Mario, and good suggestion about editing the documentation.
|
Thanks, Kirill. I think your comments make it much clearer how the randomization process is controlled. Only now did I realize that the first row of TestVar1 to TestVar3 is a transpose of TestVar4. I really think such information would be worth mentioning in the syntax reference.

Thanks, Brian, for the link to the change suggestions page. I'll have a look.

Thanks, Bruce.

Mario Giesel
Munich, Germany
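The transpose observation can be checked directly against the numbers posted earlier in the thread. The snippet below is just an illustrative check in Python (not SPSS syntax); the values are copied from the posted output with decimal commas written as points, and the variable names are made up for this sketch.

```python
# Values copied from the posted output (decimal commas written as points).
test_var_1 = [102.46, 97.62, 103.38, 103.53, 85.84]
test_var_2 = [113.34, 95.10, 90.07, 103.89, 101.07]
test_var_3 = [100.90, 101.63, 102.52, 93.36, 96.22]
test_var_4 = [102.46, 113.34, 100.90, 97.62, 95.10]

# Read TestVar1..TestVar3 case by case (row by row), i.e. in the order
# a single pass over the data would have filled them.
draw_order = [
    value
    for row in zip(test_var_1, test_var_2, test_var_3)
    for value in row
]

# The first case of TestVar1..3 equals the first three cases of TestVar4.
first_row_matches = draw_order[:3] == test_var_4[:3]
# Reading further row by row reproduces all five cases of TestVar4.
all_five_match = draw_order[:len(test_var_4)] == test_var_4

print(first_row_matches, all_five_match)  # True True
```

So the match goes beyond the first row: the first five values of TestVar1 to TestVar3, read across each case, are exactly the five values of TestVar4.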
On Wednesday, 18 December 2019, 15:51:39 CET, Dates, Brian <[hidden email]> wrote:
Mario,
Thanks for this. Jon sent out a link last week to the IBM site for making suggestions for changes. It's https://ibm-data-and-ai.ideas.aha.io/?project=STATS .
You can add your suggestion in or, if it's already been noted, add your vote. As Jon mentioned, it's more effective than leaving it on a stats user group site.
Brian
|
This issue is covered in the Universals section of the Command Syntax Reference. It is not specific to SET, but the SET doc does say it takes effect immediately. And the COMPUTE doc does explain its deferred execution. This is an efficiency win, but this out-of-order execution does sometimes cause confusion such as here. The usual error, though, is the use of an excess of EXECUTE commands.

From the CSR...

In addition to observing the rules above, it is often important to distinguish between commands that cause the data to be read and those that do not, and between those that are stored pending execution with the next command that reads the data and those that take effect immediately without requiring that the data be read.

• Commands that cause the data to be read, as well as execute pending transformations, include all statistical procedures (e.g., CROSSTABS, FREQUENCIES, REGRESSION); some commands that save/write the contents of the active dataset (e.g., DATASET COPY, SAVE TRANSLATE, SAVE); AGGREGATE; AUTORECODE; EXECUTE; RANK; and SORT CASES.

• Commands that are stored, pending execution with the next command that reads the data, include transformation commands that modify or create new data values (e.g., COMPUTE, RECODE), commands that define conditional actions (e.g., DO IF, IF, SELECT IF), PRINT, WRITE, and XSAVE. For a comprehensive list of these commands, see “Commands That Are Stored, Pending Execution” on page 39.

• Commands that take effect immediately without reading the data or executing pending commands include transformations that alter dictionary information without affecting the data values (e.g., MISSING VALUES, VALUE LABELS) and commands that don't require an active dataset (e.g., DISPLAY, HOST, INSERT, OMS, SET). In addition to taking effect immediately, these commands are also processed unconditionally. For example, when included within a DO IF structure, these commands run regardless of whether or not the condition is ever met. For a comprehensive list of these commands, see “Commands That Take Effect Immediately” on page 37.

On Wed, Dec 18, 2019 at 8:05 AM Mario Giesel <[hidden email]> wrote:
|
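The distinction between the three command classes quoted from the CSR can be pictured with a toy scheduler. This is a hedged sketch in Python and not a description of how SPSS is implemented: the `Session` class, its method names, and the log strings are all invented for illustration, and Python's `random.Random` merely stands in for the SPSS generator.

```python
import random


class Session:
    """Toy model: SET acts at once, COMPUTE waits, EXECUTE reads the data."""

    def __init__(self, n_cases):
        self.rng = random.Random()
        self.pending = []                       # queued COMPUTE targets
        self.data = [{} for _ in range(n_cases)]
        self.log = []                           # order in which things happen

    def set_seed(self, seed):
        # "Takes effect immediately": no data read, pending queue untouched.
        self.rng.seed(seed)
        self.log.append(f"SET {seed}")

    def compute(self, name):
        # "Stored, pending execution with the next command that reads the data."
        self.pending.append(name)
        self.log.append(f"queue {name}")

    def execute(self):
        # "Causes the data to be read": each case runs all pending COMPUTEs.
        for case in self.data:
            for name in self.pending:
                case[name] = round(self.rng.gauss(100, 10), 2)
        self.log.append(f"run {', '.join(self.pending)}")
        self.pending.clear()

    def column(self, name):
        return [case[name] for case in self.data]


# Method 1 as posted: SET repeated before every COMPUTE, one EXECUTE.
a = Session(3)
for name in ("v1", "v2", "v3"):
    a.set_seed(2019)
    a.compute(name)
a.execute()

# Same program with a single SET at the top.
b = Session(3)
b.set_seed(2019)
for name in ("v1", "v2", "v3"):
    b.compute(name)
b.execute()

same = all(a.column(n) == b.column(n) for n in ("v1", "v2", "v3"))
print(a.log)   # all three SETs appear before the single "run ..." entry
print(same)    # True
```

In this model every SET has already been applied by the time the data pass starts, so the second and third SET merely repeat the first and the two programs produce identical columns.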
The documentation for the SET command also notes that it does not read the data and does not execute pending transformations -- but you might not notice that if you were just looking for information on the behavior of SET SEED. And the documentation for SET SEED is a little...sparse.

Method 1 and method 2 give the results I would expect. Method 3 might seem a little less obvious. The starting seed is changed after each EXECUTE.

*method 1 (without redundant SET commands).
set rng mc.
set seed 1234.
compute v1=rv.normal(100,10).
show seed.
compute v2=rv.normal(100,10).
show seed.
compute v3=rv.normal(100,10).
show seed.
execute.

*method 3 -- seed re-initialized after each Execute.
set seed 1234.
compute v4=rv.normal(100,10).
execute.
show seed.
compute v5=rv.normal(100,10).
execute.
show seed.
compute v6=rv.normal(100,10).
execute.
show seed.

Note that across sessions or runs within a session, results should remain the same as long as you don't change the syntax you started with, regardless of where the various commands are placed.

On Wed, Dec 18, 2019 at 11:47 AM Jon Peck <[hidden email]> wrote:
|