I must draw fixed-count random samples from different groups and I am looking for more efficient code. For example:

DO IF GROUPCODE=7.
SAMPLE 5 FROM 32.
(write to outfile)
END IF.

DO IF GROUPCODE=9.
SAMPLE 10 FROM 93.
(write to outfile)
END IF.

I have tried using DO REPEAT, but the SAMPLE command doesn't let me use a function or macro variable in place of the numbers. I look to your wisdom and tutelage; any help appreciated.
Administrator
I would go at it quite differently.
I am assuming you know the desired counts for each group? There is no need in this case to know N per group. Say:

GROUPCODE DesiredCount
1 10
2 5
3 ..
...
7 5
..
9 10

Create a small SPSS data file containing these 2 variables and values, sorted by GROUPCODE, and save it as "<somevaliddirpath>COUNTS.SAV", <somevaliddirpath> being something like "C:\Documents\SPSSData\SomeProject\". Then grab your data file:

GET FILE "masterfileblahblah.sav".
COMPUTE @SCRAMBLER=UNIFORM(1).
SORT CASES BY GROUPCODE @SCRAMBLER.
MATCH FILES / FILE * / TABLE "<somevaliddirpath>COUNTS.SAV" / BY GROUPCODE.
IF $CASENUM=1 OR LAG(GROUPCODE) NE GROUPCODE GPCounter=1.
IF MISSING(GPCounter) GPCounter=LAG(GPCounter)+1.
COMPUTE Use_This_Case=GPCounter LE DesiredCount.
FILTER BY Use_This_Case.
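For readers who want to check the logic outside SPSS, here is a minimal Python sketch of the scramble, sort, count and filter idea above. It is not SPSS syntax. The function `fixed_count_sample`, the `(groupcode, payload)` case layout and the `desired` dictionary (standing in for COUNTS.SAV) are hypothetical names chosen for illustration.

```python
import random
from itertools import groupby

def fixed_count_sample(cases, desired, seed=None):
    """Keep up to desired[group] randomly chosen cases from each group.

    cases   -- list of (groupcode, payload) tuples (hypothetical layout)
    desired -- dict groupcode -> desired count (plays the role of COUNTS.SAV)
    """
    rng = random.Random(seed)
    # COMPUTE @SCRAMBLER=UNIFORM(1): attach a random sort key to every case.
    keyed = [(grp, rng.random(), payload) for grp, payload in cases]
    # SORT CASES BY GROUPCODE @SCRAMBLER (payload never takes part in the sort).
    keyed.sort(key=lambda t: (t[0], t[1]))
    selected = []
    for grp, members in groupby(keyed, key=lambda t: t[0]):
        # GPCounter restarts at 1 on the first case of each group.
        for counter, (_, _, payload) in enumerate(members, start=1):
            # Use_This_Case = GPCounter LE DesiredCount; a group missing from
            # the table is treated as 0 here, so none of its cases are kept.
            if counter <= desired.get(grp, 0):
                selected.append((grp, payload))
    return selected

# Usage: 5 of 32 cases from group 7 and 10 of 93 cases from group 9.
cases = [(7, i) for i in range(32)] + [(9, i) for i in range(93)]
picked = fixed_count_sample(cases, {7: 5, 9: 10}, seed=1)
```

Because the random key only reorders cases within a group, taking the first k after the sort yields a simple random sample of k from that group. If k is at least the group size, the whole group is kept.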
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by CG
Totally untested, but what happens if you try something like:

Do repeat x = 7, 9 /y = 5, 10 /z = 32, 93 /a = 1, 2 /b = b1, b2 .
Do if groupcode = x .
Sample y from z .
End if .
Compute b = a.
End repeat .

This gives variables b1 and b2, which can be used as filters for writing. I'm sure regulars such as Bruce, David, ViAnn or Albert-Jan will come up with something.

John F Hall
Administrator
It makes *NO* sense at all to use SAMPLE within a DO REPEAT.
Think about that for a moment ;-) When you do something like

DO REPEAT X= a b c d / Y = e f g h / Z= ae be cg dh.
COMPUTE Z=X/Y.
END REPEAT.

what do you end up with? 4 new variables on *EACH* case... i.e. DO REPEAT handles TRANSFORMATIONS and applies them to each case. How would SAMPLE fit into a DO REPEAT?
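Here is a rough Python analogy of what the DO REPEAT above expands to. It is not SPSS, and the function and variable names are illustrative assumptions. The block is one COMPUTE per stand-in position, executed on every case in the same pass. Nothing in it selects or drops cases, which is why a case-selection command such as SAMPLE has no natural place inside it.

```python
def do_repeat_compute(cases, xs, ys, zs):
    """Mimic DO REPEAT X=.../Y=.../Z=... COMPUTE Z=X/Y. END REPEAT.

    cases -- list of dicts, one per case (hypothetical layout)
    The stand-in lists expand into one COMPUTE per position, and all of
    them run on every case; the number of cases never changes.
    """
    for case in cases:                     # one pass over the active file
        for x, y, z in zip(xs, ys, zs):    # one COMPUTE per stand-in position
            case[z] = case[x] / case[y]
    return cases

# Usage: every case gains the 4 new variables ae, be, cg and dh.
cases = [dict(a=2, b=4, c=6, d=8, e=1, f=2, g=3, h=4) for _ in range(3)]
do_repeat_compute(cases, ["a", "b", "c", "d"], ["e", "f", "g", "h"],
                  ["ae", "be", "cg", "dh"])
```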
David
It was just a thought, and I was only trying to help!

I once had a similar problem when trying to introduce students to inferential statistics. I'd have, say, 24 students and a survey with 1800 cases for a lab session. Each student was asked to sample n from N with a different SET SEED starting point. This was in the days of 16 VDUs connected to a remote VAX mainframe, not modern PCs and distance learning. About 12 of them worked if you were lucky, and time was severely constrained if we didn't want to get locked in the building, or wanted to get away before the Arsenal match finished up the road. The idea was to get 3 samples each to yield 72 statistics (mean lifesat, % happy or whatever), which we then plotted on the chalkboard, hopefully to demonstrate an approximately normal distribution. Sometimes it worked and sometimes it didn't (usually because sampling 30 from 3000 isn't as stable as 300 from 3000). Either way, the students learned a lot from the attempt and understood what we were trying to do. In particular, they saw that the sampling distribution of the mean was approximately normal even if the variable itself was not (e.g. age).

Now, how about some syntax to do what I need for a new tutorial for the website? Assume the data set is BSA89.sav, N is 3000 and I want 100 samples of size 300. I then want to save mean lifesat, mean age and % very happy (code 3 on happy) as variables in a separate file with 100 cases (one for each sample).

Have a nice weekend.

John
[hidden email]
www.surveyresearch.weebly.com
Administrator
"Assume data set is BSA89.sav, N is 3000 and I want 100 samples of size 300:
then to save mean lifesat, mean age and % very happy (code 3 on happy) as variables in a separate file with 100 cases (one for each sample)."

Here you go John, enjoy ;-)

*** OK here we go **.
**First of all oversample **.
LOOP SAMPLE=1 TO 100.
+ DO IF UNIFORM(1) < .12 .
+ XSAVE OUTFILE "Samples.sav" / KEEP caseid lifesat age happy sample.
+ END IF.
END LOOP.
*ONE OF THE ONLY Places you NEED an EXECUTE *.
EXECUTE.
GET FILE "Samples.sav".
FREQ SAMPLE.
* Using same ideas as what I posted to Cindy Gregory *.
COMPUTE SCRAMBLE=UNIFORM(1).
SORT CASES BY SAMPLE SCRAMBLE.
IF $CASENUM=1 OR LAG(SAMPLE) NE SAMPLE GPCount=1.
IF MISSING(GPCount) GPCount=LAG(GPCount)+1.
*ONE OF THE ONLY Places you NEED an EXECUTE *.
EXECUTE.
SELECT IF GPCount LE 300.
FREQ SAMPLE.
AGGREGATE OUTFILE * / BREAK SAMPLE / MLIFESAT MEANAGE = MEAN(lifesat age) / PctVHapp=PIN(happy,3,3).
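Here is a minimal Python sketch of the same oversample-then-trim idea. It assumes the population is a list of `(lifesat, age, happy)` tuples. `oversample_then_trim` and `summarize` are hypothetical helpers, not SPSS commands. The loops are nested the other way round from the syntax (samples outside, cases inside), which still gives each case the same independent 12% chance of entering each sample.

```python
import random
from statistics import mean

def oversample_then_trim(population, n_samples=100, size=300, p=0.12, seed=None):
    """Return (sample_id, cases) pairs; each sample is drawn WITHOUT replacement.

    Every case enters each sample independently with probability p (the
    UNIFORM(1) < .12 test). The candidates are shuffled (the SCRAMBLE sort),
    and anything beyond `size` is dropped (SELECT IF GPCount LE 300).
    A sample that happens to get fewer than `size` candidates stays short.
    """
    rng = random.Random(seed)
    samples = []
    for sample_id in range(1, n_samples + 1):
        candidates = [case for case in population if rng.random() < p]
        rng.shuffle(candidates)
        samples.append((sample_id, candidates[:size]))
    return samples

def summarize(sample):
    """Rough analogue of the AGGREGATE step for (lifesat, age, happy) tuples."""
    mlifesat = mean(c[0] for c in sample)
    meanage = mean(c[1] for c in sample)
    pct_vhapp = 100 * sum(1 for c in sample if c[2] == 3) / len(sample)  # PIN(happy,3,3)
    return mlifesat, meanage, pct_vhapp

# Usage with made-up data for 3000 hypothetical respondents:
rng = random.Random(1)
population = [(rng.randint(0, 10), rng.randint(18, 90), rng.randint(1, 3))
              for _ in range(3000)]
results = [(sid, *summarize(s)) for sid, s in oversample_then_trim(population, seed=2)]
```

With N = 3000 and p = .12, each sample averages about 360 candidates, so trimming to 300 almost always leaves exactly 300 distinct cases.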
Administrator
In reply to this post by John F Hall
Hi John. I've sent you something off-list that you may be able to cobble into the kind of demo/tutorial you want. It uses some code written by David Marso, but posted by his then colleague David Nichols. Here's the link:
http://groups.google.com/group/sci.stat.consult/msg/710ea4ab83ddf24a?dmode=source HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Administrator
GACK Bruce! That old thing??? It is from 1996, right about the time I bailed from SPSS Tech Support and transferred to the consulting group. Almost 15 years to the day... How time flies!!!!!!!!!!!!!!!!

The SPSS-X archives are truncated at 1996 and everything before that has been bit-bucketed. Hey, ever wonder whatever happened to all the gems I posted between 1992 and 1996? There were some pretty twisted Rube-Goldbergesque monstrosities I inflicted upon the world in those days: self-modifying SPSS code, unintelligible one-liners, programs to read data configured like they came from the brain of H.P. Lovecraft or some other dark place (almost lost my mind doing that job). I used to have those all on my old dead PC (the HD is fine but the box is dead; I need to pull the data some day).

*BUT* I think I like what I posted earlier today much more than that old one. Note also that what you reference does BOOTSTRAP sampling (with replacement). Today's code does sampling without replacement: slightly oversample beyond the desired ratio, XSAVE the cases which are 'sampled', then do the random scramble and nuke the extra cases at the end. Thanks for the blast from the past (I think... cringe...).
Administrator
When I found that old post, I was looking for a quick & dirty way of generating bootstrap samples (with replacement), and as I recall, there was not much else out there. Anyway, it worked out quite well for what I was doing at the time. ;-)
In reply to this post by CG
At 12:03 PM 7/15/2011, Gregory, Cindy, PED wrote:
> I must draw fixed count random samples from different groups and I
> am looking for more efficient code. For example:
>
> DO IF GROUPCODE=7.
> SAMPLE 5 FROM 32.
> (write to outfile)
> END IF.
>
> DO IF GROUPCODE=9.
> SAMPLE 10 from 93.
> (write to outfile)
> END IF.

I'd skip SAMPLE, and write straight 'k/n' logic. First, sort the data by GROUPCODE. Then (untested):

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=GROUPCODE
  /GroupSize 'Number of members in group' = NU.

Then, if you don't mind writing the desired sample sizes into your code, something like this (still untested):

DO IF $CASENUM EQ 1 OR GROUPCODE NE LAG(GROUPCODE) .
*  #n is initially the group size, and then the number      .
*  of group members not yet tested.                          .
*  #k is initially the desired sample size from the group,.
*  and then the number of members of that sample still       .
*  to be selected.                                           .
.  COMPUTE #n = GroupSize.
*  The RECODE statement gives the group sample sizes;        .
*  this is an example of "data in code".                     .
.  RECODE GROUPCODE (1 = ???) (2 = ???) (7 = 5) (9 = 10)
          INTO #k /* desired sample from group */.
END IF.
*  If desired, specify which random-number generator to use, .
*  and give a starting seed.                                  .
NUMERIC Take_It (F2).
VAR LABEL Take_It 'Indicator, for case being in the sample'.
COMPUTE Take_It = RV.BERNOULLI(#k/#n).
COMPUTE #k = #k - Take_It.
COMPUTE #n = #n - 1.
EXECUTE /* (not sure whether this is needed) */.
SELECT IF Take_It.

If you don't like specifying sample sizes in the RECODE statement, you can set up a file named SAMPLES, like this:

GROUPCODE SampleSize
1 ???
2 ???
7 5
9 10

Then, after the AGGREGATE,

MATCH FILES /FILE =* /TABLE=SAMPLES /BY GROUPCODE.
DO IF $CASENUM EQ 1 OR GROUPCODE NE LAG(GROUPCODE) .
*  #n is initially the group size, and then the number      .
*  of group members not yet tested.                          .
*  #k is initially the desired sample size from the group,.
*  and then the number of members of that sample still       .
*  to be selected.                                           .
.  COMPUTE #n = GroupSize.
.  COMPUTE #k = SampleSize.
END IF.
and continue as before.
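The 'k/n' rule (sometimes called selection sampling) can be sketched in Python as follows. This is an illustration, not SPSS, and `k_over_n_sample` and its dictionary inputs are hypothetical. Each case is taken with probability k/n, where k is the number still needed and n the number of members not yet examined. Once k equals n the remaining members are taken with certainty, and once k reaches 0 they are all skipped, so every group yields exactly k cases. Unlike the syntax, this sketch caps k at the group size.

```python
import random

def k_over_n_sample(groups, sizes, seed=None):
    """Sequential 'k/n' selection: exactly k cases from each group, one pass.

    groups -- dict groupcode -> list of cases (hypothetical layout)
    sizes  -- dict groupcode -> desired sample size (the RECODE / SAMPLES table)
    """
    rng = random.Random(seed)
    taken = {}
    for code, members in groups.items():
        n = len(members)                  # #n = GroupSize
        k = min(sizes.get(code, 0), n)    # #k = desired size (capped at n in this sketch)
        chosen = []
        for case in members:
            if rng.random() < k / n:      # Take_It = RV.BERNOULLI(#k/#n)
                chosen.append(case)
                k -= 1                    # #k = #k - Take_It
            n -= 1                        # #n = #n - 1
        taken[code] = chosen
    return taken

# Usage: 5 of 32 cases from group 7 and 10 of 93 from group 9.
picked = k_over_n_sample({7: list(range(32)), 9: list(range(93))}, {7: 5, 9: 10}, seed=3)
```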
Administrator
In reply to this post by Bruce Weaver
If one just wants say a mean then something like this works without creating a large case level file.
*SIMULATE SOME DATA.
NEW FILE.
INPUT PROGRAM.
LOOP #ID=1 TO 1000.
+ COMPUTE VAR= RV.Normal(5,10).
+ END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXE.
DESC VAR.
FLIP.
VECTOR V=VAR001 TO VAR1000.
LOOP SAMPLE=1 TO 500.
+ COMPUTE MEAN=0.
+ LOOP #PULL=1 TO 100.
+   COMPUTE #=TRUNC(UNIFORM(1000)+1).
+   COMPUTE MEAN=MEAN+V(#).
+ END LOOP.
+ COMPUTE MEAN=MEAN/100.
+ XSAVE OUTFILE "BOOTTEMP.SAV" / KEEP SAMPLE MEAN.
END LOOP.
EXECUTE.
GET FILE "BOOTTEMP.SAV" .
DESC MEAN.
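Here is the same bootstrap idea sketched in Python, assuming the data are a plain list of numbers. Each of the 500 samples draws 100 positions at random, with replacement, and keeps only the mean. `bootstrap_means` is a hypothetical helper, not part of any library.

```python
import random
from statistics import mean, stdev

def bootstrap_means(values, n_boot=500, draws=100, seed=None):
    """Means of n_boot samples of `draws` values each, drawn WITH replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        total = 0.0
        for _ in range(draws):
            # Like COMPUTE #=TRUNC(UNIFORM(1000)+1): a random position, so
            # the same case can be picked more than once in one sample.
            total += values[rng.randrange(len(values))]
        means.append(total / draws)
    return means

# Usage with simulated data, as in the post: 1000 values from Normal(5, 10).
rng = random.Random(1)
data = [rng.gauss(5, 10) for _ in range(1000)]
boot = bootstrap_means(data, seed=7)
print(round(mean(boot), 3), round(stdev(boot), 3))
```

Like the syntax, only one number per sample is kept, so no case-level file of resamples is ever built.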
In reply to this post by Bruce Weaver
David, Bruce
What a great exchange Cindy started. Thanks for all these suggestions. Luckily I got 4 hours of veggie-patch weeding done yesterday, and it's raining all day today and tomorrow, so I'll have a shot at producing a draft demonstration.

Technically, I suppose the method I used in class was sampling with replacement, as students were asked to set the seed to a very high number (their date of birth in yymmdd form). On one occasion three students got the same samples because they all had the same date of birth!

In the 1970s and even the 1990s, these were mostly graduates in sociology and related subjects with little or no training in statistics or experience of computers, some barely numerate and some who could not type. They would run a mile from anything with an equation in it. However, by painstaking (and time-consuming) step-by-step plotting of these and similar results, we could build up an equation in a way students (and I) understood, not just as a mathematical expression, but also as an incredibly powerful tool. I used to have another example for regression and correlation using imagery of a rigid pole and elastic bands, but that's another story.

John
[hidden email]
www.surveyresearch.weebly.com
In reply to this post by David Marso
David
Tried your syntax, but had to modify it a bit:

*Marso sample .
* data set 'ql4gb1975.sav' .
* variables: var544 (happy), var545 (lifesat), age .
compute happy = var544.
compute lifesat = var545 .
compute caseid = serial .
*** OK here we go **.

SPSS doesn't like SCRAMBLE and I can't find it anywhere in the Syntax Reference Guide, so I dumped it. The new file has four variables, but only one case. I was looking to create a data file with up to n cases, one for each sample drawn. I can then run SPSS to demonstrate the distributions of sample statistics with:

Freq MLIFESAT MEANAGE PctVHapp /for not /his .

Also, the variables SAMPLE (all with value 101), CASEID (1 to 300) and GPCOUNT (1 to 300) are appended to the existing data set, which I don't particularly want. Worse, the number of cases has dropped from 932 to 300, which I definitely don't want!

Some of this complex syntax is new to me, but I did use Algol intensively in the 1960s (to manage and analyse survey data), so I can follow the logic. I haven't tried Bruce's syntax yet, or David Nichols' earlier version, but if I can get something that works, it will be a valuable learning aid. It gives me something to do instead of gardening in the rain or catching up on 300 or so films, dramas and documentaries recorded from TV!

John
[hidden email]
www.surveyresearch.weebly.com

PS If you haven't already seen it, get Bruce to send you the slideshow.
Administrator
In reply to this post by John F Hall
I tested that code, but I added the comment later and forgot the period at the end of it:

* Using same ideas as what I posted to Cindy Gregory *
COMPUTE SCRAMBLE=UNIFORM(1).

Without the terminating period, SPSS treats the COMPUTE as part of the comment, so SCRAMBLE is never created, and in all likelihood the rest of the code failed. Not sure why it would append. Anyhow, go back to my original and don't modify anything other than putting a period at the end of that comment!!!
Administrator
In reply to this post by E. Bernardo
What you need for each study is a measure of effect size (Y) and its standard error. In this case,
Y = ln(OR), where 'ln' = natural log

Then let U = ln(upper limit of the 95% CI for the OR):

U = Y + 1.96*SE(Y)
1.96*SE(Y) = U - Y
SE(Y) = (U - Y)/1.96

Some meta-analysis programs (in the past, at least) wanted the variance of Y rather than the standard error. If that is the case, just square the standard error. HTH.
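Here is the arithmetic above as a small Python sketch; the function names are hypothetical. The second function uses both confidence limits and is an alternative not in the post. It assumes the interval is symmetric on the log scale, as a Wald-type 95% CI for ln(OR) is.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% CI, as in the post

def se_log_or_from_upper(odds_ratio, upper):
    """SE(Y) = (U - Y) / 1.96, with Y = ln(OR) and U = ln(upper 95% limit)."""
    y = math.log(odds_ratio)
    u = math.log(upper)
    return (u - y) / Z95

def se_log_or_from_limits(lower, upper):
    """Alternative (not in the post): the CI is 2 * 1.96 * SE wide on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)

# Usage: an OR of 1.5 reported with a 95% CI of 1.0 to 2.25.
se = se_log_or_from_upper(1.5, 2.25)
variance = se ** 2  # for programs that want the variance of ln(OR)
```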
Hi Eins:

Once your data are organized as OR & SE(logOR), as you were advised, try the syntax below (it's rather old, and it doesn't use the capability of handling several datasets at the same time).

HTH,
Marta GG (I will not be back from holidays until July 26.)

* Data from: Wald et al, BMJ 2002;325:1202 Homocysteine and cardiovascular
* disease: evidence on causality from a meta-analysis
* (Fig 2, prospective studies on ischemic heart disease)'.
DATA LIST LIST/trial (F2) year (A5) study (A11) orr (F8.2) selog (F8.3).
BEGIN DATA
1 1997 Evans 0.89 0.120
2 2001 Knekt(ND) 0.97 0.242
3 1994 Alfthan 1.00 0.146
4 2001 Fallon 1.13 0.090
5 1999 Whincup 1.15 0.067
6 1998 Stehouwer 1.19 0.191
7 1998 Folsom 1.21 0.268
8 1999 Ridker 1.24 0.112
9 1998 Wald 1.26 0.064
10 1992 Stampfer 1.29 0.135
11 1999 Kark 1.33 0.093
12 1999 Bots 1.34 0.108
13 1995 Arnesen 1.42 0.160
14 2001 Vollset 1.51 0.121
15 1997 Nygard 1.55 0.163
16 2001 Knekt(D) 1.73 0.235
END DATA.
CACHE.
EXECUTE.
SUMMARIZE
  /TABLES=year TO selog
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Input data'
  /MISSING=VARIABLE
  /CELLS=NONE.

************** FIXED EFFECTS MODEL ***************.
MATRIX.
PRINT /TITLE ' META-ANALYSIS: SUMMARY RATIO FROM PROCESSED DATA (FIXED-EFFECT MODEL)'.
PRINT /TITLE="INVERSE VARIANCE (WOOLF'S METHOD)".
GET trial /VAR=study.
GET orr /VAR=orr.
GET selog /VAR=selog.
* General calculations & report *.
COMPUTE k=NROW(orr).
PRINT k /FORMAT="F8.0" /TITLE="Number of trials analysed (K)".
COMPUTE wi=(1/selog)&**2.
COMPUTE percwi=100*wi&/MSUM(wi).
COMPUTE cilow=EXP(LN(orr)-1.96&*selog).
COMPUTE ciup=EXP(LN(orr)+1.96&*selog).
PRINT {orr,selog,cilow,ciup,percwi}
  /FORMAT="F8.2"
  /RNAMES=trial
  /CLABELS="OR","SE(LNOR)","Lower","Upper", "Weight%"
  /TITLE=' 95% CI'.
* Summary OR *.
COMPUTE num=MSUM(wi&*LN(orr)).
COMPUTE den=MSUM(wi).
COMPUTE woolforr=EXP(num/den).
COMPUTE sewoolf=1/SQRT(den).
COMPUTE cilowor=EXP((num/den)-1.96*sewoolf).
COMPUTE ciupwor=EXP((num/den)+1.96*sewoolf).
PRINT {woolforr,sewoolf,cilowor,ciupwor,100}
  /FORMAT="F8.2"
  /RLABELS=" Overall"
  /CLABELS="OR","SE(LNOR)","Lower","Upper", "Weight%"
  /TITLE='SUMMARY 95% CI'.
COMPUTE chival=wi&*((LN(orr)-LN(woolforr))&**2).
COMPUTE het_chi=MSUM(chival).
COMPUTE het_sig=1-CHICDF(het_chi,k-1).
COMPUTE a_chi=(LN(woolforr)/sewoolf)**2.
COMPUTE a_sig=1-CHICDF(a_chi,1).
* Report *.
PRINT {a_chi,a_sig}
  /FORMAT="f8.4"
  /CLABELS="Chi^2","Sig."
  /TITLE="Association Chi-square statistic (df=1) - H0: No association".
PRINT {het_chi,het_sig}
  /FORMAT="f8.4"
  /CLABELS="Chi^2","Sig."
  /TITLE="Cochran Q heterogeneity test (df=K-1) - H0: Homogeneity".
DO IF het_chi GT (k-1).
- DO IF het_sig GT 0.10.
-   PRINT /TITLE="WARNING: Q p>0.10, but some heterogeneity exists!".
- END IF.
- COMPUTE h=SQRT(het_chi/(k-1)).
- COMPUTE isqr=100*(het_chi-(k-1))/het_chi.
- DO IF het_chi GT k.
-   COMPUTE eeh=LN(h)/(SQRT(2*het_chi)-SQRT(2*k-3)).
- ELSE IF het_chi LE k.
-   COMPUTE eeh=SQRT((1-(1/(3*(k-2)**2)))/(2*(k-2))).
- END IF.
- COMPUTE lowh=h*EXP(-1.96*eeh).
- COMPUTE upph=h*EXP(1.96*eeh).
- COMPUTE lowisqr=100*(lowh**2-1)/(lowh**2).
- DO IF lowisqr LT 0.
-   COMPUTE lowisqr=0.
- END IF.
- COMPUTE uppisqr=100*(upph**2-1)/(upph**2).
- DO IF uppisqr GT 100.
-   COMPUTE uppisqr=100.
- END IF.
- PRINT {isqr,lowisqr,uppisqr}
    /FORMAT="F8.1"
    /CLABELS='I^2(%)','Low95 CI','Upp95 CI'
    /TITLE='Heterogeneity statistic: 25%(low), 50%(moderate), 75%(high)'.
END IF.
* Exporting data for forest-plot *.
COMPUTE data1={woolforr,sewoolf}.
COMPUTE namevec1={"orr","selog"}.
SAVE data1 /OUTFILE='c:\temp\extrarow.sav' /NAMES=namevec1.
COMPUTE data2={cilow,ciup,percwi;cilowor,ciupwor,100}.
COMPUTE namevec2={"loworr","highorr","wi"}.
SAVE data2 /OUTFILE='C:\temp\extracols.sav' /NAMES=namevec2.
END MATRIX.
* Adding extra statistics & data to current file *.
ADD FILES /FILE=* /FILE='C:\temp\extrarow.sav'.
IF (MISSING(trial)) study = 'Total' .
IF (MISSING(trial)) trial = $casenum .
MATCH FILES /FILE=* /FILE='C:\temp\extracols.sav'.
EXECUTE.
* Forest plot with individual and aggregated OR *.
VAR LABEL loworr 'Lower 95%CI' /highorr 'Upper 95%CI' /orr 'OR'.
GRAPH /HILO(SIMPLE)=VALUE( highorr loworr orr ) BY study
  /TITLE='Fixed Effects Model'.

This is the code for a random effects model:

* NOTE: run the DATA LIST to END DATA block above again before this part.
* After the fixed-effects run the active file also holds the added 'Total' row
* and the loworr, highorr and wi columns; the 'Total' row would otherwise be read as a 17th trial.
CACHE.
EXECUTE.
SUMMARIZE
  /TABLES=year TO selog
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Input data'
  /MISSING=VARIABLE
  /CELLS=NONE.

************* RANDOM EFFECTS MODEL ***************.
MATRIX.
PRINT /TITLE=" META-ANALYSIS: ODDS-RATIO FROM PROCESSED DATA".
PRINT /TITLE=" RANDOM EFFECTS MODEL: DERSIMONIAN-LAIRD'S METHOD".
GET trial /VAR=study.
GET orr /VAR=orr.
GET selog /VAR=selog.
* General calculations & report *.
COMPUTE k=NROW(orr).
PRINT k /FORMAT="F8.0" /TITLE="Number of trials analysed (K)".
COMPUTE wi=(1/selog)&**2.
COMPUTE cilow=EXP(LN(orr)-1.96&*selog).
COMPUTE ciup=EXP(LN(orr)+1.96&*selog).
* Individual weights & heterogeneity report *.
COMPUTE dp=MSUM(wi&*LN(orr))/MSUM(wi).
COMPUTE het_chi=MSUM(wi&*(LN(orr)-dp)&**2).
COMPUTE het_sig=1-CHICDF(het_chi,k-1).
PRINT /TITLE='Heterogeneity before taking Tau-square into consideration'.
PRINT {het_chi,het_sig}
  /FORMAT="F8.4"
  /CLABELS="Chi^2","Sig."
  /TITLE="Cochran Q heterogeneity test (df=K-1) - H0: Homogeneity".
DO IF het_chi GT (k-1).
- DO IF het_sig GT 0.10.
- PRINT /TITLE="WARNING: Q p>0.10, but some heterogeneity exists!".
- END IF.
- COMPUTE h=SQRT(het_chi/(k-1)).
- COMPUTE isqr=100*(het_chi-(k-1))/het_chi.
- DO IF het_chi GT k.
- COMPUTE eeh=LN(h)/(SQRT(2*het_chi)-SQRT(2*k-3)).
- ELSE IF het_chi LE k.
- COMPUTE eeh=SQRT((1-(1/(3*(k-2)**2)))/(2*(k-2))).
- END IF.
- COMPUTE lowh=h*EXP(-1.96*eeh).
- COMPUTE upph=h*EXP(1.96*eeh).
- COMPUTE lowisqr=100*(lowh**2-1)/(lowh**2).
- DO IF lowisqr LT 0.
- COMPUTE lowisqr=0.
- END IF.
- COMPUTE uppisqr=100*(upph**2-1)/(upph**2).
- DO IF uppisqr GT 100.
- COMPUTE uppisqr=100.
- END IF.
- PRINT {isqr,lowisqr,uppisqr}
   /FORMAT="F8.1"
   /CLABELS='I^2(%)','Low95 CI','Upp95 CI'
   /TITLE='Heterogeneity statistic: 25%(low), 50%(moderate), 75%(high)'.
END IF.
COMPUTE tau=(het_chi-(k-1))/(MSUM(wi)-(MSUM(wi&**2))/MSUM(wi)).
DO IF tau GT 0.
- PRINT tau /FORMAT='F8.3' /TITLE='Tau-square (between trials variance)'.
- COMPUTE vartilda=tau+selog&**2.
- COMPUTE wi=1/vartilda.
ELSE.
- PRINT /TITLE='Tau-square=0 (random- & fixed-effect models will yield'
   +' identical results)'.
END IF.
COMPUTE percwi=100*wi&/MSUM(wi).
PRINT /TITLE="INDIVIDUAL OR & TAU-SQUARE MODIFIED WEIGHTS".
* Summary OR *.
COMPUTE num=MSUM(wi&*LN(orr)).
COMPUTE den=MSUM(wi).
COMPUTE delaiorr=EXP(num/den).
COMPUTE sedelai=1/SQRT(den).
COMPUTE cilowdl=EXP((num/den)-1.96*sedelai).
COMPUTE ciupdl=EXP((num/den)+1.96*sedelai).
* Reports *.
PRINT {orr,selog,cilow,ciup,percwi}
  /FORMAT="F8.2"
  /RNAMES=trial
  /CLABELS="OR","SE(LNOR)","Lower","Upper","Weight%"
  /TITLE=' 95% CI'.
DO IF tau GT 0.
- PRINT /TITLE='DerSimonian-Laird statistic:'.
END IF.
PRINT {delaiorr,sedelai,cilowdl,ciupdl,100}
  /FORMAT="F8.2"
  /RLABELS=" Overall"
  /CLABELS="OR","SE(LNOR)","Lower","Upper","Weight%"
  /TITLE='SUMMARY 95% CI'.
COMPUTE a_chi=((LN(delaiorr))/sedelai)**2.
COMPUTE a_sig=1-CHICDF(a_chi,1).
PRINT {a_chi,a_sig}
  /FORMAT="F8.4"
  /CLABELS="Chi^2","Sig."
  /TITLE="Association Chi-square statistic (df=1) - H0: No association".
* Exporting data for forest-plot *.
COMPUTE data1={delaiorr,sedelai}.
COMPUTE namevec1={"orr","selog"}.
SAVE data1 /OUTFILE='c:\temp\extrarow.sav' /NAMES=namevec1.
COMPUTE data2={cilow,ciup,percwi;cilowdl,ciupdl,100}.
COMPUTE namevec2={"loworr","highorr","wi"}.
SAVE data2 /OUTFILE='C:\temp\extracols.sav' /NAMES=namevec2.
END MATRIX.
* Adding extra statistics & data to current file *.
ADD FILES /FILE=* /FILE='C:\temp\extrarow.sav'.
IF (MISSING(trial)) study = 'Total' .
IF (MISSING(trial)) trial = $casenum .
MATCH FILES /FILE=* /FILE='C:\temp\extracols.sav'.
EXECUTE.
* Forest plot with individual and aggregated OR *.
VAR LABEL loworr 'Lower 95%CI' /highorr 'Upper 95%CI' /orr 'OR'.
GRAPH /HILO(SIMPLE)=VALUE( highorr loworr orr ) BY study
  /TITLE='Random Effects Model'.
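To cross-check the numbers the two MATRIX programs print, here is a minimal sketch in Python rather than SPSS (the language choice and all function names are assumptions of this example, not part of the posted syntax). It uses the 16 studies from the DATA LIST and the same 1.96 multiplier. It recomputes Woolf's inverse-variance pooled OR, Cochran's Q, I² with the H-based confidence limits used in the heterogeneity block, and the DerSimonian-Laird tau-square and random-effects OR. Only the standard library is used.

```python
import math

# (study, OR, SE of ln(OR)) as typed in the DATA LIST block above
# (Wald et al., BMJ 2002, Fig 2).
STUDIES = [
    ("Evans", 0.89, 0.120), ("Knekt(ND)", 0.97, 0.242),
    ("Alfthan", 1.00, 0.146), ("Fallon", 1.13, 0.090),
    ("Whincup", 1.15, 0.067), ("Stehouwer", 1.19, 0.191),
    ("Folsom", 1.21, 0.268), ("Ridker", 1.24, 0.112),
    ("Wald", 1.26, 0.064), ("Stampfer", 1.29, 0.135),
    ("Kark", 1.33, 0.093), ("Bots", 1.34, 0.108),
    ("Arnesen", 1.42, 0.160), ("Vollset", 1.51, 0.121),
    ("Nygard", 1.55, 0.163), ("Knekt(D)", 1.73, 0.235),
]
ORS = [orr for _, orr, _ in STUDIES]
SES = [se for _, _, se in STUDIES]
Z = 1.96  # same normal multiplier as the SPSS code


def fixed_effect(ors, ses):
    """Woolf inverse-variance pooling on the ln(OR) scale, plus Cochran's Q."""
    ys = [math.log(o) for o in ors]
    ws = [1.0 / se ** 2 for se in ses]
    mean = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    se_pooled = 1.0 / math.sqrt(sum(ws))
    q = sum(w * (y - mean) ** 2 for w, y in zip(ws, ys))
    return mean, se_pooled, q


def i_squared(q, k):
    """I^2 (%) with confidence limits derived from H, as in the SPSS
    heterogeneity block; returns None when Q <= k-1 (that block is skipped)."""
    if q <= k - 1:
        return None
    h = math.sqrt(q / (k - 1))
    if q > k:
        se_ln_h = math.log(h) / (math.sqrt(2 * q) - math.sqrt(2 * k - 3))
    else:
        se_ln_h = math.sqrt((1 - 1 / (3 * (k - 2) ** 2)) / (2 * (k - 2)))
    low_h = h * math.exp(-Z * se_ln_h)
    upp_h = h * math.exp(Z * se_ln_h)
    isq = 100 * (q - (k - 1)) / q
    low = max(0.0, 100 * (low_h ** 2 - 1) / low_h ** 2)  # truncated at 0
    upp = min(100.0, 100 * (upp_h ** 2 - 1) / upp_h ** 2)  # truncated at 100
    return isq, low, upp


def dersimonian_laird(ors, ses, q):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimate."""
    ys = [math.log(o) for o in ors]
    ws = [1.0 / se ** 2 for se in ses]
    sw = sum(ws)
    tau2 = (q - (len(ys) - 1)) / (sw - sum(w * w for w in ws) / sw)
    # tau^2 <= 0: keep the fixed-effect weights, as the SPSS code does
    tau2 = max(tau2, 0.0)
    ws_re = [1.0 / (se ** 2 + tau2) for se in ses]
    mean = sum(w * y for w, y in zip(ws_re, ys)) / sum(ws_re)
    return mean, 1.0 / math.sqrt(sum(ws_re)), tau2


def ci(mean, se):
    """Back-transform a pooled ln(OR) and its SE to OR with a 95% CI."""
    return math.exp(mean), math.exp(mean - Z * se), math.exp(mean + Z * se)


if __name__ == "__main__":
    fe_mean, fe_se, q = fixed_effect(ORS, SES)
    print("Fixed effect   OR %.2f (%.2f to %.2f)" % ci(fe_mean, fe_se))
    print("Cochran Q = %.2f on %d df" % (q, len(ORS) - 1))
    het = i_squared(q, len(ORS))
    if het is not None:
        print("I^2 = %.1f%% (%.1f to %.1f)" % het)
    re_mean, re_se, tau2 = dersimonian_laird(ORS, SES, q)
    print("tau^2 = %.4f" % tau2)
    print("Random effects OR %.2f (%.2f to %.2f)" % ci(re_mean, re_se))
```

With these data Q exceeds k-1, so tau-square is positive and the random-effects interval comes out wider than the fixed-effect one. If the figures differ from the SPSS output by more than rounding, that would point to a data-entry or weighting difference worth checking.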
I remember doing something like this in Stata, using "metan", which also generates the forest plot.

HTH,
Martin Holt
Medical Statistician

From: Marta García-Granero <[hidden email]>
To: [hidden email]
Sent: Sunday, 17 July 2011, 8:54
Subject: Re: odds ratio and 95% CI in Meta analysis

I Eins: