Hi everyone
I desperately need help with generating a matched control group through SPSS (16). I have 1269 records of individuals with learning disability. 142 of these have a mental health problem (mhprob=1). The control group needs to be generated from the rest of the cases, who do not have a mental health problem (mhprob=0), on a 1:1 basis. The matching parameters are age, sex and a score (ABCTOT) which indicates the ability of the individuals to function independently (expressed as a percentage). I have tried applying the script in one of the answers on this forum: http://spssx-discussion.1045642.n5.nabble.com/Sampling-question-How-to-draw-a-matched-control-group-td1086666.html I cannot get it working at all. Please bear in mind I am not much of an SPSS expert when it comes to programming and scripts.

Many thanks
Ivana
Administrator
|
"I have tried applying the script in one of the answers on this forum: "
Please help others help you! Which script? The thread you link first refers to a rather sad piece of code, http://www.spsstools.net/Syntax/RandomSampling/findRandomPairsOfCasesWithSameCharacteristics.txt, then to syntax by Albert-Jan Roskam, and then to an SPSS extension, "CASECTRL". What have you tried? What errors do you receive? "I cannot get it working at all" is not very informative.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Ivana
You could try something like this. Create two new data files, one for mhprob = 1 and the other for an equal-sized random sample of mhprob = 0, then use ADD FILES to generate a third data set with the same number of cases in each group. In syntax it would look something like this (untested; TEMPORARY makes the selection temporary, so SPSS reverts to the original file afterwards):

TEMPORARY.
SELECT IF mhprob = 1.
SAVE OUTFILE='file1.sav'.
TEMPORARY.
SELECT IF mhprob = 0.
SAMPLE 142 FROM 1127.
SAVE OUTFILE='file2.sav'.

This gives you two files of 142 cases each. (You could also use File > Save As.)

ADD FILES /FILE='file1.sav' /FILE='file2.sav'.

I'm not a statistician, so others may advise leaving your original file as is and using statistical procedures which don't need equal numbers in each group.
John Hall
In reply to this post by Ivana
Ivana,
So this is the code that you are referring to and will need to use. (Did you understand that the first section of code was used to generate some example data?) You said:

I have 1269 records of individuals with learning disability. 142 of these have a mental health problem (mhprob=1). The control group needs to be generated from the rest of the cases who do not have a mental health problem (mhprob=0) on 1:1 basis. The matching parameters are age, sex and a score (ABCTOT) which indicates the ability of the individuals to function independently (expressed as a percentage).

Here's how to convert the code to your instance. There's some reading in the syntax reference that will help you understand what the commands are doing.

* actual code.
compute random = rv.uniform(0,1).
sort cases by mhprob sex age abctot random.
aggr out = *
  / presorted
  / break = mhprob sex age abctot
  / dv1 to dv23=first(dv1 to dv23).
formats all (f5).

At this point you have pairs of cases (but see the note following), arranged so that the two cases in each pair are on separate lines. What you do next depends on what you are going to do analytically. If you are going to do paired t-tests you will need to restructure the data further. But if you are going to use independent-samples t-tests, the data are ready for use.

NOTE. Before you do anything further you should carefully examine your data to be sure that every treatment case (mhprob=1) has a match. You should be seriously concerned about that since abctot is a percentage. This is where you are going to have problems.

If you are satisfied that every treatment case has an adequate match AND you are doing, for example, paired t-tests, then you need to restructure your data so that case and matched control are on the same record or line. This next part does that.

sort cases by sex age abctot mhprob.
casestovars / id = sex age abctot / index = mhprob.
Execute.

From left to right the resulting file will have the match variables, then dv1 to dv23 for the controls, followed by dv1 to dv23 for the treatment cases. The .0 suffix indicates controls and the .1 suffix indicates treatment cases.

I don't know what this does. Get rid of it.

begin program.
import spss
spss.Submit("sample 41 from %s." % spss.GetCaseCount())
end program.
exe.

Gene Maguin
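The check described in the NOTE (does every mhprob=1 case have a control with the same sex, age and abctot?) can be sketched outside SPSS. The Python below is only an illustration under assumptions: records are plain dictionaries using the thread's variable names, and the helper name unmatched_strata is hypothetical, not part of SPSS or of the spss Python module.

```python
from collections import Counter


def unmatched_strata(records):
    """Return strata where cases (mhprob=1) outnumber controls (mhprob=0).

    Each record is a dict with keys mhprob, sex, age, abctot (assumed layout).
    The result maps (sex, age, abctot) -> (number of cases, number of controls).
    """
    cases = Counter()
    controls = Counter()
    for rec in records:
        key = (rec["sex"], rec["age"], rec["abctot"])
        if rec["mhprob"] == 1:
            cases[key] += 1
        else:
            controls[key] += 1
    # A stratum is a problem when it holds fewer controls than cases.
    return {key: (n, controls[key]) for key, n in cases.items() if controls[key] < n}


# Made-up demonstration data: the second case has no control with the same abctot.
demo_records = [
    {"mhprob": 1, "sex": 1, "age": 20, "abctot": 50.0},
    {"mhprob": 0, "sex": 1, "age": 20, "abctot": 50.0},
    {"mhprob": 1, "sex": 2, "age": 31, "abctot": 72.5},
    {"mhprob": 0, "sex": 2, "age": 31, "abctot": 70.0},
]
shortfalls = unmatched_strata(demo_records)
print(shortfalls)
```

An empty result would suggest that exact matching on all three variables is feasible; anything else lists the strata where the exact-match approach comes up short.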
In reply to this post by John F Hall
John,
There was something crucial about a "matched" control group... I think we need to wait for the OP to tell us what was tried but didn't work. Basic minimal code will require some SORTs and clever LAGs and TAGs. Exact matches will likely not be available for all requested attributes, so we need to throw some fuzz into the mix.

D
In reply to this post by David Marso
Sorry, this is what I tried to modify with not much luck
* seed, needed for reproducibility.
set rng=mt mtindex= 20090120.

* sample data.
input program.
loop #i=1 to 2000.
compute ses = trunc(rv.uniform(0, 5)).
compute age = trunc(rv.uniform(18, 45)).
compute sex = trunc(rv.uniform(1, 2.9)).
compute blah = rv.normal(1, 100).
compute bloh = rnd(rv.normal(1, 52)).
compute casecontr = trunc(rv.uniform(0,1.9)).
end case.
end loop.
end file.
end input program.
value labels casecontr 0 'control' 1 'case'.
variable label blah 'mysterious outcome var #1' / bloh 'mysterious outcome var #2'.

* actual code.
compute random = rv.uniform(0,1).
sort cases by casecontr sex age ses random.
aggr out = *
  / presorted
  / break = casecontr sex age ses
  / blah = first (blah)
  / bloh = first (bloh).
formats all (f5).
sort cases by sex age ses.
casestovars / id = sex age ses / index = casecontr.

begin program.
import spss
spss.Submit("sample 41 from %s." % spss.GetCaseCount())
end program.
exe.

In reply to this post by Maguin, Eugene
Hurrah! This has worked! Thanks so much. I think I can take it from here.
Many thanks
Ivana
In reply to this post by Maguin, Eugene
Ivana and Gene,
Something really bothers me about the following:

"
* actual code.
compute random = rv.uniform(0,1).
sort cases by mhprob sex age abctot random.
aggr out = *
  / presorted
  / break = mhprob sex age abctot
  / dv1 to dv23=first(dv1 to dv23).
formats all (f5).
"

Consider a situation where you have multiple cases with the same desired matching profile and mhprob status. The AGGREGATE will lose the cases associated with data2 and data4.

0 1 20 .5 data1
0 1 20 .5 data2
1 1 20 .5 data3
1 1 20 .5 data4
-------
May need something a bit more complex and bulletproof.

Here's my crack at it. Rather than aggregating I do what I call a LAG and DRAG. Note this hasn't been tested as I don't have SPSS immediately available. If nothing else it should provide some insight into the complexities of the issue. Note the first pass obtains a random exact match on SEX AGE ABCTOT. The second on SEX and AGE.... etc. This idea can be generalized to as many variables as needed.

----
* Making this up on the fly and no way to test without rebooting my box ;-(
  Logic should suffice, but there might be a misstep, but I believe it will work as is.
  OR, someone will step up and correct my code.
*-----------
* First sort files by matching criteria*.
COMPUTE SCRAMBLER=UNIFORM(1).
COMPUTE PAIREDUP=0.
SORT CASES BY SEX AGE ABCTOT (A) SCRAMBLER mhprob (D) .
COMPUTE YOKE_ID=$CASENUM.
COMPUTE PAIRED=YOKE_ID.
* This will place cases with matching age sex abctot next to each other and tag them with a unique ID.
* Those with mhprob 0/1 randomly occurring within blocks of "matched cases" *.
* Now identify exact matches * .
DO IF SEX EQ LAG(SEX) AGE EQ LAG(AGE) AND AND ABCTOT=LAG(ABCTOT) AND mhprob EQ 0 AND LAG(mhprob) EQ 1.
COMPUTE PAIRED=LAG(YOKE_ID) .
COMPUTE MATE=YOKE_ID .
END IF.
* we have now something like this *.
matchedstuff mhprob yoke_id paired mate
xxxxxxxxxxx 1 4 4 .
xxxxxxxxxxx 0 5 4 5
SORT CASES BY YOKE_ID (D).
* we have now something like this *.
matchedstuff mhprob yoke_id paired mate
xxxxxxxxxxx 0 5 4 5
xxxxxxxxxxx 1 4 4 .
IF NOT (MISSING(LAG(MATE))) AND MISSING (MATE) MATE=LAG(MATE).
EXE.
* we have now something like this *.
matchedstuff mhprob yoke_id paired mate
xxxxxxxxxxx 0 5 4 5
xxxxxxxxxxx 1 4 4 5
DO IF NOT(MISSING(MATE)).
XSAVE OUTFILE "MATCHED1.SAV".
COMPUTE PAIREDUP=1.
ELSE.
END IF.
SELECT IF PAIREDUP=0.
MATCH FILES / FILE * / DROP SCRAMBLER PAIRED MATE .
*Every case in MATCHED1.SAV should be yoked to another case.
*Active file contains unmatched cases.

* Now repeat with relaxed criteria (ie not requiring exactly equal abctot).
COMPUTE SCRAMBLER=UNIFORM(1).
SORT CASES BY SEX AGE ABCTOT (A) SCRAMBLER mhprob (D) .
COMPUTE YOKE_ID=$CASENUM.
COMPUTE PAIRED=YOKE_ID.
* Now identify matches on AGE and SEX and tag CLOSEST ABCTOT* .
DO IF SEX EQ LAG(SEX) AND AGE EQ LAG(AGE) AND mhprob EQ 0 AND LAG(mhprob) EQ 1.
COMPUTE PAIRED=LAG(YOKE_ID).
COMPUTE MATE=YOKE_ID .
END IF.
SORT CASES BY YOKE_ID (D).
IF NOT (MISSING(LAG(MATE))) AND MISSING (MATE) MATE=LAG(MATE).
EXE.
DO IF NOT(MISSING(MATE)).
XSAVE OUTFILE "MATCHED2.SAV".
COMPUTE PAIREDUP=1.
ELSE.
END IF.
SELECT IF PAIREDUP=0.
*Matched2.sav contains exact matches on sex and age but possibly inexact on ABCTOT.
MATCH FILES / FILE * / DROP SCRAMBLER PAIRED MATE .
* Exercise for reader.... Adapt for relaxed criteria on age ;-)
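The data1 to data4 example in this post can be replayed in a few lines of Python to see which records a "first per break group" aggregation keeps. This is only a sketch of the logic, not SPSS itself: the helper first_per_break is a hypothetical stand-in for AGGREGATE with FIRST(), and the column names attached to the four rows are an assumption based on the sort order used in the quoted syntax.

```python
def first_per_break(rows, break_keys):
    """Keep only the first row seen for each combination of break values."""
    kept = {}
    for row in rows:
        key = tuple(row[k] for k in break_keys)
        kept.setdefault(key, row)  # later rows with the same key are ignored
    return list(kept.values())


# The four records from the post; column names are assumed.
rows = [
    {"mhprob": 0, "sex": 1, "age": 20, "abctot": 0.5, "data": "data1"},
    {"mhprob": 0, "sex": 1, "age": 20, "abctot": 0.5, "data": "data2"},
    {"mhprob": 1, "sex": 1, "age": 20, "abctot": 0.5, "data": "data3"},
    {"mhprob": 1, "sex": 1, "age": 20, "abctot": 0.5, "data": "data4"},
]

kept = first_per_break(rows, ["mhprob", "sex", "age", "abctot"])
dropped = [row["data"] for row in rows if row not in kept]
print(dropped)  # ['data2', 'data4']
```

Two usable case/control pairs exist in these rows, but only one survives the aggregation, which is the loss described above.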
David,
I'm glad you pointed out that possibility because I overlooked it in my response. Thank you.

Ivana, this is something to check both before and after you do the matching operation. Afterwards, the frequencies of mhprob=1 should match the frequencies of that value before matching. Actually, the place to do the frequencies is after the aggregate and before the casestovars.

Gene Maguin
In reply to this post by David Marso
Well, I'm glad Gene's code worked for Ivana, but it has that fatal flaw if there are sequences of exact matches. I had a chance to test my code and there were a few typos (AND AND .. does not compute ;-). Here is a revision (just the first main logic). It could probably be made more efficient but I need to get dinner going. Probably could lose an EXE, but with the lags going I figured better safe than sorry, and I don't have time to fine-tune it.

HTH, David

** SIMULATION DATA **.
input program.
loop sex= 1 to 2.
loop #=1 to 100.
compute age=trunc(uniform(10)).
compute abctot = trunc(uniform(10))/10.
compute mhprob=1.
leave sex.
end case.
end loop.
end loop.
loop sex= 1 to 2.
loop #=1 to 1000.
compute age=trunc(uniform(10)).
compute abctot = trunc(uniform(10))/10.
compute mhprob=0.
leave sex.
end case.
end loop.
end loop.
end file.
end input program.
string datamark(a8).
COMPUTE datamark=CONCAT("DATA",STRING($CASENUM,N4)).
exe.
**
**RUN ONLY ONCE **.
COMPUTE YOKE_ID=$CASENUM.
COMPUTE PAIRED=YOKE_ID.
COMPUTE PAIREDUP=0.

**** REPEAT THIS CODE UNTIL ALL EXACT MATCHES HAVE BEEN DONE ***.
** CROSSTABS / TABLES SEX BY AGE BY ABCTOT BY MHPROB / CELLS = COUNT.
COMPUTE SCRAMBLE=UNIFORM(1).
SORT CASES BY PAIREDUP SEX AGE ABCTOT (A) SCRAMBLE mhprob (D) .
* This will place cases with matching age sex abctot next to each other and tag them with a unique ID.
* Those with mhprob 0/1 randomly occurring within blocks of "matched cases" *.
* Now identify exact matches * .
DO IF SEX EQ LAG(SEX) AND AGE EQ LAG(AGE) AND ABCTOT=LAG(ABCTOT) AND mhprob EQ 0 AND LAG(mhprob) EQ 1.
+ DO IF (NOT(PAIREDUP)).
+ COMPUTE PAIRED=LAG(YOKE_ID) .
+ COMPUTE MATE=YOKE_ID .
+ COMPUTE MATED=1.
+ END IF.
END IF.
SORT CASES BY PAIREDUP (A) PAIRED (D) MATE(D).
* we have now something like this *.
*matchedstuff mhprob yoke_id paired mate
*xxxxxxxxxxx 0 5 4 5
*xxxxxxxxxxx 1 4 4 .
*.
DO IF PAIRED=LAG(PAIRED) AND MISSING (MATE) AND NOT(PAIREDUP).
COMPUTE MATE=LAG(MATE).
COMPUTE MATED=1.
END IF.
EXE.
* we have now something like this *.
*matchedstuff mhprob yoke_id paired mate
*xxxxxxxxxxx 0 5 4 5
*xxxxxxxxxxx 1 4 4 5
*.
IF MATED PAIREDUP=1.
CROSSTABS TABLES PAIREDUP BY MHPROB.
freq pairedup.
** REPEAT UNTIL HAPPY!!! *.
In reply to this post by Ivana
Shalom
As David Marso points out, what you are looking for is more complicated than what you stated. The assumption that cases and controls are spread evenly along the file is rarely met. If you use AGGREGATE to form the match, it is possible that some of the groups won't have any control in them or, as David Marso points out, will have more than one case in them. Here is an example using only age to define the groups.

id  age  case/control
1   11   1   >>> match
12  11   0   >>> match
14  12   1   >>> no match
23  14   1   >>> no match
31  15   1   >>> almost match
7   16   0   >>> almost match
2   16   0   >>> no match
4   17   0   >>> no match

Here you may want to match 14 with 7, 14 with 2, and 15 with 17. That kind of match is not possible using AGGREGATE. To solve this kind of matching you can create a running sum, adding 1 to it whenever a case is met and subtracting 1 when the first matching control is met. Here is a general syntax (not tested):

sort cases by sex age abctot random.
numeric match_num run_sum (f4) .
leave match_num run_sum .
do if case eq 1 .
compute run_sum = sum( run_sum,1) .
compute match_num = sum(match_num,1) .
else if case eq 0 and run_sum gt 0 .
compute run_sum = sum(run_sum,-1) .
compute is_match= 1.
end if .
select if case eq 1 or is_match eq 1.

This syntax will match the closest control AFTER the case, which may or may not be a problem.

Hillel Vardi
BGU
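The running-sum idea in this post can be traced in Python on the age-only example table. This is a sketch under assumptions, not a translation verified against SPSS: the rows are taken to be already sorted by the matching variable, `case` is 1 for a case and 0 for a control, and the function name running_sum_match is made up for illustration.

```python
def running_sum_match(rows):
    """Select every case plus one later control per still-unmatched case.

    rows must already be sorted by the matching variables.
    """
    run_sum = 0  # number of cases seen so far that still lack a control
    selected = []
    for row in rows:
        if row["case"] == 1:
            run_sum += 1
            selected.append(row)
        elif run_sum > 0:
            # First available control after an unmatched case: take it.
            run_sum -= 1
            selected.append(row)
        # Controls arriving while run_sum is 0 are skipped.
    return selected


# The example table from the post (id, age, case/control), in file order.
table = [
    {"id": 1, "age": 11, "case": 1},
    {"id": 12, "age": 11, "case": 0},
    {"id": 14, "age": 12, "case": 1},
    {"id": 23, "age": 14, "case": 1},
    {"id": 31, "age": 15, "case": 1},
    {"id": 7, "age": 16, "case": 0},
    {"id": 2, "age": 16, "case": 0},
    {"id": 4, "age": 17, "case": 0},
]
selected_ids = [row["id"] for row in running_sum_match(table)]
print(selected_ids)  # [1, 12, 14, 23, 31, 7, 2, 4]
```

In this table every control is used because each one follows an unmatched case. A control that sorts before any case would be skipped, which is the "AFTER the case" caveat mentioned above.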
In reply to this post by David Marso
Dear David
I have tested this and precisely as you listed, it worked beautifully. I am so grateful for your time and effort. My thanks also to many other people who have replied. Best wishes Ivana ___________________________ Dr Ivana Dojcinov, MD MRCPsych
In reply to this post by David Marso
Dear David
I've just tested this and it worked beautifully. Thank you ever so much. My thanks also to all the other people who have spared time and effort to help me.

Kind regards
Ivana
In reply to this post by Ivana
Hi Ivana,
You are very welcome! I was thinking about this further after an interesting email from Gene regarding sequences (similar to Hillel Vardi's post last night). I came up with the following tidbit, which is much easier than my previous post and has the added feature of being almost completely intuitive. Another nice benefit is that it does not require a SORT, and in my tests it is a KEEPER ;-).

COMPUTE ID=$CASENUM.
COMPUTE SCRAMBL=UNIFORM(1).
RANK SCRAMBL BY SEX AGE ABCTOT mhPROB.
IF MHPROB=0 ID0=ID.
IF MHPROB=1 ID1=ID.
AGGREGATE OUTFILE *
  / BREAK sex age ABCTOT rscrambl
  / id0 id1=max(id0 id1).
COMPUTE MATCH=NOT(MISSING(ID1)) AND NOT(MISSING(ID0)).
FREQ MATCH.

Comments: RANK is able to construct 'counters' BY strata without the relevant cases being contiguous. NICE. After the AGGREGATE the file will have the strata variables (and the paired IDs, ID0 and ID1) but not the MHPROB variable. No problem, since this information is implied by the presence/absence of ID0 and ID1.

Taking it further: One could segregate the MATCH cases into a separate file, delete them from the working file, and then rerun the code after doing a VARSTOCASES (ie restoring ID from ID0 and ID1). In this case I would probably COMPUTE a random variable and sort on it, then use a variant of the RANK as: RANK ABCTOT BY SEX AGE mhPROB (may need to specify TIES to deal with duplicate values in ABCTOT?). This would build RANKS of ABCTOT within the strata, and a later AGGREGATE would group them together as previously (fuzzy match within the ranked values of ABCTOT).

NOTE: In contrast to Gene's example I do not spread the data elements, I just store the IDs. Mapping the data to the IDs simply requires a VARSTOCASES to make the file long (the IDs are all you need to carry), then SORT CASES BY ID and MATCH FILES into the sorted detail-level file.

Hope this helps, David
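The RANK-then-AGGREGATE idea pairs the k-th randomly ordered case in a stratum with the k-th randomly ordered control in the same stratum. A rough Python equivalent is sketched below. It is an illustration only: shuffling each group and zipping the two lists stands in for ranking a random variable, and the names rank_pairs and seed are hypothetical.

```python
import random
from collections import defaultdict


def rank_pairs(records, seed=0):
    """Pair cases and controls at random within each (sex, age, abctot) stratum.

    Returns a list of (case_id, control_id) tuples. Members left over in a
    stratum stay unpaired, like the MATCH=0 rows in the SPSS version.
    """
    rng = random.Random(seed)
    strata = defaultdict(lambda: {0: [], 1: []})
    for rec in records:
        key = (rec["sex"], rec["age"], rec["abctot"])
        strata[key][rec["mhprob"]].append(rec["id"])
    pairs = []
    for groups in strata.values():
        # A random order within each group plays the role of RANK SCRAMBL.
        rng.shuffle(groups[0])
        rng.shuffle(groups[1])
        # zip stops at the shorter list, so only complete pairs are kept.
        pairs.extend(zip(groups[1], groups[0]))
    return pairs


# Made-up data: two cases and three controls in one stratum, one lone case in another.
demo = [
    {"id": 1, "mhprob": 1, "sex": 1, "age": 20, "abctot": 0.5},
    {"id": 2, "mhprob": 1, "sex": 1, "age": 20, "abctot": 0.5},
    {"id": 3, "mhprob": 0, "sex": 1, "age": 20, "abctot": 0.5},
    {"id": 4, "mhprob": 0, "sex": 1, "age": 20, "abctot": 0.5},
    {"id": 5, "mhprob": 0, "sex": 1, "age": 20, "abctot": 0.5},
    {"id": 6, "mhprob": 1, "sex": 2, "age": 35, "abctot": 0.9},
]
pairs = rank_pairs(demo, seed=1)
print(pairs)
```

Because only IDs are carried, the analysis variables would still have to be merged back by ID afterwards, as the post describes.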
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
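For readers who find it easier to follow the idea outside SPSS, here is a minimal Python sketch of the same rank-and-pair logic: within each sex/age/ABCTOT stratum, cases and controls are put in random order and the k-th case is paired with the k-th control. The function name `pair_within_strata` and the toy records are made up for illustration; this is not a translation that has been checked against the SPSS output.

```python
import random
from collections import defaultdict


def pair_within_strata(records, seed=0):
    """Pair cases (mhprob=1) with controls (mhprob=0) sharing sex, age and abctot."""
    rng = random.Random(seed)
    # One bucket per stratum, holding control IDs under 0 and case IDs under 1.
    strata = defaultdict(lambda: {0: [], 1: []})
    for rec_id, rec in enumerate(records, start=1):  # plays the role of ID=$CASENUM
        key = (rec["sex"], rec["age"], rec["abctot"])
        strata[key][rec["mhprob"]].append(rec_id)

    pairs = []
    for groups in strata.values():
        # Random order within the stratum stands in for RANK on the random SCRAMBL.
        rng.shuffle(groups[0])
        rng.shuffle(groups[1])
        # zip stops at the shorter list, so surplus cases or controls stay unmatched.
        pairs.extend(zip(groups[1], groups[0]))
    return pairs


# Hypothetical toy data: one pairable case, one case with no control in its stratum.
records = [
    {"sex": 1, "age": 30, "abctot": 50, "mhprob": 1},
    {"sex": 1, "age": 30, "abctot": 50, "mhprob": 0},
    {"sex": 1, "age": 30, "abctot": 50, "mhprob": 0},
    {"sex": 2, "age": 41, "abctot": 70, "mhprob": 1},
]
pairs = pair_within_strata(records)
print(pairs)  # list of (case_id, control_id) tuples
```

Cases left without a partner here correspond to the rows where MATCH is 0 in the syntax above.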
Although it wasn't stated in the original
post, it sounded to me like one of the match variables was continuous and
therefore, exact matches would be unlikely. In that case you would
need a tolerance factor in order to get a match. FUZZY, of course,
handles all of this.
Contrary to what I recalled earlier, FUZZY should work with version 16 (but no earlier one). The clue is the one-word name of the extension command, which is a limitation in V16.
Jon Peck Senior Software Engineer, IBM [hidden email] 312-651-3435 |
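To make the idea of a tolerance factor concrete, here is a rough Python sketch of matching within a tolerance. It is an illustration only and not the FUZZY implementation: the function name `fuzzy_match`, the tolerance values and the toy records are all assumptions. Each case draws one unused control at random from those that are within tolerance on every variable.

```python
import random


def fuzzy_match(cases, controls, tolerances, seed=0):
    """Give each case one unused control that is within tolerance on every variable."""
    rng = random.Random(seed)
    available = dict(controls)  # copy, so the caller's dict is left untouched
    matches = {}
    for case_id, case in cases.items():
        eligible = [
            ctrl_id
            for ctrl_id, ctrl in available.items()
            if all(abs(case[var] - ctrl[var]) <= tol for var, tol in tolerances.items())
        ]
        if eligible:
            chosen = rng.choice(eligible)  # random draw among eligible controls
            matches[case_id] = chosen
            del available[chosen]          # 1:1 matching, so no control is reused
    return matches


# Hypothetical data: a tolerance of 0 forces an exact match on sex.
cases = {
    1: {"sex": 1, "age": 30, "abctot": 52.0},
    2: {"sex": 2, "age": 45, "abctot": 80.0},
}
controls = {
    10: {"sex": 1, "age": 31, "abctot": 55.0},
    11: {"sex": 2, "age": 31, "abctot": 55.0},
    12: {"sex": 2, "age": 60, "abctot": 80.0},
}
tolerances = {"sex": 0, "age": 2, "abctot": 5.0}
matches = fuzzy_match(cases, controls, tolerances)
print(matches)
```

In this toy data case 2 stays unmatched, because no control of the same sex is within two years of age.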
Hi Jon,
Very interesting. I didn't know that extension command. From the docstring:

h is the current demander case hash
case is the current supplier case
return is
- 0 if no match
- 1 if fuzzy match
- 2 if exact match

Why only these discrete values and not values [0-1]? A better distinction could then be made between different candidate record pairs. Also, I wonder whether it isn't a big penalty if a possible match is considered a non-match when one of the linkage vars is missing?

This has nothing to do with Fuzzy itself, but is the following code fragment used in conjunction with gettext?

#enable localization
global _
try:
    _("---")
except:
    def _(msg):
        return msg

Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
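The three return values quoted from the docstring can be pictured with a small Python sketch. This is not the code from the FUZZY extension; the function name `compare`, its arguments and the use of `None` for a missing value are assumptions made for illustration. It treats a missing value as a non-match, which is the behaviour being asked about.

```python
def compare(demander, supplier, tolerances):
    """Return 2 for an exact match, 1 for a within-tolerance match, 0 otherwise."""
    exact = True
    for var, tol in tolerances.items():
        a, b = demander.get(var), supplier.get(var)
        if a is None or b is None:
            return 0  # a missing value on either side never matches
        diff = abs(a - b)
        if diff > tol:
            return 0  # outside the allowed tolerance on this variable
        if diff != 0:
            exact = False  # still a match, but only a fuzzy one
    return 2 if exact else 1


# Hypothetical tolerances: age within 2 years, ABCTOT within 5 points.
tol = {"age": 2, "abctot": 5}
print(compare({"age": 30, "abctot": 50}, {"age": 30, "abctot": 50}, tol))
print(compare({"age": 30, "abctot": 50}, {"age": 31, "abctot": 50}, tol))
print(compare({"age": 30, "abctot": 50}, {"age": 40, "abctot": 50}, tol))
print(compare({"age": None, "abctot": 50}, {"age": 30, "abctot": 50}, tol))
```

A graded score in [0-1] would need a distance defined over all variables, including categorical ones, which this sketch deliberately avoids.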
Jon Peck Senior Software Engineer, IBM [hidden email] 312-651-3435
From: Albert-Jan Roskam <[hidden email]> To: [hidden email] Date: 03/31/2011 01:45 PM Subject: Re: [SPSSX-L] Obtaining a matched control group (A final Nail)

Hi Jon, Very interesting. I didn't know that extension command. From the docstring:
h is the current demander case hash
case is the current supplier case
return is
- 0 if no match
- 1 if fuzzy match
- 2 if exact match
Why only these discrete values and not values [0-1]? A better distinction could then be made between different candidate record pairs. Also, I wonder if it isn't a big penalty if a possible match is considered a non-match if one of the linkage vars is missing?

>>> When I designed this, I felt that a missing value should not be considered as a match with anything - there is no information. If someone wants different behavior, they can change the missing values temporarily.

>>> As for the metric, in order to provide a distance for the mismatch, there has to be some metric defined, so the user would have to provide that. Of course, that only applies when not using an exact match. In the case of categorical variables, this could be pretty messy. And the user might well want to weight variables differently. If a user wanted to provide a code fragment that calculated a distance, I could use that, but it would be hard for a user to get it right IMO. The second problem here is that one might then want to minimize the total error in the matches, and that is a large integer programming problem that would require a substantially different approach to matching. Other than the EXACTPRIORITY keyword, FUZZY picks at random from among all cases that satisfy the fuzz criteria. If it picked the best match among the eligible ones, it would be giving priority to cases that are earlier in the file, and this could introduce a subtle bias in the matching behavior if the cases are not in random order (there are some comments about this in the documentation). Even with the current behavior, there is potential for this problem to occur in a milder way, which is why there is a SHUFFLE keyword to combat it, but that increases the time and memory requirements.

This has nothing to do with Fuzzy itself, but is following code fragment used in conjunction with gettext?:
#enable localization
global _
try:
    _("---")
except:
    def _(msg):
        return msg

>>> I added some automatic setup for translations to the extensions.py module in, IIRC, version 18. Since most of the extension commands also work with V17 and might not have the updated extensions.py module, the code above checks to see whether the _ function is defined and generates an identity function if not. There are some subtleties with _ that are explained in the extension module code. We write all the Python extension commands to be translatable now, even though many are not currently translated. Documentation on how this works is in the extension command doc.

Thanks for the comments.

Cheers!! Albert-Jan |
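The point about best-match selection favouring cases that come earlier in the file can be seen in a tiny Python sketch. Everything here (the ages, the function names `greedy_nearest` and `optimal_total`) is invented for illustration and says nothing about how FUZZY is coded: greedy nearest-neighbour matching gives a different total error depending on the order of the cases, while a brute-force search over all assignments shows the smallest total that is possible.

```python
from itertools import permutations


def greedy_nearest(case_ages, control_ages):
    """Each case, in file order, takes the closest control still available."""
    remaining = list(control_ages)
    total = 0
    for age in case_ages:
        best = min(remaining, key=lambda c: abs(c - age))
        total += abs(best - age)
        remaining.remove(best)
    return total


def optimal_total(case_ages, control_ages):
    """Smallest total distance over every 1:1 assignment (only feasible for tiny data)."""
    return min(
        sum(abs(a - c) for a, c in zip(case_ages, perm))
        for perm in permutations(control_ages, len(case_ages))
    )


controls = [13, 18]
forward = greedy_nearest([10, 14], controls)   # 10 takes 13, leaving 18 for 14
backward = greedy_nearest([14, 10], controls)  # 14 takes 13, leaving 18 for 10
best = optimal_total([10, 14], controls)
print(forward, backward, best)
```

The brute-force search grows factorially with the number of controls, which is one way to see why minimizing the total error calls for a different matching approach.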
In reply to this post by David Marso
Shalom
After thinking over all the other answers, I am quite sure that using AGGREGATE, LAG or RANK will not work. The reason is that the assumption that there will be controls in all the groups is not met in all situations. Here is an example using David Marso's program (I only reduced the number of cases to 8 and controls to 20).

input program.
loop sex= 1 to 2.
loop #=1 to 4.
compute age=trunc(uniform(10)).
compute abctot = trunc(uniform(10))/10.
compute mhprob=1.
leave sex.
end case.
end loop.
end loop.
loop sex= 1 to 2.
loop #=1 to 10.
compute age=trunc(uniform(10)).
compute abctot = trunc(uniform(10))/10.
compute mhprob=0.
leave sex.
end case.
end loop.
end loop.
end file.
end input program.
string datamark(a8).
COMPUTE datamark=CONCAT("DATA",STRING($CASENUM,N4)).
exe.
COMPUTE ID=$CASENUM.
COMPUTE SCRAMBL=UNIFORM(1).
RANK SCRAMBL BY SEX AGE ABCTOT mhPROB.
IF MHPROB=0 ID0=ID.
IF MHPROB=1 ID1=ID.
AGGREGATE OUTFILE * / BREAK sex age ABCTOT rscrambl /id0 id1=max(id0 id1).
COMPUTE MATCH=NOT(MISSING(ID1)) AND NOT(MISSING(ID0)).
FREQ MATCH.

Hillel Vardi
BGU |
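The objection above can be checked before any matching is attempted by counting how many cases sit in a stratum with too few controls. The Python sketch below does that on made-up tuples of (sex, age, abctot, mhprob); the function name `unmatched_cases` and the data are hypothetical and only meant to show the bookkeeping.

```python
from collections import Counter


def unmatched_cases(records):
    """Count cases that cannot be paired exactly because their stratum lacks controls."""
    cases, controls = Counter(), Counter()
    for sex, age, abctot, mhprob in records:
        # Tally each record under its (sex, age, abctot) stratum.
        (cases if mhprob == 1 else controls)[(sex, age, abctot)] += 1
    # A stratum with n cases and m controls leaves max(n - m, 0) cases unpaired.
    return sum(max(n - controls[key], 0) for key, n in cases.items())


records = [
    (1, 3, 0.2, 1), (1, 3, 0.2, 0),                  # one case, one control
    (1, 7, 0.5, 1),                                  # case with no control at all
    (2, 4, 0.9, 1), (2, 4, 0.9, 1), (2, 4, 0.9, 0),  # two cases, one control
    (2, 8, 0.1, 0),                                  # control with no case
]
n_unmatched = unmatched_cases(records)
print(n_unmatched)
```

With many distinct values of age and ABCTOT and few records, most strata hold a single record, so exact matching leaves many cases unpaired.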
Administrator
|
I really wouldn't expect ANYTHING to work well with those sample sizes
and distributions ;-) My code should be pretty much usable for reasonably large samples. How does Jon's Fuzzy do with this data?