Fuzzy code hanging up

parisec
Hi

I need to do 1:10 matching for a matched case-control study, and I found this example syntax, produced by the GUI, that someone shared on Stack Overflow.

It runs, but it hangs, and I'm not sure whether the problem is the code or the fact that my laptop is on the brink of destruction.

Would someone be willing to run this and let me know if it works? Thanks.

Carol

https://stackoverflow.com/questions/55890764/how-to-fix-case-control-matching-with-spss-fuzzy-command

* Encoding: UTF-8.

DATASET CLOSE ALL.
NEW FILE.
OUTPUT CLOSE ALL.

INPUT PROGRAM.
LOOP supplier = 1 TO 745414.
COMPUTE case =  (mod($CASENUM,4)=0).
COMPUTE age = SUM(TRUNC(UNIFORM(80)),8).
COMPUTE sex = SUM(TRUNC(UNIFORM(2)),1).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
VALUE LABELS case 0 'Control' 1 'Case'/
     sex 1 'Female' 2 'Male'.
EXECUTE.
** A 1:4 ratio *.
FREQUENCIES VARIABLES =case.

* cp: this is the syntax that the original poster provided.

** Your posted command -- fails for me **.
*FUZZY BY=age sex supplierid=supplier newdemanderidvar=sid group=case.

* cp: this is the syntax that the GUI produced. This was the end of the post, so there was no indication of whether or not it ran.

** The command the UI built and pasted **.
FUZZY BY=age sex SUPPLIERID=supplier NEWDEMANDERIDVARS=sid GROUP=case EXACTPRIORITY=FALSE  MATCHGROUPVAR=id
/OPTIONS SAMPLEWITHREPLACEMENT=FALSE MINIMIZEMEMORY=TRUE SHUFFLE=FALSE.

Re: Fuzzy code hanging up

Andy W
My personal machine takes about 5 minutes to crunch out this example.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Fuzzy code hanging up

parisec
But it does finally give you a solution. Maybe I'll create a smaller dataset to try it on. I am just not certain what I'm going to get in the end.

Thank you.

Carol


Re: Fuzzy code hanging up

PRogman
In reply to this post by parisec
FUZZY is an extension command that runs in Python. The failing example has a spelling error: 'newdemanderidvar' should be 'newdemanderidvars'. With that fixed, it runs on my machine, but it takes time.

Note that this data generation code flags every 4th row as a case. To me that is 3 controls per case (1:3), not 1:4 as the comment says. If you want 10 controls per case (every 11th row is a case), you need to change the COMPUTE case statement:
COMPUTE case =  (MOD($CASENUM,(10+1))=0).  
There could be some issues with selecting cases this non-random way, but maybe you have solved that in another way, and this is only to create a demo database.
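The ratio arithmetic can be checked outside SPSS. The following is a minimal Python sketch, not part of the original syntax; the function name `controls_per_case` is made up for illustration. It mirrors `COMPUTE case = (MOD($CASENUM, k) = 0)` for rows numbered from 1.

```python
def controls_per_case(n_rows: int, k: int) -> float:
    """Controls per case when every k-th row (1-based) is flagged as a case."""
    # Same rule as COMPUTE case = (MOD($CASENUM, k) = 0).
    cases = sum(1 for casenum in range(1, n_rows + 1) if casenum % k == 0)
    controls = n_rows - cases
    return controls / cases


if __name__ == "__main__":
    n = 745414  # row count used by the demo INPUT PROGRAM
    print("MOD 4 :", round(controls_per_case(n, 4), 3))
    print("MOD 11:", round(controls_per_case(n, 11), 3))
```

With a row count that is an exact multiple of k, the ratio is exactly k - 1 controls per case, so MOD 4 gives 1:3 and MOD 11 gives 1:10.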

/PR


Re: Fuzzy code hanging up

parisec
Thanks PRog for pointing out the syntax error!

Knowing I'm going to need to run a 1:10 matched conditional logistic regression, I'm trying to figure out this syntax. I found that code on Stack Overflow and am working from the ground up to see what the syntax actually does to the files.

I modified it to create a database of only 1000 records (750 controls and 250 cases), and it runs smoothly. I added DS3 to save a file of cases only, but have not yet figured out the value of this file.

FUZZY BY=age sex SUPPLIERID=supplier
  NEWDEMANDERIDVARS=sid1
  GROUP=case EXACTPRIORITY=FALSE
  MATCHGROUPVAR=id
  DS3=StackCasesOnly
  /OPTIONS SAMPLEWITHREPLACEMENT=FALSE MINIMIZEMEMORY=TRUE SHUFFLE=FALSE.
**DS3 saves a file of only cases with their corresponding control match. But case is recoded to 0. Know it's cases only because only 250 are in the resulting file.


Output:

Match Type                          Count
Exact Matches                         235
Fuzzy Matches                           0
Unmatched Including Missing Keys       15
Unmatched with Valid Keys              15


ID = the number of match combinations
SID1 = the supplier ID of the control that is a match

I added DS3, which saves a file of only the cases, with all of the variables and their SID1.



But I cannot figure out how to make either of these files into something usable:

Supplier 4 is a case and matches with Supplier 406, a control, on both age and sex.

Supplier  Case     Age    Sex     agecat  id                 sid1
192.00    Case     17.00  Female  .00     -1,842,871,603.00  278.00
278.00    Control  17.00  Female  .00
900.00    Case     17.00  Female  .00     -1,842,871,603.00  155.00

What I need to get to is something like this, where a case is linked to a control by a unique number, MatchID:

Supplier  Case     Age    Sex     agecat  id                 sid1    MatchID
192.00    Case     17.00  Female  .00     -1,842,871,603.00  278.00  1
278.00    Control  17.00  Female  .00                                1
900.00    Case     17.00  Female  .00     -1,842,871,603.00  155.00  1
155.00    Control  17.00  Female  .00                                1

This is what I get when I shoot to match 3 controls to a case:

Supplier 4 has only 1 match, but the other 2 cases each have 3 controls that match.

Supplier  Case     Age  Sex   agecat  id               sid1  sid2  sid3
4         Case     62   Male  1       -131,133,941.00  406
406       Control  62   Male  1

136       Case     84   Male  1       -940,550,103.00  381   5     243
381       Control  84   Male  1
5         Control  84   Male  1
243       Control  84   Male  1

8         Case     13   Male  0       513,913,668.00   342   601   463
342       Control  13   Male  0
601       Control  13   Male  0
463       Control  13   Male  0

It seems like the ID is the key to getting this into the file structure I'm looking for, but I have not quite figured out how.
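One way to think about the restructuring, independent of SPSS, is to stack each case with its matched controls and give the whole set one MatchID. The Python sketch below is only an illustration of that logic under some assumptions: one MatchID per case, the supplier IDs are copied from the three-control listing above, and the function name `stack_matches` and the dictionary layout are made up here. It is not something FUZZY produces.

```python
from typing import Dict, List, Optional


def stack_matches(cases: List[Dict[str, Optional[int]]],
                  sid_vars: List[str]) -> List[Dict[str, int]]:
    """Turn one-row-per-case (wide) match results into one row per person (long).

    Each case gets its own MatchID; its matched controls inherit that MatchID.
    Empty match slots (None) are skipped.
    """
    long_rows: List[Dict[str, int]] = []
    for match_id, row in enumerate(cases, start=1):
        # The case itself comes first in its matched set.
        long_rows.append({"supplier": row["supplier"], "case": 1, "MatchID": match_id})
        for var in sid_vars:
            control = row.get(var)
            if control is not None:
                long_rows.append({"supplier": control, "case": 0, "MatchID": match_id})
    return long_rows


if __name__ == "__main__":
    # Case rows only, with the control IDs FUZZY wrote into sid1..sid3.
    wide = [
        {"supplier": 4, "sid1": 406, "sid2": None, "sid3": None},
        {"supplier": 136, "sid1": 381, "sid2": 5, "sid3": 243},
        {"supplier": 8, "sid1": 342, "sid2": 601, "sid3": 463},
    ]
    for r in stack_matches(wide, ["sid1", "sid2", "sid3"]):
        print(r)
```

A case with no matched controls still gets a row in this sketch; such singleton sets would normally be dropped before a conditional logistic regression.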

Open to any ideas!

Thanks
Carol

Re: Fuzzy code hanging up

Bruce Weaver
Administrator
This comment line below baffles me:

**DS3 saves a file of only cases with their corresponding control match. But case is recoded to 0. Know it's cases only because only 250 are in the resulting file.

Why in the *world* would the programmer(s) of FUZZY have it recode the case variable in that manner?  It makes absolutely no sense.  I STRONGLY suspect that DS3 is writing a stacked file of CONTROLS who are matched to cases in the original file.  

Here is what the HELP for FUZZY says about DS3:

"If DS3 is specified, a new dataset is created containing the cases in the supplier dataset actually used for the matches. It will be the active dataset after the command is run."

I think that "cases" in that line from the Help should be read as "rows", not cases as in a case-control design.  And surely it is CONTROLS (in the case-control sense) who are used as matches.  
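A quick way to test that reading is to compare the supplier IDs in the DS3 file against the IDs written to the sid variables in the main file. The sketch below shows the comparison in Python. The function name and data layout are hypothetical, and the DS3 ID list in the example is invented purely to demonstrate the check; it is not actual FUZZY output.

```python
def classify_ds3_rows(ds3_ids, main_rows, sid_vars):
    """Count how many DS3 supplier IDs are matched controls, cases, or neither."""
    # IDs that the main file records as matched controls (sid1, sid2, ...).
    matched_controls = {
        row[v] for row in main_rows for v in sid_vars if row.get(v) is not None
    }
    # IDs flagged as cases in the main file.
    case_ids = {row["supplier"] for row in main_rows if row["case"] == 1}
    ds3 = set(ds3_ids)
    return {
        "in_sid_vars": len(ds3 & matched_controls),
        "are_cases": len(ds3 & case_ids),
        "neither": len(ds3 - matched_controls - case_ids),
    }


if __name__ == "__main__":
    # Main-file rows taken from the listing earlier in the thread.
    main = [
        {"supplier": 192, "case": 1, "sid1": 278},
        {"supplier": 278, "case": 0, "sid1": None},
        {"supplier": 900, "case": 1, "sid1": 155},
        {"supplier": 155, "case": 0, "sid1": None},
    ]
    hypothetical_ds3_ids = [278, 155]  # invented for illustration only
    print(classify_ds3_rows(hypothetical_ds3_ids, main, ["sid1"]))
```

If every DS3 ID lands in "in_sid_vars" and none in "are_cases", the file holds the matched controls, which would also explain why the case variable is 0 throughout.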

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Fuzzy code hanging up

parisec
To clarify, this is my comment, not the programmers'. I needed to write this out so that I didn't get confused.

I don't know why those are zeros, but in my file cases = 1 and controls = 0, and when I used DS3, I got 250 0s for the "cases".

Re: Fuzzy code hanging up

Bruce Weaver
Administrator
Hi Carol.  I did not mean to suggest that the programmer(s) of FUZZY wrote that comment.  I meant to say that I think you got it wrong.  I do not believe that FUZZY recodes your CASE variable in that way.  I think the stacked file created via DS3 contains the CONTROLS who have been matched to cases.  

I have to get to a meeting in a few minutes, so do not have time to elaborate on this right now, but I also believe the match groups generated by the code you posted can have multiple cases as well as multiple controls.  You should check that if you were expecting only 1 case in each match group.  
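That check amounts to counting cases within each value of the match group variable. Here is a minimal Python sketch of the count; the function names are made up for illustration, and the rows are the case rows from the listings posted earlier in the thread, where suppliers 192 and 900 share the same id.

```python
from collections import Counter


def cases_per_group(rows, group_var="id", case_var="case"):
    """Count rows flagged as cases within each match group."""
    counts = Counter()
    for row in rows:
        group = row.get(group_var)
        if group is not None and row[case_var] == 1:
            counts[group] += 1
    return counts


def groups_with_multiple_cases(rows, group_var="id", case_var="case"):
    """Return only the match groups that contain more than one case."""
    counts = cases_per_group(rows, group_var, case_var)
    return {group: n for group, n in counts.items() if n > 1}


if __name__ == "__main__":
    rows = [
        {"supplier": 192, "case": 1, "id": -1842871603},
        {"supplier": 900, "case": 1, "id": -1842871603},
        {"supplier": 4, "case": 1, "id": -131133941},
        {"supplier": 136, "case": 1, "id": -940550103},
        {"supplier": 8, "case": 1, "id": 513913668},
    ]
    print(groups_with_multiple_cases(rows))
```

Any group returned here holds more than one case, so the id variable alone cannot serve as a one-case-per-set identifier.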

HTH.



Re: Fuzzy code hanging up

parisec
Thanks, Bruce. It's very possible I got it wrong!

I will do some digging.


Re: Fuzzy code hanging up

jkpeck

The number of variables listed as NEWDEMANDERIDVARS determines how many matches are attempted.

I only see one variable listed here.
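For a 1:10 design that means listing ten variables. Typing them by hand is easy to get wrong, so here is a small Python sketch that builds the command text. It uses only the keywords already shown in this thread; the function name `fuzzy_syntax`, its defaults, and the `sid` prefix are assumptions for illustration, and the generated text has not been run through SPSS.

```python
def fuzzy_syntax(n_controls: int,
                 by=("age", "sex"),
                 supplier_id="supplier",
                 group="case",
                 match_group_var="id",
                 prefix="sid") -> str:
    """Build a FUZZY command with one NEWDEMANDERIDVARS name per requested control."""
    by_list = " ".join(by)
    # One new ID variable per control to be matched: sid1, sid2, ...
    sid_vars = " ".join(f"{prefix}{i}" for i in range(1, n_controls + 1))
    return (
        f"FUZZY BY={by_list} SUPPLIERID={supplier_id}\n"
        f"  NEWDEMANDERIDVARS={sid_vars}\n"
        f"  GROUP={group} EXACTPRIORITY=FALSE MATCHGROUPVAR={match_group_var}\n"
        "  /OPTIONS SAMPLEWITHREPLACEMENT=FALSE MINIMIZEMEMORY=TRUE SHUFFLE=FALSE."
    )


if __name__ == "__main__":
    print(fuzzy_syntax(10))
```

The printed text can be pasted into a syntax window; changing the argument changes how many controls are requested per case.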

If you are still having trouble and can send me the data (or a link to it) and your current syntax, I can take a look. (jkpeck@gmail.com)

And, for exploratory purposes, you could get the new STATS PSM command from the Extension Hub and use that to generate the syntax and result.  It replaces the custom dialog for propensity score matching, which is not a full extension command.  It only allows 1:1 matching, however.