Dear SPSS-experts,
I am having difficulties matching and merging two datasets for a case-control study.

The goal: I have two patient databases, one with an intervention and one control. I would like to match the intervention group with the control group. Therefore I have written this syntax:

get file="/Users/xxx/xxx/control.sav".
dataset name supplier.
get file="/Users/xxx/xxx/intervention.sav".
dataset name demander.
fuzzy demanderds=demander supplierds=supplier by= Age Weight fuzz= 5 5 supplierid = Casenr newdemanderidvars=supplierId.

Question 1: Is it correct that, for each intervention patient, a control patient with Age within +-5 yr and Weight within +-5 kg will be searched for? I ask because my output reports 260 fuzzy matches (out of a total of 500), even when I set the fuzz to 50 and 50. In that case every patient should match every individual from the other group, and I would expect the maximum number of fuzzy matches (500).

Question 2: When I run the syntax, 2 new variables appear in my demander database:
- matchgroup
- supplierid
I do not understand what these variables mean.

Question 3: How can I merge the intervention patients with the matched control patients into one database?

Excuse me if the questions are really easy, but I can't seem to find the answer anywhere...

Thanks a lot! |
hi,
could you explain what the fuzz is? |
In reply to this post by maartenmeerkamp
Also, what do you mean by "is it correct that a control patient.... will be searched"? I assume you want to find controls for your cases; the matching variables are going to be set by you. If you want the matching to be done on age and weight (I don't know what your dependent variable is) then that's fine.
|
This is what the SPSS help function told me:
By default, a match is defined by identical values for all the BY variables. A system-missing value prevents a case from being matched. Fuzzy matching is also available for numeric variables. Specify FUZZ=list-of-matching tolerances. There must be one fuzz value for each BY variable, listed in BY-value order. A tolerance is the maximum difference in either direction that is allowed for a match. Thus, values of 1 and 2 would match if tolerance is 1 or more, and a tolerance of zero means an exact match on that variable. You must use 0 for any string variable. By default, with fuzzy matching, an exact match is first tried, and then a fuzzy match is tried. There is no attempt to get the closest fuzzy match, just a match within the tolerance. |
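The rules quoted from the help text can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the code of the FUZZY extension itself: the function names `is_match` and `find_match`, the example supplier table, and the use of `None`/NaN to stand in for system-missing are all assumptions made for the sketch, and it handles numeric BY variables only.

```python
import math


def _missing(value):
    # None or NaN stands in for an SPSS system-missing value (assumption).
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_match(demander, supplier, fuzz):
    """True if every BY value differs by at most its tolerance."""
    for d, s, tol in zip(demander, supplier, fuzz):
        if _missing(d) or _missing(s):
            return False  # a missing value prevents a match
        if abs(d - s) > tol:
            return False
    return True


def find_match(demander, suppliers, fuzz):
    """Try an exact match first, then the first supplier within tolerance."""
    exact = [0] * len(fuzz)
    for tolerances in (exact, fuzz):
        for supplier_id, values in suppliers.items():
            if is_match(demander, values, tolerances):
                return supplier_id  # first hit, not the closest one
    return None


# Hypothetical suppliers: case number -> (Age, Weight)
suppliers = {101: (60, 80), 102: (47, 72), 103: (45, 70)}

print(find_match((45, 70), suppliers, (5, 5)))  # 103 (exact match)
print(find_match((44, 70), suppliers, (5, 5)))  # 102 (within 5/5, though 103 is closer)
print(find_match((30, 70), suppliers, (5, 5)))  # None (nothing within tolerance)
```

The second call shows the "no attempt to get the closest fuzzy match" behaviour: supplier 102 is returned simply because it is encountered first.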
I'm not sure you can do case-control matching in SPSS; others may know of a way.
Which help file is this, i.e. you went to the Help Topics from within SPSS and searched using which keyword? |
Entering the syntax FUZZY /HELP gave me the information
I read in this post that you can match in SPSS: http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14523917 |
I found out that supplier id is the case number from the other database which matches with the subsequent case. My main questions now are:
Q1 How do I merge the files, that is, how do I copy only the matches into one database rather than the whole file?
Q2 More importantly, I found that increasing the FUZZ hardly increases the number of matched cases. I expected a greater fuzz to give a greater tolerance and therefore more matched cases, but the number barely changes and sometimes even goes down. Can someone explain to me how this works? |
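For Q1, the logic of "keep only the matched pairs and stack them" can be sketched in Python as below. It is not SPSS syntax and not part of FUZZY: the function `stack_matched` and the added fields `group` and `pair` are hypothetical, and the sketch assumes that the intervention file also carries a `Casenr` and that unmatched cases have an empty `supplierid`.

```python
def stack_matched(demander_rows, supplier_rows,
                  id_var="Casenr", match_var="supplierid"):
    """Return matched cases and their controls as one stacked list of rows."""
    suppliers_by_id = {row[id_var]: row for row in supplier_rows}
    combined = []
    for case in demander_rows:
        control = suppliers_by_id.get(case.get(match_var))
        if control is None:
            continue  # unmatched case: leave it out
        pair = case[id_var]  # label both members of the pair (hypothetical field)
        combined.append({**case, "group": "intervention", "pair": pair})
        combined.append({**control, "group": "control", "pair": pair})
    return combined


# Hypothetical data: case 1 was matched to control 102, case 2 was not matched.
intervention = [
    {"Casenr": 1, "Age": 45, "Weight": 70, "supplierid": 102},
    {"Casenr": 2, "Age": 30, "Weight": 70, "supplierid": None},
]
control = [
    {"Casenr": 101, "Age": 60, "Weight": 80},
    {"Casenr": 102, "Age": 47, "Weight": 72},
]

for row in stack_matched(intervention, control):
    print(row["pair"], row["group"], row["Casenr"])
```

Only case 1 and control 102 end up in the result; the unmatched case 2 and the unused control 101 are dropped.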
In reply to this post by maartenmeerkamp
Where does this command come from? Is this new to 21? A piece of python code? An undocumented command? A macro?
fuzzy demanderds=demander supplierds=supplier by= Age Weight fuzz= 5 5 supplierid = Casenr newdemanderidvars=supplierId.
Thanks, Gene Maguin
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of maartenmeerkamp
Sent: Thursday, November 15, 2012 4:19 AM
To: [hidden email]
Subject: Case Control Matching Fuzzy
-- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Case-Control-Matching-Fuzzy-tp5716210.html Sent from the SPSSX Discussion mailing list archive at Nabble.com.
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
IIRC it is one of JoNoh's extensions.
-- One could also quite easily roll your own with an ADD FILES and some clever LAGS and LEADS (or SHIFT VALUES). I created something like this about 15 years ago for a client. Had birth date/birth weight/names and other data and in many cases there were slight mismatches in one or the other or all. It was quite a complicated project but worked beautifully in the end. If you wanted to get really clever you could bring the sorted file into MATRIX and build out some sort of DISTANCE function around neighboring cases. --------------
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
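The "roll your own" idea mentioned above (stack both files, sort, and compare each record with its neighbour) might look roughly like this in Python. This is one possible reading of that suggestion, not the poster's actual solution: the function `neighbor_pairs`, the single matching variable, and the rule that each record is used at most once are assumptions for the sketch.

```python
def neighbor_pairs(cases, controls, key, tolerance):
    """Pair adjacent case/control records in sort order if within tolerance."""
    # Stack both files with a group flag, then sort on the matching variable.
    stacked = [("case", row) for row in cases]
    stacked += [("control", row) for row in controls]
    stacked.sort(key=lambda item: item[1][key])

    pairs = []
    previous = None  # plays the role of LAG: the last unpaired record
    for current in stacked:
        if (previous is not None
                and previous[0] != current[0]
                and abs(current[1][key] - previous[1][key]) <= tolerance):
            case, control = ((previous, current) if previous[0] == "case"
                             else (current, previous))
            pairs.append((case[1], control[1]))
            previous = None  # both records are now used
        else:
            previous = current
    return pairs


# Hypothetical data
cases = [{"id": 1, "Age": 40}, {"id": 2, "Age": 52}, {"id": 3, "Age": 70}]
controls = [{"id": 11, "Age": 41}, {"id": 12, "Age": 50}, {"id": 13, "Age": 90}]

for case, control in neighbor_pairs(cases, controls, "Age", 3):
    print(case["id"], control["id"])
```

With these values, case 1 pairs with control 11 and case 2 with control 12, while case 3 stays unmatched. Comparing only immediate neighbours is simple but can miss valid pairs that are separated by another record of the same group, which is where a distance-based approach would do better.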
I will be out of the office through the Thanksgiving break. If it is an emergency, contact Fred Shumate at 225-936-9860. |