SPSSX Discussion

Case Control Matching Fuzzy

maartenmeerkamp

Nov 15, 2012; 9:18am

Case Control Matching Fuzzy

4 posts

Dear SPSS-experts,

I am having difficulties matching and merging two datasets for a case control study.

The goal:
I have two patient databases, one with an intervention group and one with a control group. I would like to match the intervention group with the control group. To do this, I have written this syntax:

get file="/Users/xxx/xxx/control.sav".
dataset name supplier.
get file="/Users/xxx/xxx/intervention.sav".
dataset name demander.
fuzzy demanderds=demander supplierds=supplier
by= Age Weight
fuzz= 5 5
supplierid = Casenr
newdemanderidvars=supplierId.

Question 1:
Is it correct that, for each intervention patient, a control patient with Age within ±5 years and Weight within ±5 kg will be searched for? I ask because my output shows 260 fuzzy matches (out of a total of 500), even when I set the fuzz to 50 and 50. In that case every patient should match every individual in the other group, so I would expect the maximum number of fuzzy matches (500).

Question 2:
When I run the syntax in my demander database 2 new variables appear:
- matchgroup
- supplierid
I do not understand what these variables say.

Question 3
How can I merge the intervention with the matched control patients into one database?

Excuse me if the questions are really easy, but I can't seem to find the answer anywhere...

Thanks a lot!

xenia

Nov 15, 2012; 10:18am

Re: Case Control Matching Fuzzy

68 posts

hi,
could you explain what the fuzz is?

xenia

Nov 15, 2012; 10:23am

Re: Case Control Matching Fuzzy

68 posts

In reply to this post by maartenmeerkamp

Also, what do you mean by "is it correct that a control patient.... will be searched"? I assume you want to find controls for your cases; the matching variables are set by you. If you want the matching to be done on age and weight (I don't know what your dependent variable is), then that's fine.

maartenmeerkamp

Nov 15, 2012; 12:24pm

Re: Case Control Matching Fuzzy

4 posts

This is what the SPSS help function told me:
By default, a match is defined by identical values for all the BY variables. A system-missing value prevents a case from being matched. Fuzzy matching is also available for numeric variables. Specify FUZZ=list-of-matching tolerances. There must be one fuzz value for each BY variable, listed in BY-value order.
A tolerance is the maximum difference in either direction that is allowed for a match. Thus, values of 1 and 2 would match if tolerance is 1 or more, and a tolerance of zero means an exact match on that variable. You must use 0 for any string variable.
By default, with fuzzy matching, an exact match is first tried, and then a fuzzy match is tried. There is no attempt to get the closest fuzzy match, just a match within the tolerance.
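
The tolerance rule in the help text quoted above can be illustrated with a small Python sketch. This is not the FUZZY extension's own code: the names `within_tolerance` and `greedy_match`, the dict-based case layout, and above all the assumption that each supplier case is used at most once are made up for illustration. The sketch tries an exact match first, then accepts the first supplier within tolerance rather than the closest one, and treats a missing value as unmatchable.

```python
def within_tolerance(demander, supplier, by, fuzz):
    """True if every BY variable differs by no more than its tolerance (numeric only)."""
    for var, tol in zip(by, fuzz):
        a, b = demander.get(var), supplier.get(var)
        if a is None or b is None:   # a missing value prevents a match
            return False
        if abs(a - b) > tol:         # tolerance applies in either direction
            return False
    return True


def greedy_match(demanders, suppliers, by, fuzz, id_var):
    """Give each demander the first unused supplier that matches.

    Assumption (not stated in the help text): a supplier is used at most once.
    """
    used = set()
    matches = {}
    exact = [0] * len(fuzz)
    for i, d in enumerate(demanders):
        for tolerances in (exact, fuzz):   # exact pass first, then fuzzy pass
            hit = next((s for s in suppliers
                        if s[id_var] not in used
                        and within_tolerance(d, s, by, tolerances)), None)
            if hit is not None:
                matches[i] = hit[id_var]
                used.add(hit[id_var])
                break
    return matches


# Hypothetical data: three intervention cases, two controls.
demanders = [{"Age": 40, "Weight": 70},
             {"Age": 42, "Weight": 72},
             {"Age": 41, "Weight": None}]
suppliers = [{"Casenr": 1, "Age": 44, "Weight": 74},
             {"Casenr": 2, "Age": 40, "Weight": 70}]

narrow = greedy_match(demanders, suppliers, ("Age", "Weight"), (5, 5), "Casenr")
wide = greedy_match(demanders, suppliers, ("Age", "Weight"), (50, 50), "Casenr")
print(narrow, wide)
```

Under these assumptions the number of matches is capped by the number of usable supplier cases and by missing values on the BY variables, so widening the tolerance does not necessarily raise the count.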

xenia

Nov 15, 2012; 12:58pm

Re: Case Control Matching Fuzzy

68 posts

I'm not sure you can do case-control matching in SPSS; others may know of a way.

Which help file is this, i.e. you went to the Help Topics from within SPSS and searched using which keyword?

maartenmeerkamp

Nov 15, 2012; 1:10pm

Re: Case Control Matching Fuzzy

4 posts

Entering the syntax FUZZY /HELP gave me the information.

I read that you can match in SPSS in this post:
http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14523917

maartenmeerkamp

Nov 15, 2012; 1:30pm

Re: Case Control Matching Fuzzy

4 posts

I found out that supplierid holds the case number of the case in the other database that was matched to the case in that row. My main questions now are:

Q1
How do I merge the files? That is, how do I copy only the matched cases into one database rather than the whole file?

Q2
More importantly, I found that increasing the FUZZ hardly increases the number of matched cases. I would expect a greater fuzz to give a greater tolerance and therefore more matches, but this is not what happens, and sometimes the number even goes down. Can someone explain how this works?
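
For Q1, the merge logic can be sketched in Python as follows. It is only an illustration of the idea, not SPSS syntax: it assumes that unmatched intervention cases have a missing supplierid (`None` here) and that Casenr identifies the control cases, and the function `stack_matched_pairs` and the added `group` and `pair` fields are hypothetical names.

```python
def stack_matched_pairs(demanders, suppliers,
                        supplier_key="Casenr", link_var="supplierid"):
    """Return one list with each matched demander followed by its supplier."""
    lookup = {s[supplier_key]: s for s in suppliers}  # controls by case number
    combined = []
    pair = 0
    for d in demanders:
        matched_id = d.get(link_var)
        if matched_id is None or matched_id not in lookup:
            continue  # unmatched intervention case: leave it out
        pair += 1
        combined.append({**d, "group": "intervention", "pair": pair})
        combined.append({**lookup[matched_id], "group": "control", "pair": pair})
    return combined


# Hypothetical data: one matched and one unmatched intervention case.
intervention = [{"Casenr": 101, "Age": 40, "supplierid": 2},
                {"Casenr": 102, "Age": 65, "supplierid": None}]
control = [{"Casenr": 1, "Age": 30},
           {"Casenr": 2, "Age": 43}]

result = stack_matched_pairs(intervention, control)
for row in result:
    print(row)
```

In SPSS terms this corresponds roughly to keeping only the matched cases in each file and then stacking the two files into one, with a group indicator so the two sources can still be told apart.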

Maguin, Eugene

Nov 16, 2012; 8:22pm

Re: Case Control Matching Fuzzy

1973 posts

In reply to this post by maartenmeerkamp

Where does this command come from? Is this new to 21? A piece of python code? An undocumented command? A macro?

fuzzy demanderds=demander supplierds=supplier by= Age Weight fuzz= 5 5 supplierid = Casenr newdemanderidvars=supplierId.

Thanks, Gene Maguin


David Marso

Nov 16, 2012; 9:38pm

Re: Case Control Matching Fuzzy

Administrator

3809 posts

IIRC it is one of JoNoh's extensions.
--
You could also quite easily roll your own with ADD FILES and some clever LAGs and LEADs (or SHIFT VALUES). I created something like this about 15 years ago for a client. The files had birth date, birth weight, names and other data, and in many cases there were slight mismatches in one field, another, or all of them. It was quite a complicated project but worked beautifully in the end. If you wanted to get really clever, you could bring the sorted file into MATRIX and build out some sort of DISTANCE function around neighboring cases.
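
The stack, sort and compare-neighbors idea can be sketched in Python as follows. This is one possible reading of the approach, not the code from the project described above: the function `neighbor_pairs`, the `source` field, the use of a single numeric sort key, and the rule that a paired row is not reused are all assumptions made for illustration.

```python
def neighbor_pairs(cases, controls, key, tol):
    """Stack both files, sort on one key, and pair adjacent rows from different files."""
    stacked = ([dict(r, source="case") for r in cases]
               + [dict(r, source="control") for r in controls])
    stacked.sort(key=lambda r: r[key])
    pairs = []
    i = 0
    while i < len(stacked) - 1:
        a, b = stacked[i], stacked[i + 1]  # b plays the role of a LEAD of a
        if a["source"] != b["source"] and abs(a[key] - b[key]) <= tol:
            pairs.append((a, b))
            i += 2                         # both rows are consumed
        else:
            i += 1
    return pairs


# Hypothetical data.
cases = [{"id": "A", "Age": 40}, {"id": "B", "Age": 60}]
controls = [{"id": "X", "Age": 42}, {"id": "Y", "Age": 80}]

pairs = neighbor_pairs(cases, controls, key="Age", tol=5)
print([(a["id"], b["id"]) for a, b in pairs])
```

Comparing only adjacent rows keeps the work to a single pass over the sorted file, at the cost of missing matches that are within tolerance but not next to each other.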


