Obtaining a matched control group

Ivana
Hi everyone

I desperately need help generating a matched control group in SPSS (16). I have 1269 records of individuals with a learning disability; 142 of these have a mental health problem (mhprob=1). The control group needs to be generated on a 1:1 basis from the remaining cases, who do not have a mental health problem (mhprob=0). The matching parameters are age, sex and a score (ABCTOT) that indicates the individual's ability to function independently (expressed as a percentage). I have tried applying the script in one of the answers on this forum:

http://spssx-discussion.1045642.n5.nabble.com/Sampling-question-How-to-draw-a-matched-control-group-td1086666.html

I cannot get it working at all. Please bear in mind that I am not much of an SPSS expert when it comes to programming and scripts.

Many thanks

Ivana

Re: Obtaining a matched control group

David Marso
Administrator
"I have tried applying the script in one of the answers on this forum:"
Please help others help you! Which script?
That thread starts with a reference to a rather sad piece of code:
http://www.spsstools.net/Syntax/RandomSampling/findRandomPairsOfCasesWithSameCharacteristics.txt
Then there is syntax by Albert-Jan Roskam,
and an SPSS extension, "CASECTRL".
What have you tried?
What errors do you receive?
"I cannot get it working at all" is not that informative.





Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Obtaining a matched control group

John F Hall
In reply to this post by Ivana

You could try something like this. Create two new data files, one for mhprob = 1 and the other for an equal-sized random sample of mhprob = 0, then use ADD FILES to generate a third data set with the same number of cases in each group. In syntax it would look something like this (untested; TEMPORARY makes the selection temporary, so SPSS reverts to the original file after each SAVE):

TEMPORARY.
SELECT IF mhprob = 1.
SAVE OUTFILE='file1.sav'.

TEMPORARY.
SELECT IF mhprob = 0.
SAMPLE 142 FROM 1127.
SAVE OUTFILE='file2.sav'.


This gives you two files of 142 cases each.  (You could also use file > save as)

ADD FILES /FILE='file1.sav' /FILE='file2.sav'.

I'm not a statistician, so others may advise leaving your original file as is and using statistical procedures which don't need equal numbers of each group.
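
For readers who find Python easier to follow than SPSS syntax, here is a rough sketch of what the approach above amounts to. The records, the `id` field and the seed are invented for illustration; only `mhprob` and the counts 142 and 1127 come from the thread. Note that it draws a purely random set of controls and does no matching on age, sex or ABCTOT.

```python
import random

# Hypothetical stand-in for the 1269 records: 142 with mhprob=1, 1127 with mhprob=0.
records = [{"id": i, "mhprob": 1 if i < 142 else 0} for i in range(1269)]

cases = [r for r in records if r["mhprob"] == 1]
pool = [r for r in records if r["mhprob"] == 0]

rng = random.Random(20110330)            # arbitrary fixed seed so the draw is repeatable
controls = rng.sample(pool, len(cases))  # comparable to SAMPLE 142 FROM 1127

combined = cases + controls              # comparable to ADD FILES
print(len(cases), len(controls), len(combined))
```

The two groups end up the same size, but nothing guarantees that they resemble each other on the matching variables.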

John Hall

[hidden email] 

www.surveyresearch.weebly.com 


--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Obtaining-a-matched-control-group-tp4271299p4271299.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Obtaining a matched control group

Maguin, Eugene
In reply to this post by Ivana
Ivana,

So this is the code that you are referring to and will need to use. (Did you
understand that the first section of code was used to generate some example
data?). You said:

I have 1269 records of individuals with learning disability. 142
of these have a mental health problem (mhprob=1). The control group needs to
be generated from the rest of the cases who do not have a mental health
problem (mhprob=0) on 1:1 basis. The matching parameters are age, sex and a
score (ABCTOT) which indicates the ability of the individuals to function
independently (expressed as a percentage).

Here's how to convert the code to your instance. But there's some reading in
the syntax reference that will be helpful to understand what the commands
are doing.


* actual code.
compute random = rv.uniform(0,1).
sort cases by mhprob sex age abctot random.
aggr out = *
        / presorted
        / break = mhprob sex age abctot
        / dv1 to dv23=first(dv1 to dv23).
formats all (f5).

At this point you have pairs of cases (but see the note below), with the two
members of each pair on separate lines. What you do next depends on what you
are going to do analytically. If you are going to do paired t-tests, you will
need to restructure the data further. But if you are going to use
independent-samples t-tests, the data are ready for use.
NOTE. Before you do anything further, examine your data carefully to be sure
that every treatment case (mhprob=1) has a match. You should be seriously
concerned about that, since abctot is a percentage and exact matches on it
may be rare. This is where you are going to have problems.
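
One way to picture the check described in the note is sketched below in Python rather than SPSS syntax. The rows are invented; the tuple layout (mhprob, sex, age, abctot) and the helper name `profile_counts` are assumptions made for this illustration only.

```python
from collections import Counter

# Hypothetical records: (mhprob, sex, age, abctot). Values are invented.
rows = [
    (1, 1, 20, 50.0),
    (1, 1, 20, 50.0),
    (1, 2, 35, 62.5),
    (0, 1, 20, 50.0),
    (0, 2, 35, 61.0),
    (0, 2, 40, 62.5),
]

def profile_counts(rows, group):
    """Count cases per (sex, age, abctot) profile within one mhprob group."""
    return Counter(r[1:] for r in rows if r[0] == group)

treated = profile_counts(rows, 1)
controls = profile_counts(rows, 0)

# Profiles where there are more treatment cases than exact-match controls.
shortfall = {p: n - controls[p] for p, n in treated.items() if n > controls[p]}
print(shortfall)
```

Any profile listed in `shortfall` has treatment cases that cannot each receive their own exact match, which is the situation the note warns about.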

If you are satisfied that every treatment case has an adequate match AND you
are doing, for example, paired t-tests, then you need to restructure your
data so that case and matched control are on the same record or line. This
next part does that.

sort cases by sex age abctot mhprob.
casestovars / id = sex age abctot / index = mhprob.
Execute.

From left to right, the resulting file will have the match variables, then
dv1 to dv23 for the controls, followed by dv1 to dv23 for the treatment cases.
The .0 suffix indicates controls and the .1 suffix indicates treatment cases.
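
The restructuring can be pictured with a small Python sketch. This is only an illustration of the long-to-wide idea under invented data, not a replacement for CASESTOVARS; a single outcome `dv1` stands in for dv1 to dv23.

```python
# Hypothetical long-format rows after the aggregate step: one row per profile and group.
long_rows = [
    {"sex": 1, "age": 20, "abctot": 50.0, "mhprob": 0, "dv1": 7},
    {"sex": 1, "age": 20, "abctot": 50.0, "mhprob": 1, "dv1": 9},
    {"sex": 2, "age": 35, "abctot": 62.5, "mhprob": 1, "dv1": 4},
]

id_vars = ("sex", "age", "abctot")
wide = {}
for row in long_rows:
    key = tuple(row[v] for v in id_vars)
    record = wide.setdefault(key, dict(zip(id_vars, key)))
    # Suffix the outcome with the index value (.0 control, .1 treatment case).
    record["dv1.%d" % row["mhprob"]] = row["dv1"]

wide_rows = list(wide.values())

# Profiles that lack either the control or the treatment half of the pair.
unpaired = [r for r in wide_rows if "dv1.0" not in r or "dv1.1" not in r]
print(wide_rows)
print(len(unpaired), "profile(s) without a complete pair")
```

Rows that end up in `unpaired` correspond to records where one half of the pair would be missing after restructuring, so they are worth listing before running paired tests.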


I don't know what this does. Get rid of it.
begin program.
import spss
spss.Submit("sample 41 from %s." % spss.GetCaseCount())
end program.
exe.


Gene Maguin



Re: Obtaining a matched control group

David Marso
Administrator
In reply to this post by John F Hall
John,
  There was something crucial about "matched" control group...
I think we need to wait for the OP to tell us what was tried but didn't work.
Basic minimal code will require some SORTS and clever LAGS and TAGS.
Likely that exact matches will not be available for all requested attributes, so...
need to throw some fuzz into the mix.
D
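
As an illustration of what "fuzz" could mean here, the Python sketch below picks, for one treatment case, the control of the same sex and age whose abctot is nearest, and rejects it if the gap is too large. The function name `closest_control`, the 5-point default tolerance and the records are all assumptions for the example, not something proposed in the thread.

```python
def closest_control(case, controls, max_gap=5.0):
    """Return the control with the same sex and age whose abctot is nearest,
    or None if even the nearest differs by more than max_gap percentage points."""
    candidates = [c for c in controls
                  if c["sex"] == case["sex"] and c["age"] == case["age"]]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: abs(c["abctot"] - case["abctot"]))
    return best if abs(best["abctot"] - case["abctot"]) <= max_gap else None

case = {"sex": 1, "age": 20, "abctot": 50.0}
controls = [
    {"id": "a", "sex": 1, "age": 20, "abctot": 58.0},
    {"id": "b", "sex": 1, "age": 20, "abctot": 47.0},
    {"id": "c", "sex": 2, "age": 20, "abctot": 50.0},
]
match = closest_control(case, controls)
print(match)
```

For true 1:1 matching, each chosen control would also have to be removed from the pool before the next case is processed.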



Re: Obtaining a matched control group

Ivana
In reply to this post by David Marso
Sorry, this is what I tried to modify with not much luck


* seed, needed for reproducibility.
set rng=mt mtindex= 20090120.

* sample data.
input program.
loop #i=1 to 2000.
compute ses = trunc(rv.uniform(0, 5)).
compute age = trunc(rv.uniform(18, 45)).
compute sex = trunc(rv.uniform(1, 2.9)).
compute blah = rv.normal(1, 100).
compute bloh = rnd(rv.normal(1, 52)).
compute casecontr = trunc(rv.uniform(0,1.9)).
end case.
end loop.
end file.
end input program.
value labels casecontr 0 'control' 1 'case'.
variable label blah 'mysterious outcome var #1' / bloh 'mysterious outcome var #2'.

* actual code.
compute random = rv.uniform(0,1).
sort cases by casecontr sex age ses random.
aggr out = *
        / presorted
        / break = casecontr sex age ses
        / blah = first (blah) / bloh = first (bloh).
formats all (f5).
sort cases by sex age ses.
casestovars / id = sex age ses / index = casecontr.
begin program.
import spss
spss.Submit("sample 41 from %s." % spss.GetCaseCount())
end program.
exe.

Re: Obtaining a matched control group

Ivana
In reply to this post by Maguin, Eugene
Hurrah! This has worked! Thanks so much. I think I can take it from here.

Many thanks

Ivana

Re: Obtaining a matched control group

David Marso
Administrator
In reply to this post by Maguin, Eugene
Ivana and Gene,
Something really bothers me about the following:
"
* actual code.
compute random = rv.uniform(0,1).
sort cases by mhprob sex age abctot random.
aggr out = *
        / presorted
        / break = mhprob sex age abctot
        / dv1 to dv23=first(dv1 to dv23).
formats all (f5).
"
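Consider a situation where you have multiple cases with the same desired matching profile and mhprob status. AGGREGATE keeps only the first case in each break group, so here it will lose one control and one treatment case (data2 and data4 if the cases stay in the order shown).

mhprob sex age abctot data
0      1   20  .5     data1
0      1   20  .5     data2
1      1   20  .5     data3
1      1   20  .5     data4
-------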
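
The loss can be reproduced outside SPSS with a few lines of Python. This is a sketch of the "sort, then keep the first case per break group" logic using the four example cases; the seed is arbitrary and the code is an illustration, not the AGGREGATE implementation.

```python
import random

# The four example cases: (mhprob, sex, age, abctot, label).
cases = [
    (0, 1, 20, 0.5, "data1"),
    (0, 1, 20, 0.5, "data2"),
    (1, 1, 20, 0.5, "data3"),
    (1, 1, 20, 0.5, "data4"),
]

rng = random.Random(1)  # arbitrary seed, only for repeatability
# Sort by the break variables, then by a random tie-breaker.
shuffled = sorted(cases, key=lambda c: (c[:4], rng.random()))

kept = {}
for c in shuffled:
    kept.setdefault(c[:4], c)  # keep only the first case seen per break group

survivors = list(kept.values())
print(len(cases), "->", len(survivors))
```

Four cases go in and two come out: one control and one treatment case, whichever happened to sort first in its group.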
May need something a bit more complex and bulletproof.
--Here's my crack at it.
Rather than aggregating, I do what I call a LAG and DRAG.
Note this hasn't been tested, as I don't have SPSS immediately available.
If nothing else, it should provide some insight into the complexities of the issue. Note that the first pass obtains a random exact match on SEX, AGE and ABCTOT, the second on SEX and AGE, etc.
This idea can be generalized to as many variables as needed.
----
* Making this up on the fly, with no way to test without rebooting my box ;-(
The logic should suffice. There might be a misstep, but I believe it will work as is. OR someone will step up and correct my code.

*-----------
* First sort files by matching criteria*.
COMPUTE SCRAMBLER=UNIFORM(1).
COMPUTE PAIREDUP=0.
SORT CASES BY  SEX AGE ABCTOT (A)  SCRAMBLER   mhprob (D) .
COMPUTE YOKE_ID=$CASENUM.
COMPUTE PAIRED=YOKE_ID.
* This will place cases with matching age sex abctot next to each other and tag them with a unique ID.
* Those with mhprob 0/1 randomly occurring within blocks of "matched cases" *.
* Now identify exact matches * .
DO IF SEX EQ LAG(SEX) AND AGE EQ LAG(AGE) AND ABCTOT EQ LAG(ABCTOT) AND mhprob EQ 0 AND LAG(mhprob) EQ 1.
COMPUTE PAIRED=LAG(YOKE_ID)  .
COMPUTE MATE=YOKE_ID .
END IF.
* We now have something like this.
* matchedstuff  mhprob  yoke_id  paired  mate.
* xxxxxxxxxxx   1       4        4       (missing).
* xxxxxxxxxxx   0       5        4       5.

SORT CASES BY YOKE_ID (D).
* We now have something like this.
* matchedstuff  mhprob  yoke_id  paired  mate.
* xxxxxxxxxxx   0       5        4       5.
* xxxxxxxxxxx   1       4        4       (missing).

IF NOT (MISSING(LAG(MATE))) AND MISSING (MATE)  MATE=LAG(MATE).
EXE.
* We now have something like this.
* matchedstuff  mhprob  yoke_id  paired  mate.
* xxxxxxxxxxx   0       5        4       5.
* xxxxxxxxxxx   1       4        4       5.

DO IF NOT(MISSING(MATE)).
XSAVE OUTFILE="MATCHED1.SAV".
COMPUTE PAIREDUP=1.
END IF.

SELECT IF PAIREDUP=0.
MATCH FILES / FILE * / DROP SCRAMBLER PAIRED MATE .
*Every case in MATCHED1.SAV should  be yoked to another case.
*Active file contains unmatched cases.



* Now repeat with relaxed criteria (ie not requiring exactly equal abctot).

COMPUTE SCRAMBLER=UNIFORM(1).
SORT CASES BY  SEX AGE ABCTOT (A)  SCRAMBLER   mhprob (D) .
COMPUTE YOKE_ID=$CASENUM.
COMPUTE PAIRED=YOKE_ID.
* Now identify matches on AGE and SEX and tag CLOSEST ABCTOT* .
DO IF SEX EQ LAG(SEX)  AND AGE EQ LAG(AGE)  AND mhprob EQ 0 AND LAG(mhprob) EQ 1.
COMPUTE PAIRED=LAG(YOKE_ID).
COMPUTE MATE=YOKE_ID .
END IF.

SORT CASES BY YOKE_ID (D).
IF NOT (MISSING(LAG(MATE))) AND MISSING (MATE)  MATE=LAG(MATE).
EXE.

DO IF NOT(MISSING(MATE)).
XSAVE OUTFILE="MATCHED2.SAV".
COMPUTE PAIREDUP=1.
END IF.
SELECT IF PAIREDUP=0.
*Matched2.sav contains exact matches on sex and age but possibly inexact on ABCTOT.

MATCH FILES / FILE * / DROP SCRAMBLER PAIRED MATE .

* Exercise for reader.... Adapt for relaxed criteria on age ;-)
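
The multi-pass idea above (exact match first, then relax the criteria for whoever is left) can be sketched in Python as follows. This is not a translation of the LAG and DRAG syntax; the helper `match_pass`, the record layout and the seed are assumptions made for the illustration.

```python
import random

def match_pass(cases, controls, key, rng):
    """Pair each case with one randomly chosen unused control sharing key(record).
    Returns (pairs, unmatched_cases, unused_controls)."""
    pool = {}
    for c in controls:
        pool.setdefault(key(c), []).append(c)
    for group in pool.values():
        rng.shuffle(group)          # random choice among equally good controls
    pairs, unmatched = [], []
    for case in cases:
        group = pool.get(key(case))
        if group:
            pairs.append((case, group.pop()))  # each control is used at most once
        else:
            unmatched.append(case)
    leftovers = [c for group in pool.values() for c in group]
    return pairs, unmatched, leftovers

# Invented records: (id, sex, age, abctot).
cases = [("t1", 1, 20, 50.0), ("t2", 1, 20, 50.0), ("t3", 2, 35, 62.5)]
controls = [("c1", 1, 20, 50.0), ("c2", 1, 20, 48.0), ("c3", 2, 40, 62.5)]

rng = random.Random(42)  # arbitrary seed
# Pass 1: exact match on sex, age and abctot.
exact, rest, unused = match_pass(cases, controls, lambda r: r[1:], rng)
# Pass 2: relaxed match on sex and age only, using what is left over.
relaxed, still_unmatched, unused = match_pass(rest, unused, lambda r: r[1:3], rng)
print(exact, relaxed, still_unmatched, unused)
```

In this sketch the relaxed pass takes any remaining control of the same sex and age, not necessarily the one with the closest abctot; a further pass relaxing age would follow the same pattern.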





Re: Obtaining a matched control group

Maguin, Eugene
David,

I'm glad you pointed out that possibility because I overlooked it in my
response. Thank you.

Ivana, this is something to check both before and after the matching
operation. Afterwards, the frequency of mhprob=1 should equal its frequency
before matching. Actually, the place to run the second set of frequencies is
after the aggregate and before the casestovars.

Gene Maguin
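
The before-and-after check can be illustrated with a short Python sketch. The data, the `profile` field and the helper `mhprob_counts` are invented; the dictionary step simply mimics keeping one row per (mhprob, profile) group.

```python
from collections import Counter

def mhprob_counts(rows):
    """Frequency table of mhprob for a list of records."""
    return Counter(r["mhprob"] for r in rows)

# Invented data: two treatment cases share one profile, so one is dropped.
before = [
    {"mhprob": 1, "profile": (1, 20, 50.0)},
    {"mhprob": 1, "profile": (1, 20, 50.0)},
    {"mhprob": 0, "profile": (1, 20, 50.0)},
]
# Keep one row per (mhprob, profile) group, as the aggregate step does.
after = list({(r["mhprob"], r["profile"]): r for r in before}.values())

lost = mhprob_counts(before)[1] - mhprob_counts(after)[1]
print("treatment cases lost in aggregation:", lost)
```

A nonzero difference means some treatment cases shared a profile and were collapsed into one.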

Reply | Threaded
Open this post in threaded view
|

Re: Obtaining a matched control group

David Marso
Administrator
In reply to this post by David Marso
Well, I'm glad Gene's code worked for Ivana but it has that fatal flaw
if there are sequences of exact matches.
I had a chance to test my code and there were a few typos (AND AND .. does not compute ;-).
Here is a revision (just the first main logic).  
It could probably be made more efficient but I need to get dinner going.
Probably could lose an EXE but with the lags going I figured better safe than sorry and I don't have time to fine tune it.
HTH, David

** SIMULATION DATA **.
input program.
loop sex= 1 to 2.
loop #=1 to 100.
compute age=trunc(uniform(10)).
compute abctot = trunc(uniform(10))/10.
compute mhprob=1.
leave sex.
end case.
end loop.
end loop.
loop sex= 1 to 2.
loop #=1 to 1000.
compute age=trunc(uniform(10)).
compute abctot = trunc(uniform(10))/10.
compute mhprob=0.
leave sex.
end case.
end loop.
end loop.
end file.
end input program.
string datamark(a8).
COMPUTE datamark=CONCAT("DATA",STRING($CASENUM,N4)).
exe.

**
**RUN ONLY ONCE **.
COMPUTE YOKE_ID=$CASENUM.
COMPUTE PAIRED=YOKE_ID.
COMPUTE PAIREDUP=0.

**** REPEAT THIS CODE UNTIL ALL EXACT MATCHES HAVE BEEN DONE ***.
** CROSSTABS / TABLES SEX BY AGE BY ABCTOT BY MHPROB / CELLS = COUNT.
COMPUTE SCRAMBLE=UNIFORM(1).
SORT CASES BY PAIREDUP SEX AGE ABCTOT (A)  SCRAMBLE   mhprob (D) .
* This will place cases with matching age sex abctot next to each other and tag them with a unique ID.
* Those with mhprob 0/1 randomly occurring within blocks of "matched cases" *.
* Now identify exact matches * .

DO IF SEX EQ LAG(SEX) AND AGE EQ LAG(AGE)  AND ABCTOT=LAG(ABCTOT)  AND mhprob EQ 0 AND LAG(mhprob) EQ 1.
+  DO IF (NOT(PAIREDUP)).
+    COMPUTE PAIRED=LAG(YOKE_ID)  .
+    COMPUTE MATE=YOKE_ID .
+    COMPUTE MATED=1.
+  END IF.
END IF.

SORT CASES BY PAIREDUP (A) PAIRED (D) MATE(D).

* we have now something like this *.
*matchedstuff mhprob yoke_id paired  mate
*xxxxxxxxxxx       0       5         4          5
*xxxxxxxxxxx       1       4         4          .
*.

DO IF PAIRED=LAG(PAIRED) AND MISSING (MATE) AND NOT(PAIREDUP).
COMPUTE MATE=LAG(MATE).
COMPUTE MATED=1.
END IF.
EXE.

* we have now something like this *.
*matchedstuff mhprob yoke_id paired  mate
*xxxxxxxxxxx       0       5         4          5
*xxxxxxxxxxx       1       4         4          5
*.
IF MATED PAIREDUP=1.
CROSSTABS TABLES PAIREDUP BY MHPROB.
freq pairedup.
** REPEAT UNTIL HAPPY!!! *.
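
For readers who want to see the goal of the syntax above outside SPSS, here is a minimal Python sketch of one-to-one exact matching within strata. It is not a translation of the LAG-based logic; it only reproduces the intended result (each case paired with one unused control that has identical sex, age and abctot). The record layout (dicts with an `id` key) and the function name `exact_match_pairs` are assumptions made for illustration.

```python
import random
from collections import defaultdict

def exact_match_pairs(records, keys=("sex", "age", "abctot"), seed=0):
    """Pair each case (mhprob == 1) with one unused control (mhprob == 0)
    that has identical values on every matching key."""
    rng = random.Random(seed)
    cases = defaultdict(list)
    controls = defaultdict(list)
    for rec in records:
        stratum = tuple(rec[k] for k in keys)
        target = cases if rec["mhprob"] == 1 else controls
        target[stratum].append(rec["id"])

    pairs = []
    for stratum, case_ids in cases.items():
        pool = controls.get(stratum, [])
        # Shuffle both sides so neither file order nor id decides who is matched.
        rng.shuffle(case_ids)
        rng.shuffle(pool)
        # zip stops at the shorter list: surplus cases or controls stay unmatched.
        pairs.extend(zip(case_ids, pool))
    return pairs

# Hypothetical toy data: two cases compete for one control in the first stratum.
records = [
    {"id": 1, "sex": 1, "age": 30, "abctot": 0.5, "mhprob": 1},
    {"id": 2, "sex": 1, "age": 30, "abctot": 0.5, "mhprob": 1},
    {"id": 3, "sex": 1, "age": 30, "abctot": 0.5, "mhprob": 0},
    {"id": 4, "sex": 2, "age": 41, "abctot": 0.7, "mhprob": 1},
    {"id": 5, "sex": 2, "age": 41, "abctot": 0.7, "mhprob": 0},
    {"id": 6, "sex": 2, "age": 41, "abctot": 0.7, "mhprob": 0},
]
pairs = exact_match_pairs(records)
```

Because every control is consumed at most once, a run of several cases followed by several controls in the same stratum is handled in a single pass, which is the situation the repeated SORT/LAG passes are working around.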


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Obtaining a matched control group

hillel vardi
In reply to this post by Ivana
  Shalom
  As David Marso points out, what you are looking for is more complicated
than what you stated.
The assumption that cases and controls are spread evenly along the file
is rarely met.
If you use AGGREGATE to form the matches, it is possible that some of the
groups won't have any controls in them or, as David Marso points out, will
have more than one case in them.
Here is an example using only age to define the groups.

id   age   case/control
 1    11       1   >>> match
12    11       0   >>> match
14    12       1   >>> no match
23    14       1   >>> no match
31    15       1   >>> almost match
 7    16       0   >>> almost match
 2    16       0   >>> no match
 4    17       0   >>> no match
Here you may want to match 14 with 7, 14 with 2, and 15 with 17.
That kind of match is not possible using AGGREGATE.

To solve this kind of matching you can create a running sum, add 1 to
it whenever a case is met, and
subtract 1 when the first matching control is met.

Here is a general syntax (not tested):

* "case" (1 = case, 0 = control) and "random" are placeholder variable names.
compute random = uniform(1).
sort cases by sex age abctot random.
numeric  match_num  run_sum  (f4) .
leave  match_num  run_sum .
do if   case eq 1 .
  compute  run_sum = sum( run_sum,1) .
  compute   match_num = sum(match_num,1) .
else if case eq 0  and  run_sum gt 0 .
  compute  run_sum  = sum(run_sum,-1) .
  compute  is_match= 1.
end if .
select if case eq 1 or is_match   eq 1.


This syntax will match the closest control AFTER the case, which may or
may not be a problem.
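
The running-sum idea can be sketched in Python as follows. This is an illustration under assumptions, not a tested replacement for the syntax: the rows are taken to be already sorted by the matching variables, and the tuple layout `(id, is_case)` and the function name `running_sum_match` are made up for the example.

```python
def running_sum_match(rows):
    """rows: list of (id, is_case) tuples, already sorted by the matching variables.
    Keeps every case, plus each control that arrives while a case is still waiting."""
    pending = 0          # cases seen so far that have not yet been given a control
    kept = []
    for row_id, is_case in rows:
        if is_case:
            pending += 1
            kept.append(row_id)
        elif pending > 0:
            pending -= 1  # this control is consumed by the oldest waiting case
            kept.append(row_id)
    return kept

# The age-sorted example from the post: ids with their case (True) / control (False) flag.
example = [(1, True), (12, False), (14, True), (23, True),
           (31, True), (7, False), (2, False), (4, False)]
kept = running_sum_match(example)

# A control that comes BEFORE any case is skipped, as the post warns.
kept_with_leading_control = running_sum_match([(9, False), (1, True), (2, False)])
```

Like the syntax it mirrors, the sketch never resets the counter at a stratum boundary, so a waiting case can pick up a control from the next sex or age group; whether that is acceptable depends on how close a match has to be.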


Hillel Vardi
BGU



RE: Obtaining a matched control group

Ivana
In reply to this post by David Marso
Dear David

I have tested this and precisely as you listed, it worked beautifully. I am so grateful for your time and effort. My thanks also to many other people who have replied.

Best wishes

 Ivana

___________________________
Dr Ivana Dojcinov, MD MRCPsych



Re: Obtaining a matched control group

Ivana
In reply to this post by David Marso
Dear David

I've just tested this and it worked beautifully. Thank you ever so much. My thanks also to all other people who have spared time and effort to help me.

Kind regards

Ivana

RE: Obtaining a matched control group (A final Nail)

David Marso
Administrator
In reply to this post by Ivana
Hi Ivana,
You are very welcome!
I was thinking about this further after an interesting email from Gene regarding sequences (similar to Hillel Vardi's post last night).  I came up with the following tidbit, which is much simpler than my previous post and has the added feature of being almost completely intuitive.  Another nice benefit is that it does not require a SORT, and in my tests it is a KEEPER ;-).

COMPUTE ID=$CASENUM.
COMPUTE SCRAMBL=UNIFORM(1).
RANK SCRAMBL  BY SEX AGE ABCTOT mhPROB.
IF MHPROB=0 ID0=ID.
IF MHPROB=1 ID1=ID.
AGGREGATE OUTFILE * / BREAK sex age ABCTOT rscrambl /id0 id1=max(id0 id1).
COMPUTE MATCH=NOT(MISSING(ID1)) AND NOT(MISSING(ID0)).
FREQ MATCH.

Comments:
RANK is able to construct 'counters' BY strata without the relevant cases being contiguous.  NICE.
After the AGGREGATE the file will have the strata variables (and the paired IDs, ID0 and ID1) but not the MHPROB variable.  No problem, since this information is implied by the presence/absence of ID0 and ID1.
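
To make the RANK/AGGREGATE trick easier to follow, here is a hedged Python sketch of the same three steps: a random counter within each stratum-by-group cell, one output row per stratum and counter value, and a match flag when both IDs are present. The dict-based record layout and the name `rank_and_pair` are assumptions for illustration only.

```python
import random
from collections import defaultdict

def rank_and_pair(records, keys=("sex", "age", "abctot"), seed=0):
    rng = random.Random(seed)

    # Step 1 (like RANK SCRAMBL BY strata and mhprob):
    # collect ids per stratum-by-group cell; shuffling below gives a random 1..n counter.
    cells = defaultdict(list)
    for rec in records:
        stratum = tuple(rec[k] for k in keys)
        cells[(stratum, rec["mhprob"])].append(rec["id"])

    # Step 2 (like AGGREGATE with BREAK on strata and rank):
    # one row per stratum and rank, holding a control id (id0) and a case id (id1).
    rows = defaultdict(lambda: {"id0": None, "id1": None})
    for (stratum, mhprob), ids in cells.items():
        rng.shuffle(ids)
        for rank, rec_id in enumerate(ids, start=1):
            slot = "id1" if mhprob == 1 else "id0"
            rows[(stratum, rank)][slot] = rec_id

    # Step 3 (like COMPUTE MATCH): a row is a matched pair only if both ids exist.
    return [(row["id1"], row["id0"]) for row in rows.values()
            if row["id0"] is not None and row["id1"] is not None]

# Hypothetical toy data.
records = [
    {"id": 1, "sex": 1, "age": 30, "abctot": 0.5, "mhprob": 1},
    {"id": 2, "sex": 1, "age": 30, "abctot": 0.5, "mhprob": 1},
    {"id": 3, "sex": 1, "age": 30, "abctot": 0.5, "mhprob": 0},
    {"id": 4, "sex": 2, "age": 41, "abctot": 0.7, "mhprob": 1},
    {"id": 5, "sex": 2, "age": 41, "abctot": 0.7, "mhprob": 0},
    {"id": 6, "sex": 2, "age": 41, "abctot": 0.7, "mhprob": 0},
]
pairs = rank_and_pair(records)
```

The k-th case in a stratum is paired with the k-th control in the same stratum, so the number of pairs per stratum is the smaller of the two counts, and a stratum with no controls yields no pairs.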

Taking it further:
One could segregate the MATCH cases into a separate file, delete them from the working file, and then rerun the code after doing a VARSTOCASES (i.e. restoring ID from ID0 and ID1).  In this case I would probably COMPUTE a random variable and sort on it, then use a variant of the RANK such as:
RANK ABCTOT  BY SEX AGE  mhPROB  (may need to specify TIES to deal with duplicate values in ABCTOT?).
This would build RANKS of ABCTOT within the strata, and a later AGGREGATE would group them together as previously (a fuzzy match within the ranked values of ABCTOT).
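
A rough Python sketch of this second pass is below, assuming exact strata on sex and age only, with ABCTOT ranked inside each group. Ties are broken by input order here, which is only one of several possible TIES choices, and the names (`pair_by_abctot_rank`, the dict keys) are hypothetical.

```python
from collections import defaultdict

def pair_by_abctot_rank(records):
    """Pair the k-th lowest-abctot case with the k-th lowest-abctot control
    inside each sex-by-age stratum, and report the abctot gap of each pair."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["sex"], rec["age"], rec["mhprob"])].append(rec)

    pairs = []
    for (sex, age, mhprob), cases in list(groups.items()):
        if mhprob != 1:
            continue
        controls = groups.get((sex, age, 0), [])
        # sorted() is stable, so tied abctot values keep their input order.
        ranked_cases = sorted(cases, key=lambda r: r["abctot"])
        ranked_controls = sorted(controls, key=lambda r: r["abctot"])
        for case, control in zip(ranked_cases, ranked_controls):
            gap = abs(case["abctot"] - control["abctot"])
            pairs.append((case["id"], control["id"], gap))
    return pairs

# Hypothetical leftovers after the exact matches were removed.
leftovers = [
    {"id": 1, "sex": 1, "age": 30, "abctot": 0.20, "mhprob": 1},
    {"id": 2, "sex": 1, "age": 30, "abctot": 0.80, "mhprob": 1},
    {"id": 3, "sex": 1, "age": 30, "abctot": 0.25, "mhprob": 0},
    {"id": 4, "sex": 1, "age": 30, "abctot": 0.50, "mhprob": 0},
    {"id": 5, "sex": 1, "age": 30, "abctot": 0.90, "mhprob": 0},
]
fuzzy_pairs = pair_by_abctot_rank(leftovers)
```

In the toy data the second case (0.80) is paired with the control at 0.50 rather than the closer one at 0.90, because equal ranks do not guarantee a small ABCTOT difference; checking the gap column before accepting the pairs is probably wise.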

NOTE:  In contrast to Gene's example I do not spread the data elements, I just store the IDs (that's all you need to carry).  To map the data to the IDs simply requires:
-  a VARSTOCASES to make the file long,
-  SORT CASES BY ID,
-  MATCH FILES into the SORTED detail level file.
Hope this helps,
David

 







Re: Obtaining a matched control group (A final Nail)

Jon K Peck
Although it wasn't stated in the original post, it sounded to me like one of the match variables was continuous and therefore, exact matches would be unlikely.  In that case you would need a tolerance factor in order to get a match.  FUZZY, of course, handles all of this.

Contrary to what I recalled earlier, FUZZY should work with version 16 (but no earlier one).  The clue is the one-word name of the extension command, which is a limitation in V16.
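
As an illustration of what a tolerance factor means for a continuous match variable, here is a small Python sketch. It is explicitly not the FUZZY implementation, only a toy version of the idea: sex and age must match exactly, ABCTOT may differ by at most `tol`, and one eligible control is drawn at random without replacement. All names are assumptions.

```python
import random

def tolerance_match(cases, controls, tol, seed=0):
    """Return {case_id: control_id}; each control is used at most once."""
    rng = random.Random(seed)
    available = list(controls)
    pairs = {}
    for case in cases:
        eligible = [c for c in available
                    if c["sex"] == case["sex"]
                    and c["age"] == case["age"]
                    and abs(c["abctot"] - case["abctot"]) <= tol]
        if eligible:
            chosen = rng.choice(eligible)
            available.remove(chosen)   # sampling without replacement
            pairs[case["id"]] = chosen["id"]
    return pairs

# Hypothetical data, with ABCTOT expressed as a whole-number percentage.
cases = [
    {"id": 1, "sex": 1, "age": 30, "abctot": 50},
    {"id": 2, "sex": 1, "age": 30, "abctot": 52},
]
controls = [
    {"id": 10, "sex": 1, "age": 30, "abctot": 53},   # within 5 points of both cases
    {"id": 11, "sex": 1, "age": 30, "abctot": 70},   # too far on ABCTOT
    {"id": 12, "sex": 2, "age": 30, "abctot": 50},   # wrong sex
]
matched = tolerance_match(cases, controls, tol=5)
```

Case 1 takes the only eligible control, so case 2 stays unmatched even though it was also within tolerance; the order in which cases are processed matters in a greedy scheme like this one.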

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435





Re: Obtaining a matched control group (A final Nail)

Albert-Jan Roskam
Hi Jon,

Very interesting. I didn't know that extension command. From the docstring:
        h is the current demander case hash
        case is the current supplier case
        return is 
        -  0 if no match
        -  1 if fuzzy match
        -  2 if exact match

Why only these discrete values and not values [0-1]? A better distinction could then be made between different candidate record pairs. Also, I wonder if it isn't a big penalty if a possible match is considered a non-match if one of the linkage vars is missing?
 
This has nothing to do with Fuzzy itself, but is the following code fragment used in conjunction with gettext?
    #enable localization
    global _
    try:
        _("---")
    except:
        def _(msg):
            return msg

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




Re: Obtaining a matched control group (A final Nail)

Jon K Peck

On 03/31/2011 01:45 PM, Albert-Jan Roskam wrote:

> Hi Jon,
>
> Very interesting. I didn't know that extension command. From the docstring:
>        h is the current demander case hash
>        case is the current supplier case
>        return is
>        -  0 if no match
>        -  1 if fuzzy match
>        -  2 if exact match
>
> Why only these discrete values and not values [0-1]? A better distinction could then be made between different candidate record pairs. Also, I wonder if it isn't a big penalty if a possible match is considered a non-match if one of the linkage vars is missing?

When I designed this, I felt that a missing value should not be considered as a match with anything: there is no information.  If someone wants different behavior, they can change the missing values temporarily.

As for the metric, in order to provide a distance for the mismatch, there has to be some metric defined, so the user would have to provide that.  Of course, that only applies when not using an exact match.  In the case of categorical variables, this could be pretty messy.  And the user might well want to weight variables differently.  If a user wanted to provide a code fragment that calculated a distance, I could use that, but it would be hard for a user to get it right IMO.

The second problem here is that one might then want to minimize the total error in the matches, and that is a large integer programming problem that would require a substantially different approach to matching.  Other than the EXACTPRIORITY keyword, FUZZY picks at random from among all cases that satisfy the fuzz criteria.  If it picked the best match among the eligible ones, it would be giving priority to cases that are earlier in the file, and this could introduce a subtle bias in the matching behavior if the cases are not in random order  (there are some comments about this in the documentation).  Even with the current behavior, there is potential for this problem to occur in a milder way, which is why there is a SHUFFLE keyword to combat it, but that increases the time and memory requirements.
 
> This has nothing to do with Fuzzy itself, but is the following code fragment used in conjunction with gettext?
>    #enable localization
>    global _
>    try:
>        _("---")
>    except:
>        def _(msg):
>            return msg

I added some automatic setup for translations to the extensions.py module in, IIRC, version 18.  Since most of the extension commands also work with V17 and might not have the updated extensions.py module, the code above checks to see whether the _ function is defined and generates an identity function if not.  There are some subtleties with _ that are explained in the extension module code.  We write all the Python extension commands to be translatable now, even though many are not currently translated.  Documentation on how this works is in the extension command doc.
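
The fallback described here can be sketched in a self-contained way as follows. This is not the code from extensions.py; it only illustrates the pattern "use the installed translation function if there is one, otherwise fall back to an identity function". The helper name `resolve_translator` is hypothetical, and `gettext.NullTranslations` (standard library) is used as the identity translator.

```python
import gettext

def resolve_translator(namespace):
    """Return the `_` found in `namespace` if it is callable;
    otherwise return a translator that hands messages back unchanged."""
    translator = namespace.get("_")
    if callable(translator):
        return translator
    # NullTranslations.gettext returns the message itself when no catalog is loaded.
    return gettext.NullTranslations().gettext

# No `_` installed (the older-module situation): messages pass through unchanged.
identity = resolve_translator({})

# A stand-in "translation" function, to show that an installed `_` is respected.
shouting = resolve_translator({"_": str.upper})

plain = identity("No match")
loud = shouting("No match")
```

Passing the namespace explicitly avoids the `global _` statement and the bare `except:` of the quoted fragment, at the cost of not being a drop-in replacement for it.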

Thanks for the comments.


Re: Obtaining a matched control group (A final Nail)

hillel vardi
In reply to this post by David Marso
  Shalom

  After thinking over all the other answers, I am quite sure that using AGGREGATE,
LAG or RANK will not work.
The reason is that the assumption that there will be controls in
all the groups is not met in all situations.
  Here is an example using David Marso's program (I only reduced the
number of cases to 8 and controls to 20).

input program.
loop sex= 1 to 2.
loop #=1 to 4.
compute age=trunc(uniform(10)).
compute abctot = trunc(uniform(10))/10.
compute mhprob=1.
leave sex.
end case.
end loop.
end loop.
loop sex= 1 to 2.
loop #=1 to 10.
compute age=trunc(uniform(10)).
compute abctot = trunc(uniform(10))/10.
compute mhprob=0.
leave sex.
end case.
end loop.
end loop.
end file.
end input program.
string datamark(a8).
COMPUTE datamark=CONCAT("DATA",STRING($CASENUM,N4)).
exe.
COMPUTE ID=$CASENUM.
COMPUTE SCRAMBL=UNIFORM(1).
RANK SCRAMBL  BY SEX AGE ABCTOT mhPROB.
IF MHPROB=0 ID0=ID.
IF MHPROB=1 ID1=ID.
AGGREGATE OUTFILE * / BREAK sex age ABCTOT rscrambl /id0 id1=max(id0 id1).
COMPUTE MATCH=NOT(MISSING(ID1)) AND NOT(MISSING(ID0)).
FREQ MATCH.
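
One way to check the assumption directly, before running any matching code, is to count how many cases sit in strata that do not hold enough controls. The Python sketch below does that under the same exact-match definition (sex, age, abctot); the record layout and the name `unmatched_cases` are assumptions for illustration.

```python
from collections import Counter

def unmatched_cases(records, keys=("sex", "age", "abctot")):
    """Number of cases that cannot get an exact 1:1 match, summed over strata."""
    n_cases = Counter()
    n_controls = Counter()
    for rec in records:
        stratum = tuple(rec[k] for k in keys)
        if rec["mhprob"] == 1:
            n_cases[stratum] += 1
        else:
            n_controls[stratum] += 1
    # A stratum with more cases than controls leaves the surplus unmatched.
    return sum(max(0, n - n_controls[stratum]) for stratum, n in n_cases.items())

# Hypothetical toy data: the second stratum has a case but no control at all.
records = [
    {"id": 1, "sex": 1, "age": 3, "abctot": 0.4, "mhprob": 1},
    {"id": 2, "sex": 1, "age": 3, "abctot": 0.4, "mhprob": 0},
    {"id": 3, "sex": 2, "age": 7, "abctot": 0.1, "mhprob": 1},
    {"id": 4, "sex": 2, "age": 8, "abctot": 0.1, "mhprob": 0},
]
shortfall = unmatched_cases(records)
```

A non-zero result means no exact-matching scheme, whether built on AGGREGATE, LAG or RANK, can pair every case, so some tolerance on at least one variable would be needed.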

Hillel Vardi
BGU


Re: Obtaining a matched control group (A final Nail)

David Marso
Administrator
I really wouldn't expect ANYTHING to work well with those sample sizes
and distributions ;-)
My code should be pretty much usable for reasonably large samples.
How does Jon's Fuzzy do with this data?
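
A back-of-the-envelope calculation shows how much the sample size matters here. In the simulated data, age takes 10 values and abctot takes 10 values, so each sex has 100 equally likely cells. The Python sketch below computes the chance that a given case has at least one control in its own cell; it ignores competition between cases for the same control, which is an assumption of the sketch, and the function name is hypothetical.

```python
def prob_exact_control_exists(n_controls, n_cells):
    """P(at least one of n_controls lands in a given cell),
    with controls spread uniformly and independently over n_cells."""
    return 1 - (1 - 1 / n_cells) ** n_controls

# Reduced example: 10 controls per sex over 100 cells.
small = prob_exact_control_exists(10, 100)

# Original simulation: 1000 controls per sex over 100 cells.
large = prob_exact_control_exists(1000, 100)
```

With 10 controls per sex the chance is a little under 10%, while with 1000 controls per sex it is above 99.99%, which fits the remark that the reduced example is too sparse for exact matching.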

On Thu, Mar 31, 2011 at 7:08 PM, hillel vardi <[hidden email]> wrote:

>  Shalom
>
>  After thinking all other answers I am quit sure that using Aggregate , Lag
> or Rank will not work .
> Te reason for that is that the assumption that there will be controls in all
> the groups is not met in all situations.
>  Here is an example using David Marso program ( i only reduce the number of
> cases to 8 and controls to 20 ) .
>
> input program.
> loop sex= 1 to 2.
> loop #=1 to 4.
> compute age=trunc(uniform(10)).
> compute abctot = trunc(uniform(10))/10.
> compute mhprob=1.
> leave sex.
> end case.
> end loop.
> end loop.
> loop sex= 1 to 2.
> loop #=1 to 10.
> compute age=trunc(uniform(10)).
> compute abctot = trunc(uniform(10))/10.
> compute mhprob=0.
> leave sex.
> end case.
> end loop.
> end loop.
> end file.
> end input program.
> string datamark(a8).
> COMPUTE datamark=CONCAT("DATA",STRING($CASENUM,N4)).
> exe.
> COMPUTE ID=$CASENUM.
> COMPUTE SCRAMBL=UNIFORM(1).
> RANK SCRAMBL  BY SEX AGE ABCTOT mhPROB.
> IF MHPROB=0 ID0=ID.
> IF MHPROB=1 ID1=ID.
> AGGREGATE OUTFILE * / BREAK sex age ABCTOT rscrambl /id0 id1=max(id0 id1).
> COMPUTE MATCH=NOT(MISSING(ID1)) AND NOT(MISSING(ID0)).
> FREQ MATCH.
>
> Hillel Vardi
> BGU
>
> On 31/03/2011 15:52, David Marso wrote:
>>
>> Hi Ivana,
>> You are very welcome!
>> I was thinking about this further after an interesting email from Gene regarding
>> sequences (similar to Hillel Vardi's post last night).  I came up with the
>> following tidbit which is much easier than my previous post and has the
>> added feature of being almost completely intuitive.  Another nice benefit
>> is that it does not require a SORT, and in my tests it is a KEEPER ;-).
>>
>> COMPUTE ID=$CASENUM.
>> COMPUTE SCRAMBL=UNIFORM(1).
>> RANK SCRAMBL  BY SEX AGE ABCTOT mhPROB.
>> IF MHPROB=0 ID0=ID.
>> IF MHPROB=1 ID1=ID.
>> AGGREGATE OUTFILE * / BREAK sex age ABCTOT rscrambl /id0 id1=max(id0 id1).
>> COMPUTE MATCH=NOT(MISSING(ID1)) AND NOT(MISSING(ID0)).
>> FREQ MATCH.
>>
>> Comments:
>> RANK is able to construct 'counters' BY strata without the relevant cases
>> being contiguous.  NICE.
>> After the AGGREGATE the file will have the strata variables (and the paired
>> IDs, ID0 and ID1) but not the MHPROB variable.  No problem, since this
>> information is implied by the presence/absence of ID0 and ID1.
>>
>> Taking it further:
>> One could segregate the MATCH cases into a separate file, delete them from
>> the working file, and then rerun the code after doing a VARSTOCASES (i.e.
>> restoring ID from ID0 and ID1).  In this case I would probably COMPUTE a
>> random variable and sort on it, then use a variant of the RANK such as:
>> RANK ABCTOT  BY SEX AGE  mhPROB
>> (may need to specify TIES to deal with duplicate values in ABCTOT?).
>> This would build RANKS of ABCTOT within the strata and a later AGGREGATE
>> would group them together as previously (fuzzy match within the ranked
>> values of ABCTOT).
>>
>> NOTE:  In contrast to Gene's example I do not spread the data elements, I
>> just store the IDs.  Mapping the data to the IDs simply requires a
>> VARSTOCASES to make the file long (that's all you need to carry), then
>> SORT CASES BY ID and
>> MATCH FILES into the SORTED detail level file.
>> Hope this helps,
>> David
>>
>>
>
>
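
For anyone who wants to see the pairing logic outside SPSS, here is a minimal Python sketch of the idea discussed in this thread. It is not SPSS syntax and not the code posted by anyone here: the function name `pair_within_strata`, the record layout and the toy data are all made up for illustration. It shuffles cases and controls separately inside each sex/age/abctot stratum and pairs them off by position, which is roughly what RANK ... BY followed by AGGREGATE achieves. It also shows Hillel's point: a case whose stratum holds no controls simply stays unmatched.

```python
import random
from collections import defaultdict


def pair_within_strata(records, seed=0):
    """Pair cases (mhprob=1) with controls (mhprob=0) sharing sex, age, abctot.

    Inside each stratum, cases and controls are shuffled separately and then
    paired off by position. Returns (pairs, unmatched_case_ids).
    """
    rng = random.Random(seed)
    strata = defaultdict(lambda: {0: [], 1: []})
    for rec in records:
        key = (rec["sex"], rec["age"], rec["abctot"])
        strata[key][rec["mhprob"]].append(rec["id"])

    pairs, unmatched = [], []
    for groups in strata.values():
        cases, controls = groups[1], groups[0]
        rng.shuffle(cases)
        rng.shuffle(controls)
        pairs.extend(zip(cases, controls))       # zip stops at the shorter list
        unmatched.extend(cases[len(controls):])  # cases left without a control
    return pairs, unmatched


# Hypothetical toy data: case 2 sits in a stratum that has no controls.
records = [
    {"id": 1, "sex": 1, "age": 30, "abctot": 0.5, "mhprob": 1},
    {"id": 2, "sex": 2, "age": 41, "abctot": 0.7, "mhprob": 1},
    {"id": 3, "sex": 1, "age": 30, "abctot": 0.5, "mhprob": 0},
    {"id": 4, "sex": 1, "age": 30, "abctot": 0.5, "mhprob": 0},
    {"id": 5, "sex": 2, "age": 55, "abctot": 0.2, "mhprob": 0},
]

pairs, unmatched = pair_within_strata(records)
print("pairs:", pairs)
print("unmatched cases:", unmatched)
```

With exact matching on all three variables, the number of unmatched cases depends entirely on how sparse the strata are, which is why small samples with many distinct values behave so badly.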

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"