Matching Cases

Justin Blehar
Dear List,

I'm attempting to conduct an analysis to validate a new test. There are two groups that I have to look at: a community sample (group 1) and an anonymous internet sample (group 2). Before conducting the analysis, I need to match individual cases from group 1 to group 2 on age, sex, and education. The two groups differ greatly in size (my community sample is around 150, and my internet sample is in the thousands). For each case in group 1, I'd like to find similar cases in group 2.

The thought process behind this is as follows. The community sample (group 1) was administered both the "new" test and a standardized test, and we've been able to show that the "new" test compares well with the standardized test. Now we have this large anonymous internet sample that took the "new" test, but we don't know who these people are, under what conditions they took the test, whether they cheated, etc. So perhaps by matching 5 to 10 cases from the large anonymous internet sample to each case in the known community sample, and then seeing how they compare on different variables, we can see whether this anonymous sample can provide further validation of the "new" measure.

I'm familiar with matching; however, this is well beyond what I'm able to do right now. Could someone point me in the right direction, or share some basic MATCH FILES syntax that I can modify as I learn?

Thanks a lot!

Justin
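
The matching requested above can be sketched in Python (not SPSS syntax) under a few assumptions: matching is exact on age, sex, and education; each community case gets up to k internet cases drawn at random from its cell; and no internet case is used twice. The field names (`id`, `age`, `sex`, `educ`), the default k=5, and the fixed seed are hypothetical. In practice age would probably have to be banded first, or many cells would be empty.

```python
import random
from collections import defaultdict

MATCH_KEYS = ("age", "sex", "educ")  # hypothetical field names for the matching variables


def match_controls(community, internet, k=5, seed=2013):
    """Return {community id: [internet ids]} with up to k exact matches per case.

    Matching is exact on MATCH_KEYS; each internet case is used at most once.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Pool the internet case ids by their combination of matching values.
    pool = defaultdict(list)
    for rec in internet:
        pool[tuple(rec[key] for key in MATCH_KEYS)].append(rec["id"])
    for ids in pool.values():
        rng.shuffle(ids)  # random order within each cell

    matches = {}
    for case in community:
        ids = pool.get(tuple(case[key] for key in MATCH_KEYS), [])
        n_take = min(k, len(ids))
        # pop() removes the chosen ids from the pool, so later community cases cannot reuse them
        matches[case["id"]] = [ids.pop() for _ in range(n_take)]
    return matches


if __name__ == "__main__":
    community = [
        {"id": "c1", "age": 30, "sex": "F", "educ": 16},
        {"id": "c2", "age": 45, "sex": "M", "educ": 12},
    ]
    internet = [{"id": f"i{n}", "age": 30, "sex": "F", "educ": 16} for n in range(8)]
    internet.append({"id": "i99", "age": 45, "sex": "M", "educ": 12})
    for cid, ids in match_controls(community, internet).items():
        print(cid, len(ids), ids)
```

Cells with fewer than k internet cases simply yield fewer matches, so it is worth counting how many community cases ended up short.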

Re: Matching Cases

Dale

Justin,

I am just curious. How have you determined there are no duplicate anonymous subjects in the subject pool?

Dale Pietrzak, Ed.D., LPCMH, CCMHC
Director, Office of Academic Evaluation and Assessment
University of South Dakota
Slagle Hall Room 102
414 East Clark Street
605-677-6497

Never play a thing the same way twice. - Louis Armstrong

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Justin Blehar
Sent: Wednesday, July 24, 2013 3:35 PM
To: [hidden email]
Subject: Matching Cases

Re: Matching Cases

Justin Blehar
Dale,

There is no way to know for sure, and that is a concern. The measure has a number of variables and includes a short memory test as well. There are cases where it appears the same individual took the test again (e.g., all variables are identical, the time is within a couple of minutes of the first test, and the only difference is a higher score on the memory test). When this occurs, I keep only the first case and delete the second. This test has been up for a few years, and it is rare for several of the online tests to be taken around the same time. However, there is no way to be sure that the same people aren't retaking the test many times or on a weekly basis. That is just one of the downsides of an anonymous online measure. But hopefully, with a large enough N, people who do take the test repeatedly won't skew the results much.
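
The retake rule described above can be written down explicitly. A minimal Python sketch, assuming each record has a `timestamp` (a `datetime`) and a `mem_score` field, and that every other field except the record `id` should be identical on a retake; all three names and the 3-minute window (standing in for "a couple of minutes") are hypothetical. Like the manual check, it keeps the first submission; it cannot detect people who return days or weeks later, and two different people who give identical answers within the window would also be collapsed.

```python
from datetime import datetime, timedelta

IGNORE = ("id", "timestamp", "mem_score")  # hypothetical fields allowed to differ on a retake
WINDOW = timedelta(minutes=3)              # hypothetical "couple of minutes" cutoff


def drop_quick_retakes(records, ignore=IGNORE, window=WINDOW):
    """Keep the first submission; drop later ones that are identical on all other
    fields and arrive within `window` of the previous identical submission."""
    kept = []
    last_seen = {}  # response signature -> time of the most recent submission with it
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        signature = tuple(sorted((k, v) for k, v in rec.items() if k not in ignore))
        previous = last_seen.get(signature)
        # Updated even for dropped records, so a chain of quick retakes is caught too.
        last_seen[signature] = rec["timestamp"]
        if previous is not None and rec["timestamp"] - previous <= window:
            continue  # looks like a quick retake: keep only the earlier case
        kept.append(rec)
    return kept


if __name__ == "__main__":
    t0 = datetime(2013, 7, 24, 15, 0)
    records = [
        {"id": 1, "timestamp": t0, "age": 30, "sex": "F", "educ": 16, "mem_score": 7},
        {"id": 2, "timestamp": t0 + timedelta(minutes=2), "age": 30, "sex": "F", "educ": 16, "mem_score": 9},
    ]
    print([r["id"] for r in drop_quick_retakes(records)])  # prints [1]
```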

Justin

From: Pietrzak, Dale [[hidden email]]
Sent: Wednesday, July 24, 2013 5:39 PM
To: Justin Blehar; [hidden email]
Subject: RE: Matching Cases

Re: Matching Cases

Rich Ulrich
This is one more instance where matching seems to lack much utility.

Matching is especially useful when you have *high* similarity within
the match:  left side of the body with right side; or siblings....  For
simple analyses, you are better off using ANCOVA with the control
variables as covariates or factors.  In your problem you might graph
out the results by levels of the separate variables.
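
The ANCOVA suggestion amounts to regressing the test score on a group indicator plus the control variables, using every case rather than a matched subset. A dependency-free Python sketch (ordinary least squares via the normal equations) is below; the field names, the coding of group and sex, and treating education as continuous are all hypothetical choices. The coefficient on the group dummy is the covariate-adjusted internet-minus-community difference. It returns point estimates only; standard errors and tests would come from a full routine such as SPSS UNIANOVA or REGRESSION.

```python
def ols(X, y):
    """Least-squares coefficients: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear predictors?)")
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def design_row(rec):
    """Intercept, group dummy (1 = internet), then covariates; codings are hypothetical."""
    return [
        1.0,
        1.0 if rec["group"] == 2 else 0.0,
        float(rec["age"]),
        1.0 if rec["sex"] == "F" else 0.0,
        float(rec["educ"]),
    ]


def adjusted_group_difference(records, outcome="new_test"):
    """Internet-minus-community difference in the outcome, adjusted for age, sex, educ."""
    X = [design_row(r) for r in records]
    y = [float(r[outcome]) for r in records]
    return ols(X, y)[1]  # coefficient on the group dummy
```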

Matching for populations certainly wastes power or precision by
throwing away the unmatched cases.  Is there any gain to offset that loss?

If there is some concern that folks might be improving scores by
re-testing, then, with plenty of cases to subdivide, it might be more
useful to divide responses by week or month.  Actually, that brings to mind
the reality that you probably have little or no control over the population
that is attracted to the internet version -- Would that be constant over
time, or might that change?
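
Dividing the responses by month, and checking whether the internet population drifts over time, can start from a simple per-month summary. A Python sketch, assuming each record has a `timestamp` (a `datetime`) plus `mem_score` and `age` fields; these names are hypothetical, and any other score or demographic field could be passed in `fields`.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_summary(records, fields=("mem_score", "age")):
    """Group time-stamped records by calendar month; report n and the mean of each field."""
    buckets = defaultdict(list)
    for rec in records:
        ts = rec["timestamp"]
        buckets[(ts.year, ts.month)].append(rec)
    summary = {}
    for month, recs in sorted(buckets.items()):  # chronological order
        summary[month] = {"n": len(recs), **{f: mean(r[f] for r in recs) for f in fields}}
    return summary


if __name__ == "__main__":
    records = [
        {"timestamp": datetime(2013, 7, 1), "mem_score": 6, "age": 30},
        {"timestamp": datetime(2013, 7, 20), "mem_score": 8, "age": 40},
        {"timestamp": datetime(2013, 8, 3), "mem_score": 9, "age": 25},
    ]
    for month, stats in monthly_summary(records).items():
        print(month, stats)
```

A trend in mean age or mean score across months would be one sign that the population reached by the internet version is not constant.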

--
Rich Ulrich


Date: Wed, 24 Jul 2013 22:43:50 +0000
From: [hidden email]
Subject: Re: Matching Cases
To: [hidden email]

Re: Matching Cases

Dale
In reply to this post by Justin Blehar

I do not know your intent with the findings. However, just to let you know, I would have serious reservations if this were submitted to any of the journals I have worked on over the years. Something to think about.

Dale

Re: Matching Cases

Maguin, Eugene
In reply to this post by Justin Blehar

I’m going to suggest a method to do what you want, but I don’t want it to stand in opposition to Rich’s comments; they take precedence. Perhaps my thinking needs some correction, but it seems to me that the matching requirement rests on the assumption that persons grouped by the matching variables are more alike on the DVs of interest than they are to persons in other match groups. Thus you should expect to find some level of intraclass correlation in the internet group. Failure to find that, I think, argues against matching.
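
That check can be made concrete with a one-way random-effects intraclass correlation, ICC(1), computed on the internet sample with the match cells (each distinct age by sex by education combination) as the groups. A Python sketch using the standard ANOVA estimator, with the usual n0 adjustment for unequal cell sizes; the input layout (a dict mapping each cell to its list of scores) is an assumption. A value near zero would suggest the matching variables do little to make cases alike on the score.

```python
from statistics import mean


def icc1(cells):
    """One-way random-effects ICC(1) from {cell: [scores]}.

    Uses the ANOVA estimator (MSB - MSW) / (MSB + (n0 - 1) * MSW), where n0 is the
    average cell size adjusted for unequal cell sizes.
    """
    cells = [list(v) for v in cells.values() if v]  # ignore empty cells
    a = len(cells)
    N = sum(len(v) for v in cells)
    if a < 2 or N <= a:
        raise ValueError("need at least two cells and more cases than cells")
    grand = sum(sum(v) for v in cells) / N
    cell_means = [mean(v) for v in cells]
    ssb = sum(len(v) * (m - grand) ** 2 for v, m in zip(cells, cell_means))
    ssw = sum(sum((x - m) ** 2 for x in v) for v, m in zip(cells, cell_means))
    msb = ssb / (a - 1)  # between-cell mean square
    msw = ssw / (N - a)  # within-cell mean square
    n0 = (N - sum(len(v) ** 2 for v in cells) / N) / (a - 1)
    denom = msb + (n0 - 1) * msw
    if denom == 0:
        raise ValueError("scores have no variance")
    return (msb - msw) / denom
```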

 

I’d start this way. Give each record in the internet file a unique id, if it doesn’t already have one, then sort by the matching variables. Give each set of records with the same matching-variable values a new id; call it groupid. Do a CASESTOVARS command to convert from long to wide format. Next, do a MATCH FILES to match the internet file to the community file by the matching variables (make sure the community file is sorted by them as well). Lastly, do a VARSTOCASES to convert the file from wide back to long. I acknowledge that adding the id variables in the internet file is not required, but I’d add them just because I would.
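
To show what each of those steps does to the data, here is an illustrative Python analogue; it is not SPSS syntax, and the field names are hypothetical. Collecting the sorted internet ids of each cell into one entry plays the role of CASESTOVARS, the dictionary lookup plays the role of matching the community file against that wide file, and expanding to one row per pair plays the role of VARSTOCASES. Every internet case in a cell is attached to every community case in that cell, so internet cases can be reused and there is no cap of 5 to 10 per community case.

```python
from itertools import groupby

MATCH_KEYS = ("age", "sex", "educ")  # hypothetical names for the matching variables


def cell_of(rec):
    """The record's combination of matching values."""
    return tuple(rec[k] for k in MATCH_KEYS)


def gene_style_match(community, internet):
    """Return (community id, groupid, internet id) rows, one per matched pair."""
    # 1. Give each internet record a unique id (its position) and sort by the matching variables.
    numbered = sorted(({**rec, "net_id": i} for i, rec in enumerate(internet)), key=cell_of)

    # 2. One groupid per distinct combination; gathering each cell's ids into a
    #    single entry mimics the wide layout that CASESTOVARS would produce.
    wide = {}
    for groupid, (cell, members) in enumerate(groupby(numbered, key=cell_of), start=1):
        wide[cell] = (groupid, [m["net_id"] for m in members])

    # 3. Look up each community case's cell in the wide table (the MATCH FILES step), then
    # 4. unfold to one row per pair (the VARSTOCASES step).
    #    Community cases whose cell has no internet cases get no rows.
    rows = []
    for case in community:
        if cell_of(case) in wide:
            groupid, net_ids = wide[cell_of(case)]
            rows.extend((case["id"], groupid, net_id) for net_id in net_ids)
    return rows
```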

 

Gene Maguin

Re: Matching Cases

Bruce Weaver
Administrator
In reply to this post by Rich Ulrich
Chapter 12 in this book is a good reference supporting Rich's comments on matching.  The authors conclude that very often, multivariable (they say multivariate) regression analysis is preferable to matching.

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).