Dear List,
I'm attempting to conduct an analysis to validate a new test. There are two groups that I have to look at: a community sample (group 1) and an anonymous internet sample (group 2). Prior to conducting the analysis I need to match individual cases from group 1 to group 2 based on age, sex, and education. The N's of the two groups are very different (e.g. my community sample is around 150 and my internet sample is in the thousands). For each individual case in group 1 I'd like to match similar cases from group 2.
The thought process behind this is to use a community sample (group 1) that was administered both the "new" test and a standardized test. We've been able to show that the "new" test looks pretty good in comparison to the standardized test. Now we have this large anonymous internet sample that took the "new" test, but we don't know who these people are, what conditions they took the test in, whether they cheated, etc. So perhaps by matching 5 to 10 cases from the large anonymous internet sample to each individual case from the known community sample, and then seeing how they compare on different variables, we can see if this anonymous sample can help provide further validation of the "new" measure.
I'm familiar with matching; however, this is well outside of what I'm able to do right now. So I was hoping for a point in the right direction or some basic match file syntax that I can modify as I learn.
Thanks a lot!
Justin
|
Justin, I am just curious. How have you determined there are no duplicate anonymous subjects in the subject pool?
Director, Office of Academic Evaluation and Assessment
University of South Dakota
Slagle Hall Room 102
414 East Clark Street
605-677-6497
Never play a thing the same way twice. Louis Armstrong
|
Dale,
There is no way to know for sure, and that is a concern. The measure has a number of variables and includes a short memory test as well. There are cases where it appears the same individual took the test again (e.g. all variables are identical, the time is within a couple of minutes of the first test, and the only difference is a higher score on the memory test). When this occurs I keep only the first case and delete the second. This test has been up for a few years, and it is rare to have a bunch of the online tests taken around the same time. However, there is no way to be sure that the same people aren't retaking the test a bunch of times or on a weekly basis. Just one of the downsides of an anonymous online measure. But hopefully, with a large enough N, people who do take the test repeatedly won't skew the results as much.
Justin
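For readers who want to automate the retake rule described above, here is a minimal Python sketch rather than SPSS syntax. The field names (`taken_at`, `mem_score`), the list of profile fields, and the 5-minute window are all assumptions chosen for illustration; the post only says "within a couple of minutes".

```python
from datetime import datetime, timedelta


def drop_apparent_retakes(records, profile_fields, window=timedelta(minutes=5)):
    """Keep the first record and drop later ones that look like quick retakes.

    A record counts as a retake when every profile field equals that of an
    earlier kept record, it was submitted within `window` of that record,
    and its memory score is higher. The window length is a hypothetical choice.
    """
    kept = []
    last_kept = {}  # profile tuple -> most recent kept record with that profile
    for rec in sorted(records, key=lambda r: r["taken_at"]):
        profile = tuple(rec[f] for f in profile_fields)
        first = last_kept.get(profile)
        is_retake = (
            first is not None
            and rec["taken_at"] - first["taken_at"] <= window
            and rec["mem_score"] > first["mem_score"]
        )
        if not is_retake:
            kept.append(rec)
            last_kept[profile] = rec
    return kept


# Made-up example rows: id 2 repeats id 1 two minutes later with a higher score.
sample = [
    {"id": 1, "age": 34, "sex": "F", "educ": 16,
     "taken_at": datetime(2013, 7, 1, 10, 0), "mem_score": 6},
    {"id": 2, "age": 34, "sex": "F", "educ": 16,
     "taken_at": datetime(2013, 7, 1, 10, 2), "mem_score": 9},
    {"id": 3, "age": 34, "sex": "F", "educ": 16,
     "taken_at": datetime(2013, 7, 8, 10, 0), "mem_score": 9},
    {"id": 4, "age": 51, "sex": "M", "educ": 12,
     "taken_at": datetime(2013, 7, 1, 10, 1), "mem_score": 4},
]
cleaned = drop_apparent_retakes(sample, ["age", "sex", "educ"])
print([r["id"] for r in cleaned])
```

As the post notes, a rule like this only catches immediate retakes; someone who comes back a week later (id 3 in the made-up data) is indistinguishable from a new respondent with the same profile.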
From: Pietrzak, Dale [[hidden email]]
Sent: Wednesday, July 24, 2013 5:39 PM
To: Justin Blehar; [hidden email]
Subject: RE: Matching Cases

Justin,
I am just curious. How have you determined there are no duplicate anonymous subjects in the subject pool?
|
This is one more instance where matching seems to lack much utility.
Matching is especially useful when you have *high* similarity within the match: left side of the body with right side, or siblings. For simple analyses, you are better off using ANCOVA with the control variables as covariates or factors. In your problem you might graph out the results by levels of the separate variables.

Matching for populations certainly wastes power or precision by throwing away the omitted cases. Is there any gain to offset that loss?

If there is some concern that folks might be improving scores by re-testing, then, with plenty of cases to subdivide, it might be more useful to divide responses by week or month. Actually, that brings to mind the reality that you probably have little or no control over the population that is attracted to the internet version. Would that be constant over time, or might that change?

--
Rich Ulrich

Date: Wed, 24 Jul 2013 22:43:50 +0000
From: [hidden email]
Subject: Re: Matching Cases
To: [hidden email]
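Rich's suggestion to divide the internet responses by week or month can be sketched in a few lines of Python. This is only an illustration under assumed field names (`taken_on`, `score`) and made-up values; it computes a mean score per calendar month so that drift over time, whether from re-testing or from a changing population, would show up as a trend.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_score_by_month(records):
    """Return {(year, month): mean score}, ordered by calendar month."""
    buckets = defaultdict(list)
    for rec in records:
        taken = rec["taken_on"]
        buckets[(taken.year, taken.month)].append(rec["score"])
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


# Made-up internet-sample rows; field names are hypothetical.
responses = [
    {"taken_on": date(2013, 5, 3), "score": 10},
    {"taken_on": date(2013, 5, 20), "score": 14},
    {"taken_on": date(2013, 6, 11), "score": 15},
    {"taken_on": date(2012, 12, 30), "score": 9},
]
monthly = mean_score_by_month(responses)
for (year, month), avg in monthly.items():
    print(year, month, avg)
```

The same bucketing could be repeated within levels of sex or education to get the kind of breakdown by separate variables that Rich describes, although a regression or ANCOVA in SPSS would be the more direct route.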
|
In reply to this post by Justin Blehar
I do not know your intent with the findings. However, just to let you know, I would have serious reservations if it were sent to any of the journals I have worked on over the years. Something to think about.
Dale
|
In reply to this post by Justin Blehar
I’m going to suggest a method to do what you want, but I don’t want it to stand in opposition to Rich’s comments. They take precedence.

Perhaps my thinking will need some correction, but it seems to me that the matching requirement is based on the assumption that persons grouped by the matching variables are more like each other on the DVs of interest than they are like persons from other match groups. Thus you should expect to find some level of intraclass correlation in the internet group. Failure to find that, I think, argues against matching.

I’d start this way. Give each record in the internet file a unique id, if it doesn’t have one already, then sort by the matching variables. Give each set of records with the same matching variable values a new id; call it groupid. Do a CASESTOVARS command to convert from long to wide format. Next, do a MATCH FILES to match the internet file to the community file by the matching variables (make sure the community file is sorted). Lastly, do a VARSTOCASES to convert the file from wide to long. I acknowledge that adding the id variables in the internet file is not required, but I’d add them just because I would.

Gene Maguin
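To make the logic of exact matching on age, sex, and education concrete, here is a small Python sketch. It is not a translation of the SPSS steps above: it swaps in a simpler scheme that groups the internet records by their matching-variable values and then draws up to k of them at random, without replacement, for each community case. The field names, the `id` key, the default k, and the random seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def match_cases(community, internet, keys=("age", "sex", "educ"), k=5, seed=0):
    """Map each community id to up to k internet ids with identical key values.

    Internet records are used at most once, so two community cases with the
    same profile never share a match. Cases with no exact match get [].
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for rec in internet:
        pools[tuple(rec[f] for f in keys)].append(rec)
    for pool in pools.values():
        rng.shuffle(pool)  # random order, so popping gives a random draw

    matches = {}
    for case in community:
        pool = pools.get(tuple(case[f] for f in keys), [])
        take = min(k, len(pool))
        matches[case["id"]] = [pool.pop()["id"] for _ in range(take)]
    return matches


# Made-up rows; ids and values are hypothetical.
community = [
    {"id": "c1", "age": 30, "sex": "F", "educ": 16},
    {"id": "c2", "age": 30, "sex": "F", "educ": 16},
    {"id": "c3", "age": 70, "sex": "M", "educ": 12},
]
internet = [
    {"id": "w1", "age": 30, "sex": "F", "educ": 16},
    {"id": "w2", "age": 30, "sex": "F", "educ": 16},
    {"id": "w3", "age": 30, "sex": "F", "educ": 16},
    {"id": "w4", "age": 45, "sex": "M", "educ": 12},
]
matched = match_cases(community, internet, k=2)
print(matched)
```

Exact matching on age in years will leave many community cases with few or no matches; banding age and education before matching is one common workaround, at the cost of looser matches.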
|
In reply to this post by Rich Ulrich
Chapter 12 in this book is a good reference supporting Rich's comments on matching. The authors conclude that very often, multivariable (they say multivariate) regression analysis is preferable to matching.
HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).