SPSSX Discussion

Random sample matched on variables

Koen N

Random sample matched on variables

Dear SPSS users more experienced than me,

I'm currently working on a research project in which I've hit a wall. I have three groups that I want to compare in a number of analyses. One of these groups, a control group, is much too large compared with the other two, which are drawn from a rare demographic. As such, I want to reduce my control group from 65 to 25-30 participants while matching them on certain demographic variables (namely age, IQ and education). The other two groups don't differ significantly from one another on the means of these variables, but my control group does. I want the means of my reduced control group to fall within the same range as the means of the other two groups.

Obviously (otherwise I wouldn't be posting), I am completely lost as to how to do this. I assume I'll be using syntax, but I am unsure which commands to use. I've searched around and found some cases similar to mine, except that they either wanted to take a random sample based on propensity scores (which I don't believe works for me) or wanted to take a sample from two groups (whereas I want a sample from one).

If anyone could assist me, I would be very grateful. Thanks for your time!
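
One way to make "within the same range of the means" concrete is the standardized mean difference (SMD) for each covariate: the difference between the control mean and the target-group mean, divided by a pooled standard deviation (here the square root of the average of the two group variances). Computing it before and after subsampling shows whether balance improved; an absolute SMD below about 0.1 is often used as a rule of thumb. This is a minimal Python sketch, not SPSS syntax, and the record layout, field names (`age`, `iq`, `educ`) and example values are hypothetical.

```python
from statistics import mean, stdev


def standardized_mean_difference(group_a, group_b):
    """(mean_a - mean_b) divided by the pooled SD (root of the averaged variances)."""
    pooled_sd = ((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd


def balance_table(controls, targets, covariates):
    """Return {covariate: SMD} for two lists of dict records (hypothetical layout)."""
    return {
        cov: standardized_mean_difference(
            [row[cov] for row in controls], [row[cov] for row in targets]
        )
        for cov in covariates
    }


if __name__ == "__main__":
    # Hypothetical records; replace with the real age / IQ / education values.
    controls = [
        {"age": 25, "iq": 110, "educ": 16},
        {"age": 30, "iq": 105, "educ": 14},
        {"age": 45, "iq": 100, "educ": 12},
        {"age": 50, "iq": 98, "educ": 12},
    ]
    targets = [
        {"age": 44, "iq": 97, "educ": 11},
        {"age": 48, "iq": 99, "educ": 12},
        {"age": 52, "iq": 95, "educ": 10},
    ]
    for cov, smd in balance_table(controls, targets, ["age", "iq", "educ"]).items():
        print(f"{cov}: SMD = {smd:+.2f}")
```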

Bruce Weaver

Re: Random sample matched on variables

Administrator

From what you've said, it sounds as if N=65 for control versus 25-30 for the other two groups. That is not what I would describe as a large discrepancy in sample sizes.

What type of study design is it? (E.g., see http://www.med.uottawa.ca/sim/data/Study_Designs_e.htm.)

What kinds of outcome (dependent) variables do you have, and what types of models are you using?

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Koen N

Re: Random sample matched on variables

It's an observational study with three research questions concerning the ability of implicit measures to diagnose two different groups of pedophiles and to predict risk of recidivism. You're right that it's not a huge difference in N, but the demographic difference warrants action.

The sample I want to pull from the control group is to be used in three analyses: a one-factor ANOVA, a repeated-measures ANOVA and an ROC analysis.

I hope this is enough information.

Paul Oosterveld

Re: Random sample matched on variables

In reply to this post by Koen N

Koen,

In this situation I would rather weight on the propensity score than match
on these variables (the propensities can be obtained by logistic regression).
With that approach all cases in the control group could be included, while
weighting them to have the demographic characteristics of the (combined)
intervention groups. To turn this problem into a script, the frequency
distributions of the demographics in the three groups would be needed.
Whether this will work depends on the discrepancies between the groups.
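
A minimal sketch of the weighting idea, assuming each case already has a propensity score p (its estimated probability of belonging to the combined intervention groups, e.g. from a logistic regression). Intervention cases get weight 1 and each control gets p / (1 - p), the odds of group membership. Weighting controls by these odds is one standard scheme for making the weighted control group resemble the intervention groups on the covariates; it is shown here in Python rather than SPSS syntax, and the function names and example values are hypothetical.

```python
def odds_weights(propensity, in_target_group):
    """Weight 1 for target-group cases and p / (1 - p) for controls.

    This reweights the controls toward the covariate distribution of the
    target (intervention) groups.
    """
    weights = []
    for p, treated in zip(propensity, in_target_group):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if treated else p / (1.0 - p))
    return weights


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical controls: older controls resemble the target groups more
    # (higher p), so they receive more weight.
    control_age = [25, 30, 45, 50]
    control_p = [0.2, 0.25, 0.5, 0.75]
    w = odds_weights(control_p, [False] * len(control_p))
    print("unweighted mean age:", sum(control_age) / len(control_age))
    print("weighted mean age:  ", round(weighted_mean(control_age, w), 2))
```

Because every control keeps a nonzero weight, no cases are discarded, which is the advantage described above; the price is that a few controls with very high p can dominate, so the weights are worth inspecting.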

Regards,

Paul Oosterveld.

On Sun, 11 Nov 2012 05:55:17 -0800, Bruce Weaver <[hidden email]>
wrote:

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Martha Hewett

Re: Random sample matched on variables

In reply to this post by Koen N

Using propensity scores rather than matching on specific demographics is likely to produce less biased results. For a start, check this:

http://en.wikipedia.org/wiki/Propensity_score_matching

From: Koen N <[hidden email]>
To: [hidden email]
Date: 11/12/2012 09:58 AM
Subject: Random sample matched on variables
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Koen N

Re: Random sample matched on variables

Paul and Martha,

Thank you for your suggestions. I hadn't delved too far into propensity scores yet, but from what I could tell earlier they were only to be used when a 'treatment' is involved. If I understand correctly, though, the 'treatment' can be anything that separates two groups (in this case, pedophilia). As such, the propensity score could be defined as "the probability of being a pedophile based on measured covariates". Am I correct in this understanding?
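
Computationally, "the probability of being a pedophile based on measured covariates" is the fitted probability from a logistic regression of a two-group indicator (1 = either target group, 0 = control) on the covariates. In SPSS this would normally be done with LOGISTIC REGRESSION, saving the predicted probabilities through its SAVE subcommand. The plain-Python version below is only an illustration of what is being computed: it fits the model by batch gradient ascent on standardized covariates, and its function names, learning rate, iteration count and example data are assumptions.

```python
import math


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_propensity_scores(X, y, lr=0.1, n_iter=5000):
    """Return fitted P(y = 1 | x) from a logistic regression of y on X.

    X: list of covariate rows (e.g. [age, iq, educ]); y: list of 0/1 labels.
    Fitted by batch gradient ascent on the mean log-likelihood.
    """
    n, k = len(X), len(X[0])
    # Standardize covariates so one learning rate suits all of them;
    # a constant column would have SD 0, so fall back to 1.0 for it.
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
           for c, m in zip(cols, means)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in X]

    def predict(b0, b):
        return [_sigmoid(b0 + sum(bj * zj for bj, zj in zip(b, z))) for z in Z]

    b0, b = 0.0, [0.0] * k
    for _ in range(n_iter):
        resid = [yi - pi for yi, pi in zip(y, predict(b0, b))]
        # Gradient of the mean log-likelihood: mean(residual * covariate).
        b0 += lr * sum(resid) / n
        b = [bj + lr * sum(r * z[j] for r, z in zip(resid, Z)) / n
             for j, bj in enumerate(b)]
    return predict(b0, b)


if __name__ == "__main__":
    # Hypothetical data: [age, iq, educ]; 1 = target group, 0 = control.
    X = [[45, 95, 10], [52, 98, 12], [48, 92, 11], [30, 110, 16],
         [28, 105, 15], [50, 100, 12], [33, 102, 14], [41, 97, 12]]
    y = [1, 1, 1, 0, 0, 0, 0, 1]
    for row, p in zip(X, fit_propensity_scores(X, y)):
        print(row, round(p, 3))
```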

If I understand correctly, I am to run a logistic regression with my group variable as the dependent variable (first recoded from three groups into two, which would be theoretically fine) and my selected covariates as predictors, and save the predicted values. After this the actual matching begins, and it would make most sense to use nearest-neighbour matching. It seems as if nearest-neighbour matching is going to be quite some work (manually finding, for every pedophilic participant, the control participant whose propensity score is nearest), although I think I can save a lot of time on that by using some simple commands.
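
The matching step does not have to be done by hand. Below is a minimal sketch of greedy 1:1 nearest-neighbour matching on the propensity score without replacement, written in Python rather than SPSS syntax. Each target-group participant, taken in descending order of score (those cases usually have the fewest close controls), is paired with the closest control not yet used, optionally only if the distance is within a caliper. The ordering rule, the caliper option, the dict-of-IDs input format and the function name are assumptions for illustration; the controls that end up matched would form the reduced control group, whose balance on age, IQ and education should then be rechecked.

```python
def greedy_nearest_neighbor_match(target_scores, control_scores, caliper=None):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    target_scores, control_scores: dicts mapping a case ID to its propensity
    score (IDs within each dict must be mutually comparable, e.g. all str).
    Returns {target ID: control ID}. A target whose nearest unused control is
    farther away than `caliper` stays unmatched.
    """
    available = dict(control_scores)  # controls not yet used
    matches = {}
    # Highest scores first: these typically have the fewest close controls.
    for tid in sorted(target_scores, key=target_scores.get, reverse=True):
        if not available:
            break
        score = target_scores[tid]
        # Nearest unused control; ties broken by control ID for determinism.
        cid = min(available, key=lambda c: (abs(available[c] - score), c))
        if caliper is None or abs(available[cid] - score) <= caliper:
            matches[tid] = cid
            del available[cid]
    return matches


if __name__ == "__main__":
    # Hypothetical propensity scores.
    targets = {"p01": 0.81, "p02": 0.64, "p03": 0.40}
    controls = {"c01": 0.78, "c02": 0.70, "c03": 0.35, "c04": 0.10}
    print(greedy_nearest_neighbor_match(targets, controls))
    print(greedy_nearest_neighbor_match(targets, controls, caliper=0.05))
```

Greedy matching is simple but order-dependent; optimal matching, which minimizes the total distance over all pairs, is an alternative when that matters.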

Is my understanding correct, or am I mistaken about any of the steps I should take?

Thanks again everyone!

Ryan

Re: Random sample matched on variables

I haven't been following this thread, but this particular post jogged my memory: I came across an interesting SUGI paper a while back that you might find useful:

http://www.math.u-szeged.hu/stoch/szeminarium_absztrakt/Propensity_scores.pdf

I have seen other, more sophisticated approaches as well.

Again, not sure if this is related to what you're discussing. If not, feel free to disregard.

Ryan

On Nov 13, 2012, at 4:20 AM, Koen N <[hidden email]> wrote:
