SPSSX Discussion

Re: Question about random selection with matching

Swee May Cripe

Re: Question about random selection with matching

Hello,

I have a dataset with cases in variable A, and I would like to randomly
select 3 controls (in variable B) matched to the year (variable C) that
each case was reported in. SPSS allows for random selection, but I am not
sure how to select controls at random matched to the year for each case.

Appreciate any insight and assistance on how to do this in SPSS.

Thanks much,
SM Cripe

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: Question about random selection with matching

SM,

Some things in your explanation don't make sense to me. Is variable A the
id number? What is variable B about? What values does it take on?
Variable C is the year, I got that. Also, please define your use of the word
'case'. Are you using it to refer to a record in the file, or to a record
that is positive for a condition relative to a record that is negative for
the condition, as in a case-control study? Please post back to the list.

> I have a dataset with cases in variable A, and I would like to randomly
> select 3 controls (in variable B) matched to the year (variable C) that
> each case was reported in.

Gene Maguin

Swee May Cripe

Re: Question about random selection with matching

Thank you, Gene, for your follow-up post. To clarify, yes, this is
similar to a case-control study.

A correction: a closer look at the dataset revealed that variable A
distinguishes individuals not from the US (A=1, the cases) from individuals
from the US: American Caucasians (A=2, the first control group) and African
Americans (A=3, the second control group).

Variable B refers to country of origin for all individuals in the dataset
(numeric value for each country represented). The analysis will be
conducted by country of origin.

For example, to compare characteristics of individuals from Peru (variable
A=1 and variable B=Peru) with those of American Caucasians (control group
1) and with those of African Americans (control group 2), I would like to
randomly select three Caucasians and three African Americans for each
Peru-born case matched by year of birth (variable C).

Any advice on how to execute this random selection with matching in SPSS
would be greatly appreciated. Please let me know if there are further
questions.

Thanks much,
SM

Peck, Jon

Re: Question about random selection with matching

One tool available with SPSS 16 or later is the extension command
CASECTRL, which can be downloaded from SPSS Developer Central (www.spss.com/devcentral).

You specify "supplier" and "demander" datasets and the list of keys that define the (exact) match. You can specify how many supplier matches you want for each demander, and there are various options for what sort of output to produce. When there are multiple matches for a demander case, CASECTRL picks randomly from supplier cases.
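The matching idea described here (exact match on a key, then a random pick among the eligible supplier cases) can be illustrated outside SPSS. What follows is only a minimal Python sketch of that idea, not the CASECTRL implementation or its syntax. The function name `match_controls`, the field names `id` and `birth_year`, and the sample data are all hypothetical, and the sketch assumes a supplier may be used for at most one demander.

```python
import random
from collections import defaultdict

def match_controls(demanders, suppliers, key, n_matches, seed=None):
    """For each demander, draw up to n_matches suppliers that share its key
    value, at random and without reusing any supplier (an assumption)."""
    rng = random.Random(seed)
    # Group the suppliers by key value, then shuffle each group once.
    pool = defaultdict(list)
    for s in suppliers:
        pool[s[key]].append(s)
    for candidates in pool.values():
        rng.shuffle(candidates)
    matches = {}
    for d in demanders:
        candidates = pool.get(d[key], [])
        # Popping from a shuffled list is a random draw without replacement.
        take = min(n_matches, len(candidates))
        matches[d["id"]] = [candidates.pop() for _ in range(take)]
    return matches

# Hypothetical data: three cases, twelve potential controls.
cases = [
    {"id": 1, "birth_year": 1970},
    {"id": 2, "birth_year": 1970},
    {"id": 3, "birth_year": 1985},
]
controls = [
    {"id": 100 + i, "birth_year": 1970 if i < 8 else 1985}
    for i in range(12)
]

result = match_controls(cases, controls, key="birth_year", n_matches=3, seed=42)
for case_id, chosen in result.items():
    print(case_id, [c["id"] for c in chosen])
```

A demander whose key value has too few suppliers left simply receives fewer than `n_matches` controls in this sketch; whether that is acceptable depends on the study design.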

HTH,
Jon Peck

Maguin, Eugene

Re: Question about random selection with matching

In reply to this post by Swee May Cripe

Swee May,

This seems to be pretty hard to do.

1) Is it correct to assume that for US whites (caucasians) and US blacks,
country (variable B) equals US?

2) Is it also correct to assume that there are no other matching variables?
Not age, gender, rural-urban, etc?

Specific answer to each question, please.

I'm sure something like this has been done, and maybe kind of recently, but
nothing specific comes to mind.

Gene Maguin

Swee May Cripe

Re: Question about random selection with matching

Gene,

I will respond to your questions inline below.

I have also included Jon Peck's response below regarding a CASECTRL
extension that is available for version 16.0.1 and above. Unfortunately,
for now, I do not have access to version 16.0.1.

On Tue, 14 Oct 2008, Gene Maguin wrote:

> Swee May,
>
> This seems to be pretty hard to do.
>
> 1) Is it correct to assume that for US whites (caucasians) and US blacks,
> country (variable B) equals US?

Yes, US caucasians and US African-Americans have the same country code in
variable B.

>
> 2) Is it also correct to assume that there are no other matching variables?
> Not age, gender, rural-urban, etc?

Correct, there are no other matching variables.

> Specific answer to each question, please.
>
> I'm sure something like this has been done, and maybe kind of recently, but
> nothing specific comes to mind.
>
>
> Gene Maguin
>

Thanks much for your insight.

--Swee May

_____________________________________________

Date: Mon, 13 Oct 2008 16:28:16 -0500
From: "Peck, Jon" <[hidden email]>
To: [hidden email]
Subject: RE: Re: [SPSSX-L] Question about random selection with
matching

One tool available with SPSS 16 or later is the extension command
CASECTRL, which can be downloaded from SPSS Developer Central
(www.spss.com/devcentral).

You specify "supplier" and "demander" datasets and the list of keys that
define the (exact) match. You can specify how many supplier matches you
want for each demander, and there are various options for what sort of
output to produce. When there are multiple matches for a demander case,
CASECTRL picks randomly from supplier cases.

HTH,
Jon Peck

________________________________________

Richard Ristow

Re: Question about random selection with matching

In reply to this post by Swee May Cripe

At 12:58 PM 10/13/2008, Swee May Cripe wrote:

>I have a dataset with cases in variable [file?] A, and I would like
>to randomly select 3 controls (in variable [file?] B) matched to the
>year (variable C) that each case was reported in.

The easiest way is to sort file B (the controls) by year (or by whatever
set of match variables you are using) and then by a random quantity.

Use AGGREGATE on file A to get the number of cases for each set of
matching variables. Merge that with file B by the matching variables.
Within each matching group, if you need k controls for the number of
cases in file A (for example, k = 3 times the case count when you want 3
controls per case), select the first k (which, remember, are in random order).
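The procedure above (count the cases per matching group, put the controls in random order within each group, keep the first k) can be sketched in Python to make the logic concrete. This is an illustration under stated assumptions rather than SPSS syntax: the function name `select_controls`, the field names `id` and `birth_year`, and the sample data are hypothetical, and k is taken to be `per_case` times the number of cases in the group.

```python
import random
from collections import Counter
from itertools import groupby

def select_controls(cases, controls, key, per_case, seed=None):
    """Keep the first k controls of each matching group after sorting by
    (key, random number), where k = per_case * number of cases in the group."""
    rng = random.Random(seed)
    # Analogue of AGGREGATE on file A: number of cases per key value.
    n_cases = Counter(c[key] for c in cases)
    # Attach a random quantity to each control and sort by key, then by it.
    keyed = [(c[key], rng.random(), c) for c in controls]
    keyed.sort(key=lambda t: (t[0], t[1]))
    selected = []
    for value, group in groupby(keyed, key=lambda t: t[0]):
        # Analogue of the merge: look up the case count for this group.
        k = per_case * n_cases[value]
        # The group is in random order, so its first k rows are a random sample.
        selected.extend(rec for _, _, rec in list(group)[:k])
    return selected

# Hypothetical data: two cases born in 1970, one born in 1985.
cases = [{"id": 1, "birth_year": 1970},
         {"id": 2, "birth_year": 1970},
         {"id": 3, "birth_year": 1985}]
controls = (
    [{"id": 100 + i, "birth_year": 1970} for i in range(10)]
    + [{"id": 200 + i, "birth_year": 1985} for i in range(2)]
    + [{"id": 300 + i, "birth_year": 1990} for i in range(5)]
)

picked = select_controls(cases, controls, key="birth_year", per_case=3, seed=1)
print(Counter(p["birth_year"] for p in picked))
```

Note that this selects the right number of controls per group but does not pair each control with a specific case. In the sample data the 1985 group has only two controls available, so it comes up one short, which is worth checking for before analysis.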

I think King Douglas first posted this solution.
