SPSSX Discussion

drawing samples for hundreds of workers

Raffe, Sydelle, SSA

drawing samples for hundreds of workers

I have a co-worker who needs to identify a random sample of cases based on worker number in a file. There are many hundreds of workers who have a varying number of cases. How could this be done?

Sydelle Raffe, Alameda County Social Services Agency
Information Services Division, Office of Data Management
e:mail: [hidden email]
phone: 510-271-9174 fax: 510-271-9107
If you have a request for information, please submit an ODM request form at: https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Norton, John

Re: drawing samples for hundreds of workers

Hi Sydelle,

I'll assume that your co-worker is interested in a random selection of workers, rather than of cases in the data file, such that when a worker is selected, all his or her records are selected. If this assumption is correct, then one way to accomplish this is to aggregate the data based on worker ID, saving only that variable to an external data file. So, after aggregation, you will have a separate data file of nothing but unique IDs.

Then, using that aggregated file, you can draw a random selection of cases using the random sample option in the Select Cases dialog (under Data > Select Cases...). Be sure to click on the option to delete all unselected cases. Now create a constant variable with a value of 1, call it something like "selected_case", and then save the data file. By default, that file will be in sort order by value of the ID.

Then, returning to the source file, sort it as well on the value of worker ID. Next, merge the source file with the file just created (under Data > Merge Files > Add Variables...) and use worker ID as the link. Be sure to include the "selected_case" variable.

The resulting file will now have the new variable (at the far right of the data file) with values of 1 and system missing (represented by a "." in the cell). It's necessary to change the system-missing values to zeros so that you can finally execute the case selection. To do that, use Recode (under Transform > Recode into Same Variables) and replace the system-missing values with zeros.

Finally, you can run another case selection to select all cases where the "selected_case" variable has a value of 1.
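
For anyone who prefers syntax to menus, the steps above might look roughly like the sketch below. It is not from the original post: WORKER is a hypothetical name for the worker ID variable, the dataset names "source" and "workers" and the count variable NCases are made up for illustration, the 10 percent sampling fraction is arbitrary, and the sketch assumes a release that supports named datasets.

```spss
* Sketch only: WORKER is a hypothetical name for the worker ID variable.
* The 10 percent sampling fraction is an arbitrary illustration.
DATASET NAME source.

* One record per worker, written to a second dataset.
DATASET DECLARE workers.
AGGREGATE
  /OUTFILE='workers'
  /BREAK=WORKER
  /NCases=NU.

* Draw the random sample of workers and flag them.
DATASET ACTIVATE workers.
SAMPLE .10.
COMPUTE selected_case = 1.
EXECUTE.

* Attach the flag to every case record, then keep flagged workers only.
DATASET ACTIVATE source.
SORT CASES BY WORKER.
MATCH FILES
  /FILE=*
  /TABLE='workers'
  /BY WORKER.
RECODE selected_case (SYSMIS=0).
SELECT IF selected_case EQ 1.
EXECUTE.
```

Because the aggregated dataset is used as a keyed table, every record of a selected worker picks up the flag, which is the "all his or her records" behavior described above.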

HTH,

John Norton
SPSS Inc.


Raffe, Sydelle, SSA

Re: drawing samples for hundreds of workers

In reply to this post by Raffe, Sydelle, SSA

Yup.

-----Original Message-----
From: Norton, John [mailto:[hidden email]]
Sent: Monday, January 14, 2008 5:23 PM
To: Raffe, Sydelle, SSA
Subject: RE: drawing samples for hundreds of workers

So you want to do a random selection of cases *within worker ID*?

JN

_____

From: Raffe, Sydelle, SSA [mailto:[hidden email]]
Sent: Mon 1/14/2008 7:22 PM
To: Norton, John
Subject: RE: drawing samples for hundreds of workers

Thank you John. But, I want so much more. In my file, there are unique case records. These are apportioned to hundreds of different workers such that each worker has multiple cases.

We want to make a random selection of each worker's cases. I don't think that's what I led you to understand.

Peck, Jon

Re: drawing samples for hundreds of workers

This sounds like a job for the SPSS Complex Samples module. Not only will that draw the sample for you allowing for complex multi-stage sampling, it will account for the sample design when you analyze the results.

HTH,
Jon Peck

Richard Ristow

Re: drawing samples for hundreds of workers

In reply to this post by Raffe, Sydelle, SSA

At 08:33 PM 1/14/2008, Raffe, Sydelle, SSA wrote:

>In my file, there are unique case records. These are apportioned to
>hundreds of different workers such that each worker has multiple cases.
>
>We want to make a random selection of each worker's cases. I don't
>think that's what I led [John Norton] to understand.

In your random selection, how many cases (records) do you want to
select for each worker? It doesn't have to be a fixed number; it can
be something like "1/3 of all the records." But, anyhow, you'll see
that there's nothing we can do without that.

You've seen Jon Peck's advice on the Complex Samples module. That's
probably the right idea. But if you don't have that module, then, with
the file sorted by worker ID (if not, sort it that way), I'd suggest:

a) Use AGGREGATE to put the number of records for each worker into
every one of that worker's records in the file.

b) Code some form of the 'k/n' algorithm in the transformation
language, coded to restart the selection for each worker. This
algorithm requires values 'k' and 'n' to work. 'n' is the number of
records for the worker, from step a); 'k' is the number of records to
be in the sample for that worker.

That's a quick sketch, and my description may be too brief to
implement from. But if you give us how you determine 'k', we can give
you a more complete answer.
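
A short derivation may help show why the 'k/n' step in b) gives every record of a worker the same chance. This is only a sketch, assuming the scheme is the sequential one in which each record is taken with probability (records still needed) / (records still remaining); the symbols s_i and p_i are introduced here for illustration and do not appear in the post.

```latex
% s_i = number of records already taken before record i (so s_1 = 0)
% p_i = probability of taking record i, given s_i
\[
  p_i = \frac{k - s_i}{\,n - i + 1\,}, \qquad i = 1, \dots, n .
\]
% The first record is taken with probability k/n. For the second record,
% condition on whether the first record was taken:
\[
  \Pr(\text{record 2 taken})
    = \frac{k}{n}\cdot\frac{k-1}{n-1} + \frac{n-k}{n}\cdot\frac{k}{n-1}
    = \frac{k\,(n-1)}{n\,(n-1)}
    = \frac{k}{n}.
\]
```

The same argument repeats down the list, so each record ends up with probability k/n. Because p_i becomes 1 once the records still needed equal the records left, and 0 once none are needed, exactly k records are taken. For example, with k = 6 and a worker holding n = 10 records, the first record is taken with probability 6/10.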

-Good luck to you,
Richard

Raffe, Sydelle, SSA

Re: drawing samples for hundreds of workers

Actually, we want 6 cases randomly selected for each worker. I so appreciate your help.

King Douglas

Re: drawing samples for hundreds of workers

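So long as your worker ID is numeric (otherwise use AUTORECODE or some such), do something like this:

* Give every case a random number between 0 and 1.
COMPUTE RANDNUM = UNIFORM(1).

SORT CASES BY WORKER RANDNUM.

* Rank the random numbers in ascending order within each worker.
RANK RANDNUM (A)
 BY WORKER
 /RANK INTO WORKRANK.

* Keep the six lowest-ranked cases for each worker.
SELECT IF WORKRANK LE 6.

EXECUTE.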
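
If the worker ID is a string, or if the draw has to be repeatable, a variant along the following lines may be useful. It is a sketch rather than part of the original reply: WorkerName (a string ID) is a hypothetical variable name, the seed value is arbitrary, and SET SEED is assumed to apply because the default random number generator is in use (with SET RNG=MT, MTINDEX would be set instead).

```spss
* Sketch only: WorkerName is a hypothetical string ID variable.
* AUTORECODE builds the numeric code WORKER from it.
AUTORECODE VARIABLES=WorkerName /INTO WORKER.

* Fix the seed so the same sample can be drawn again; the value is arbitrary.
SET SEED=20080115.

COMPUTE RANDNUM = RV.UNIFORM(0,1).

* Rank the random numbers within each worker and keep the first six.
RANK VARIABLES=RANDNUM (A) BY WORKER
  /RANK INTO WORKRANK
  /PRINT=NO.
SELECT IF WORKRANK LE 6.
EXECUTE.
```

A worker with fewer than six cases simply keeps all of them, since none of that worker's ranks exceed six.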

Raffe, Sydelle, SSA

Re: drawing samples for hundreds of workers

Well, that certainly looks easy. Will give it a try. Thanks.

Peck, Jon

Re: drawing samples for hundreds of workers

The next question, though, is how you are going to analyze this sample, if analysis is the ultimate goal. You may need to take account of the sample design in the analysis phase. I constructed a dataset that sounds something like yours, where the number of cases per worker was a uniform random number between 6 and 16 and the analysis variable was just a normal random number.

I drew 6 cases from each worker using CSPLAN/CSSELECT. That produced a weight variable with weights between 1.0 and 2.5.
Using this sample weight, I calculated mean and standard deviation for a set of random normal numbers using both the DESCRIPTIVES procedure and the CSDESCRIPTIVES that is part of the Complex Samples option.

The means were identical, but the standard deviations differed by about 10%.

When I used the unweighted data, the difference was larger.

So at the least you will probably want to calculate the weights associated with your sampling process, and you may want to go further. Since I just used independent random numbers to generate the analysis variable, this case probably shows the smallest difference one might find. I would generally expect bigger differences with real data.
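
Without the Complex Samples option, the weights mentioned above could be approximated by hand, as in the sketch below. It is an illustration, not the CSPLAN procedure used in this post: WORKER, NRecords, RANDNUM, WORKRANK and SampWgt are hypothetical names, and the weight is taken to be the number of records a worker has divided by the number actually drawn.

```spss
* Sketch only: all variable names here are hypothetical.
* Record count per worker, added to every record before sampling.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=WORKER
  /NRecords=NU.

* Draw up to six records per worker by ranking random numbers.
COMPUTE RANDNUM = RV.UNIFORM(0,1).
RANK VARIABLES=RANDNUM (A) BY WORKER
  /RANK INTO WORKRANK
  /PRINT=NO.
SELECT IF WORKRANK LE 6.

* Weight is records available divided by records actually drawn.
COMPUTE SampWgt = NRecords / MIN(6, NRecords).
EXECUTE.
```

Under this assumption a worker with 6 records gets a weight of 1.0 and a worker with 12 records gets 12/6 = 2.0.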

HTH,
Jon Peck

Raffe, Sydelle, SSA

Re: drawing samples for hundreds of workers

Ah -- no analysis involved. State requires randomly selected cases for audits. But, thanks.

Raffe, Sydelle, SSA

Re: drawing samples for hundreds of workers

In reply to this post by Richard Ristow

This gives me much food for thought and learning. Thanks so much.

-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Wednesday, January 16, 2008 12:34 AM
To: Raffe, Sydelle, SSA; [hidden email]
Cc: King Douglas
Subject: Re: drawing samples for hundreds of workers

At 08:33 PM 1/14/2008, Raffe, Sydelle, SSA wrote:

>In my file, there are unique case records. These are apportioned to
>hundreds of different workers such that each worker has multiple cases.
>
>We want to make a random selection of each worker's cases. I don't
>think that's what I led [John Norton] to understand.

And at 01:40 PM 1/15/2008, Raffe, Sydelle, SSA wrote:

>Actually, we want 6 cases randomly selected for each worker.

King Douglas gave a nice implementation using SORT CASES and RANK. As
an alternative, here's the implementation with AGGREGATE and 'k/n'
logic. (It requires that the file be grouped, but not necessarily
sorted, by ID.) I'm selecting three records per worker.

|-----------------------------|---------------------------|
|Output Created               |16-JAN-2008 03:32:16       |
|-----------------------------|---------------------------|
ID   Fname    Lname        RecdDate

A35  Aaron    Aardvark     18-DEC-2004
A35  Aaron    Aardvark     25-MAY-2005
A35  Aaron    Aardvark     16-JUL-2005
A42  Bethany  Birkinwell   30-OCT-2004
A42  Bethany  Birkinwell   05-DEC-2004
A42  Bethany  Birkinwell   24-DEC-2004
A42  Bethany  Birkinwell   25-DEC-2004
C19  Charles  Cubbage      25-JUL-2003
C19  Charles  Cubbage      02-SEP-2003
C21  Dorothy  Dickens      14-NOV-2002
D98  Ellis    Etheridge    19-SEP-2000

Number of cases read:  11    Number of cases listed:  11

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK=ID
/NRecords 'Number of records for employee'=NU.

NUMERIC #K #N (F3).

DO IF $CASENUM EQ 1
OR ID NE LAG(ID).
. COMPUTE #N = NRecords /* Total records, per worker */.
. COMPUTE #K = MIN(3,#N) /* Number to sample, per worker */.
END IF.

. /*-- PRINT / 'Record ' ID Fname Lname RecdDate ': ' /*-*/
/*-- 'K=' #K ', N=' #N /*-*/.

COMPUTE #Take_It = RV.BERNOULLI(#K/#N).
COMPUTE #K = #K - #Take_It.
COMPUTE #N = #N - 1.

. /*-- PRINT / ' TAKE=' #Take_It /*-*/.

SELECT IF #Take_It.

. /*-- EXECUTE /*-*/.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |16-JAN-2008 03:32:17       |
|-----------------------------|---------------------------|
ID   Fname    Lname        RecdDate     NRecords

A35  Aaron    Aardvark     18-DEC-2004         3
A35  Aaron    Aardvark     25-MAY-2005         3
A35  Aaron    Aardvark     16-JUL-2005         3
A42  Bethany  Birkinwell   30-OCT-2004         4
A42  Bethany  Birkinwell   05-DEC-2004         4
A42  Bethany  Birkinwell   25-DEC-2004         4
C19  Charles  Cubbage      25-JUL-2003         2
C19  Charles  Cubbage      02-SEP-2003         2
C21  Dorothy  Dickens      14-NOV-2002         1
D98  Ellis    Etheridge    19-SEP-2000         1

Number of cases read:  10    Number of cases listed:  10
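
With hundreds of workers, eyeballing a listing is not practical, so a quick check of the sample sizes may be worth running after the selection. The lines below are a sketch added for illustration, not part of the original post: NSampled and SizeOK are hypothetical names, and the constant 3 matches the three records per worker used in this demonstration.

```spss
* Sketch only: NSampled and SizeOK are hypothetical names.
* Run after the selection above, so that only sampled records remain.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /NSampled=NU.

* SizeOK is 1 when a worker's sample has the intended size, otherwise 0.
COMPUTE SizeOK = (NSampled EQ MIN(3, NRecords)).
FREQUENCIES VARIABLES=SizeOK.
```

For the ten records listed above, every row would get SizeOK = 1, since each worker kept either three records or all of the records available.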
===================
APPENDIX: Test data
===================
* ................................................................. .
* ................. Test data ..................... .
SET RNG = MT /* 'Mersenne twister' random number generator */ .
SET MTINDEX = 3605 /* Providence, RI telephone book */ .

INPUT PROGRAM.
. DATA LIST LIST
/ID Fname Lname
(A4,A8, A12).
. LEAVE ID Fname Lname.
. NUMERIC RecdDate (DATE11).
. LEAVE RecdDate.
. COMPUTE RecdDate=RV.UNIFORM(DATE.MDY(01,01,2000),
DATE.MDY(01,01,2005)).
. COMPUTE RecdDate=XDATE.DATE(RecdDate).

. NUMERIC #NRecrds #RecdNum (F3).
. COMPUTE #NRecrds = TRUNC(RV.UNIFORM(1,5)).
. LOOP #RecdNum = 1 TO #NRecrds.
. COMPUTE RecdDate = RecdDate + RV.EXP(1/TIME.DAYS(45)).
. COMPUTE RecdDate=XDATE.DATE(RecdDate).
. END CASE.
. END LOOP.
END INPUT PROGRAM.

BEGIN DATA
A35 Aaron Aardvark
A42 Bethany Birkinwell
C19 Charles Cubbage
C21 Dorothy Dickens
D98 Ellis Etheridge
END DATA.

LIST.
