drawing samples for hundreds of workers

Raffe, Sydelle, SSA
I have a co-worker who needs to identify a random sample of cases based on worker number in a file. There are many hundreds of workers, each with a varying number of cases. How could this be done?

Sydelle Raffe, Alameda County Social Services Agency
Information Services Division, Office of Data Management
e:mail:  [hidden email]
phone: 510-271-9174     fax: 510-271-9107
If you have a request for information, please submit an ODM request form at:  https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm

====================To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: drawing samples for hundreds of workers

Norton, John
Hi Sydelle,

I'll assume that your co-worker is interested in a random selection of workers, rather than of cases in the data file, such that when a worker is selected, all his or her records are selected.  If this assumption is correct, then one way to accomplish this is to aggregate the data by worker ID, saving only that variable to an external data file.  After aggregation, you will have a separate data file containing nothing but unique IDs.

Then, using that aggregated file, you can draw a random selection of cases with the random-sample option in the Select Cases dialog (under Data > Select Cases...).  Be sure to click on the option to delete all unselected cases.  Now create a constant variable with a value of 1, call it something like "selected_case", and save the data file.  By default, that file will be in sort order by value of the ID.

Then, returning to the source file, sort it on worker ID as well.  Finally, merge the source file with the file just created (under Data > Merge Files > Add Variables...), using worker ID as the key.  Be sure to include the "selected_case" variable.

The resulting file will now have the new variable (at the far right of the data file) with values of 1 and system missing (represented by a "." in the cell).  It's necessary to change the system-missing values to zeros so that you can finally execute the case selection.  To do that, use the Recode dialog (under Transform > Recode into Same Variables...) and replace the system-missing values with zeros.

Finally, you can run another case selection to select all cases where the "selected_case" variable has a value of 1.

HTH,

John Norton
SPSS Inc.
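
For anyone who prefers syntax to the menus, the steps John describes might look roughly like the sketch below. The file names (source.sav, worker_ids.sav, worker_pick.sav), the variable names WORKER and N_RECS, and the 10 percent sampling fraction are all assumptions for illustration; none of them come from the thread. Note that this selects whole workers, not cases within a worker.

```spss
* Hypothetical names throughout: WORKER is the worker ID, source.sav the case file.
GET FILE='source.sav'.

* One record per worker, written to an external file.
AGGREGATE OUTFILE='worker_ids.sav'
  /BREAK=WORKER
  /N_RECS = N.

GET FILE='worker_ids.sav'.
* Keep roughly 10 percent of workers; the fraction is an arbitrary example.
SAMPLE .10.
COMPUTE SELECTED_CASE = 1.
SORT CASES BY WORKER.
SAVE OUTFILE='worker_pick.sav' /KEEP=WORKER SELECTED_CASE.

* Back to the source file: sort, then attach the flag by worker ID.
GET FILE='source.sav'.
SORT CASES BY WORKER.
MATCH FILES /FILE=* /TABLE='worker_pick.sav' /BY WORKER.

* Workers that were not sampled have a system-missing flag; turn it into 0.
RECODE SELECTED_CASE (SYSMIS=0).
SELECT IF SELECTED_CASE EQ 1.
EXECUTE.
```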

Re: drawing samples for hundreds of workers

Raffe, Sydelle, SSA
In reply to this post by Raffe, Sydelle, SSA
Yup.

-----Original Message-----
From: Norton, John [mailto:[hidden email]]
Sent: Monday, January 14, 2008 5:23 PM
To: Raffe, Sydelle, SSA
Subject: RE: drawing samples for hundreds of workers


So you want to do a random selection of cases *within worker ID*?
 
JN

  _____  

From: Raffe, Sydelle, SSA [mailto:[hidden email]]
Sent: Mon 1/14/2008 7:22 PM
To: Norton, John
Subject: RE: drawing samples for hundreds of workers



Thank you John. But, I want so much more. In my file, there are unique case records. These are apportioned to hundreds of different workers such that each worker has multiple cases.

We want to make a random selection of each worker's cases. I don't think that's what I led you to understand.

Re: drawing samples for hundreds of workers

Peck, Jon
This sounds like a job for the SPSS Complex Samples module.  Not only will that draw the sample for you, allowing for complex multi-stage sampling, it will also account for the sample design when you analyze the results.

HTH,
Jon Peck

Re: drawing samples for hundreds of workers

Richard Ristow
In reply to this post by Raffe, Sydelle, SSA
At 08:33 PM 1/14/2008, Raffe, Sydelle, SSA wrote:

>In my file, there are unique case records. These are apportioned to
>hundreds of different workers such that each worker has multiple cases.
>
>We want to make a random selection of each workers cases. I don't
>think that's what I led [John Norton] to understand.

In your random selection, how many cases (records) do you want to
select for each worker? It doesn't have to be a fixed number; it can
be something like "1/3 of all the records." But, anyhow, you'll see
that there's nothing we can do without that.

You've seen Jon Peck's advice on the Complex Samples module. That's
probably the right idea. But if you don't have that, and the file is
sorted by worker ID (if not, sort it that way), I'd suggest:

a) Use AGGREGATE to add the number of records for each worker to every
record in the file.

b) Code some form of the 'k/n' algorithm in the transformation
language, coded to restart the selection for each worker. This
algorithm requires values 'k' and 'n' to work. 'n' is the number of
records for the worker, from step a); 'k' is the number of records to
be in the sample for that worker.

That's a quick sketch, and my description may be too brief to
implement from. But if you give us how you determine 'k', we can give
you a more complete answer.

-Good luck to you,
  Richard
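
Steps a) and b) could be sketched as follows for a fixed k of 6, the figure given later in the thread. This is not Richard's own code: the variable names WORKER, N_RECS and PICKED and the scratch variables #NEEDED and #LEFT are assumptions, and the selection rule is the usual "take this record with probability (still needed)/(still left)" form of the k/n idea.

```spss
* Hypothetical names: WORKER is the worker ID; 6 is k, the per-worker sample size.
SORT CASES BY WORKER.

* Step a: put each worker's record count (n) on every one of that worker's records.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=WORKER
  /N_RECS = N.

* Step b: restart the counters at the first record of each worker.
* Scratch variables (names starting with #) keep their values from case to case.
DO IF $CASENUM EQ 1 OR WORKER NE LAG(WORKER).
COMPUTE #NEEDED = MIN(6, N_RECS).
COMPUTE #LEFT = N_RECS.
END IF.

* Take this record with probability (still needed) / (still left).
COMPUTE PICKED = (RV.UNIFORM(0,1) LT #NEEDED / #LEFT).
IF (PICKED EQ 1) #NEEDED = #NEEDED - 1.
COMPUTE #LEFT = #LEFT - 1.
EXECUTE.

* Drop the unpicked records in a separate pass, so LAG above sees every record.
SELECT IF PICKED EQ 1.
EXECUTE.
```

Because the chance of taking a record is always (still needed)/(still left), each worker ends up with exactly 6 records, or with all of them when the worker has fewer than 6.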

Re: drawing samples for hundreds of workers

Raffe, Sydelle, SSA
Actually, we want 6 cases randomly selected for each worker. I so appreciate your help.

Re: drawing samples for hundreds of workers

King Douglas
So long as your worker ID is numeric (otherwise use AUTORECODE or some such), do something like this:

* Give every case a uniform random number between 0 and 1.
COMPUTE RANDNUM = UNIFORM(1).

SORT CASES BY WORKER RANDNUM.

* Rank the random numbers in ascending order within each worker.
RANK RANDNUM (A)
  BY WORKER
  /RANK INTO WORKRANK.

* Keep the 6 lowest-ranked cases per worker (all cases if a worker has fewer than 6).
SELECT IF WORKRANK LE 6.

EXE.

Re: drawing samples for hundreds of workers

Raffe, Sydelle, SSA
Well, that certainly looks easy. Will give it a try. Thanks.

Re: drawing samples for hundreds of workers

Peck, Jon
The next question, though, is how you are going to analyze this sample, if that's the ultimate game.  You may need to take account of the sample design in the analysis phase.  I constructed a dataset that sounds something like yours, where the number of cases per worker was a random number uniformly distributed between 6 and 16 and the analysis variable was just a normal random number.

I drew 6 cases from each worker using CSPLAN/CSSAMPLE.  That produced a weight variable with weights between 1.0 and 2.5.
Using this sample weight, I calculated the mean and standard deviation of the random normal analysis variable using both the DESCRIPTIVES procedure and the CSDESCRIPTIVES procedure that is part of the Complex Samples option.

The means were identical, but the standard deviations differed by about 10%.

If I use the unweighted data, the difference is larger.

So at least you will probably want to calculate the weights associated with your sampling process, and you may want to go further.  Since I just used independent random numbers to generate the analysis variable, this case probably shows the smallest difference one might find.  I would generally expect bigger differences with real data.

HTH,
Jon Peck
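
Without the Complex Samples option, a comparable weight can be computed by hand, on the assumption that the weight is simply the inverse of each record's chance of selection: a worker's record count divided by the number of records drawn for that worker. The sketch below uses hypothetical variable names (WORKER, N_RECS, SAMPWT) and must run on the full file, before any records are dropped.

```spss
* Hypothetical names: WORKER is the worker ID; 6 is the per-worker sample size.
* Run this on the full file, before the sample is drawn.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=WORKER
  /N_RECS = N.

* Each sampled record stands for N_RECS / 6 records of its worker.
* Workers with fewer than 6 records are taken in full, so their weight is 1.
COMPUTE SAMPWT = N_RECS / MIN(6, N_RECS).
EXECUTE.
```

Under that assumption a worker with 6 cases gets a weight of 6/6 = 1.0 and a worker with 15 cases gets 15/6 = 2.5, which is consistent with the range Jon reports. After the sample is drawn, WEIGHT BY SAMPWT applies the weight, although ordinary procedures still will not account for the sample design the way the Complex Samples procedures do.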

Re: drawing samples for hundreds of workers

Raffe, Sydelle, SSA
Ah -- no analysis involved. State requires randomly selected cases for audits. But, thanks.

-----Original Message-----
From: Peck, Jon [mailto:[hidden email]]
Sent: Tuesday, January 15, 2008 1:26 PM
To: Raffe, Sydelle, SSA; [hidden email]
Subject: RE: Re: [SPSSX-L] drawing samples for hundreds of workers


The next question, though, is how are you going to analyze this sample, if that's the ultimate game.  You may need to take account of the sample design in the analysis phase.  I constructed a dataset that sounds something like  yours where the number of cases per worker was a random number uniform between 6 and 16 and the analysis variable was just a normal random number.

I drew 6 cases from each worker using CSPLAN/CSSAMPLE.  That produced a weight variable with weights between 1.0 and 2.5.
Using this sample weight, I calculated mean and standard deviation for a set of random normal numbers using both the DESCRIPTIVES procedure and the CSDESCRIPTIVES that is part of the Complex Samples option.

The means were identical, but the standard deviations differed by about 10%.

If I used the unweighted data, the difference is larger.

So at least you will probably want to calculate the weights associated with your sampling process, and you may want to go further.  Since I just used independent random numbers to generate the analysis variable, this case is probably the minimum differences one might find.  I would generally expect bigger differences with real data.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Raffe, Sydelle, SSA
Sent: Tuesday, January 15, 2008 1:41 PM
To: [hidden email]
Subject: Re: [SPSSX-L] drawing samples for hundreds of workers

Well, that certainly looks easy. Will give it a try. Thanks.

-----Original Message-----
From: King Douglas [mailto:[hidden email]]
Sent: Tuesday, January 15, 2008 11:16 AM
To: Raffe, Sydelle, SSA; [hidden email]
Subject: Re: drawing samples for hundreds of workers


So long as your worker id is numeric (otherwise use Autorecode or some such), do something like this;

COMPUTE RANDNUM = UNIFORM(1).

SORT CASES BY WORKER RANDNUM.

RANK RANDNUM (A)
  BY WORKER
  /RANK INTO WORKRANK.

SELECT IF WORKRANK LE 6.

EXE.

"Raffe, Sydelle, SSA" <[hidden email]> wrote:

Actually, we want 6 cases randomly selected for each worker. I so appreciate your help.

-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Monday, January 14, 2008 9:59 PM
To: Raffe, Sydelle, SSA; [hidden email]
Cc: Norton, John; Peck, Jon
Subject: Re: drawing samples for hundreds of workers


At 08:33 PM 1/14/2008, Raffe, Sydelle, SSA wrote:

>In my file, there are unique case records. These are apportioned to
>hundreds of different workers such that each worker has multiple cases.
>
>We want to make a random selection of each workers cases. I don't
>think that's what I led [John Norton] to understand.

In your random selection, how many cases (records) do you want to
select for each worker? It doesn't have to be a fixed number; it can
be something like "1/3 of all the records." But, anyhow, you'll see
that there's nothing we can do without that.

You've seen Jon Peck's advice on the Complex Samples module. That's
probably the right idea. But if you don't have that,

If the file is sorted by worker ID (if not, sort it that way), I'd suggest

a) Use AGGREGATE to put the number of records for that worker, in all
records in the file.

b) Code some form of the 'k/n' algorithm in the transformation
language, coded to restart the selection for each worker. This
algorithm requires values 'k' and 'n' to work. 'n' is the number of
records for the worker, from step a); 'k' is the number of records to
be in the sample for that worker.

That's a quick sketch, and my description may be too brief to
implement from. But if you give us how you determine 'k', we can give
you a more complete answer.

-Good luck to you,
Richard


Re: drawing samples for hundreds of workers

Raffe, Sydelle, SSA
In reply to this post by Richard Ristow
This gives me much food for thought and learning. Thanks so much.

-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Wednesday, January 16, 2008 12:34 AM
To: Raffe, Sydelle, SSA; [hidden email]
Cc: King Douglas
Subject: Re: drawing samples for hundreds of workers


At 08:33 PM 1/14/2008, Raffe, Sydelle, SSA wrote:

>In my file, there are unique case records. These are apportioned to
>hundreds of different workers such that each worker has multiple cases.
>
>We want to make a random selection of each worker's cases. I don't
>think that's what I led [John Norton] to understand.


And at 01:40 PM 1/15/2008, Raffe, Sydelle, SSA wrote:

>Actually, we want 6 cases randomly selected for each worker.

King Douglas gave a nice implementation using SORT CASES and RANK. As
an alternative, here's the implementation with AGGREGATE and 'k/n'
logic. (It requires that the file be grouped, but not necessarily
sorted, by ID.) Because the test file is small, I'm selecting three
records per worker; for six, change MIN(3,#N) to MIN(6,#N).

|-----------------------------|---------------------------|
|Output Created               |16-JAN-2008 03:32:16       |
|-----------------------------|---------------------------|
ID   Fname    Lname           RecdDate

A35  Aaron    Aardvark     18-DEC-2004
A35  Aaron    Aardvark     25-MAY-2005
A35  Aaron    Aardvark     16-JUL-2005
A42  Bethany  Birkinwell   30-OCT-2004
A42  Bethany  Birkinwell   05-DEC-2004
A42  Bethany  Birkinwell   24-DEC-2004
A42  Bethany  Birkinwell   25-DEC-2004
C19  Charles  Cubbage      25-JUL-2003
C19  Charles  Cubbage      02-SEP-2003
C21  Dorothy  Dickens      14-NOV-2002
D98  Ellis    Etheridge    19-SEP-2000

Number of cases read:  11    Number of cases listed:  11


AGGREGATE OUTFILE=* MODE=ADDVARIABLES
    /BREAK=ID
    /NRecords 'Number of records for employee'=NU.

NUMERIC   #K #N (F3).

DO IF   $CASENUM EQ 1
      OR ID       NE LAG(ID).
.  COMPUTE #N = NRecords  /* Total records,    per worker  */.
.  COMPUTE #K = MIN(3,#N) /* Number to sample, per worker  */.
END IF.

.  /*-- PRINT  / 'Record ' ID Fname Lname RecdDate ': ' /*-*/
    /*--          'K=' #K ', N=' #N                      /*-*/.

COMPUTE #Take_It = RV.BERNOULLI(#K/#N).
COMPUTE #K = #K - #Take_It.
COMPUTE #N = #N - 1.

.  /*-- PRINT  / '       TAKE=' #Take_It                /*-*/.

SELECT IF #Take_It.

.  /*-- EXECUTE                                         /*-*/.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |16-JAN-2008 03:32:17       |
|-----------------------------|---------------------------|
ID   Fname    Lname           RecdDate NRecords

A35  Aaron    Aardvark     18-DEC-2004        3
A35  Aaron    Aardvark     25-MAY-2005        3
A35  Aaron    Aardvark     16-JUL-2005        3
A42  Bethany  Birkinwell   30-OCT-2004        4
A42  Bethany  Birkinwell   05-DEC-2004        4
A42  Bethany  Birkinwell   25-DEC-2004        4
C19  Charles  Cubbage      25-JUL-2003        2
C19  Charles  Cubbage      02-SEP-2003        2
C21  Dorothy  Dickens      14-NOV-2002        1
D98  Ellis    Etheridge    19-SEP-2000        1

Number of cases read:  10    Number of cases listed:  10
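
For comparison, the general idea behind a sort-and-rank approach can be sketched as follows, again in Python rather than SPSS syntax. This is not King Douglas's actual syntax, which is not reproduced here; it is only an assumed outline of the technique: give every record a random key, rank the records within each worker by that key, and keep the k lowest ranks. The function name and data layout are made up for the illustration.

```python
import random
from collections import defaultdict

def sample_by_random_rank(records, key, k, rng=random):
    """Keep the k records with the smallest random keys within each group."""
    by_worker = defaultdict(list)
    for rec in records:
        # Attach an independent uniform random key to every record.
        by_worker[key(rec)].append((rng.random(), rec))
    selected = []
    for keyed in by_worker.values():
        # Ranking by the random key is the same as sorting on it.
        keyed.sort(key=lambda pair: pair[0])
        selected.extend(rec for _, rec in keyed[:k])
    return selected

# Hypothetical data: (worker ID, case number) pairs.
cases = [("A35", 1), ("A35", 2), ("A35", 3),
         ("A42", 1), ("A42", 2), ("A42", 3), ("A42", 4),
         ("C19", 1), ("C19", 2), ("C21", 1)]
picked = sample_by_random_rank(cases, key=lambda r: r[0], k=3)
```

Unlike the 'k/n' logic, this version does not need the file to be grouped by ID, but it returns each worker's records in random-key order rather than in their original order.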
===================
APPENDIX: Test data
===================
*  ................................................................. .
*  .................   Test data               ..................... .
SET RNG = MT       /* 'Mersenne twister' random number generator  */ .
SET MTINDEX = 3605 /*  Providence, RI telephone book              */ .


INPUT PROGRAM.
.  DATA LIST LIST
           /ID Fname Lname
           (A4,A8,   A12).
.  LEAVE   ID Fname Lname.
.  NUMERIC    RecdDate (DATE11).
.  LEAVE      RecdDate.
.  COMPUTE    RecdDate=RV.UNIFORM(DATE.MDY(01,01,2000),
                                  DATE.MDY(01,01,2005)).
.  COMPUTE    RecdDate=XDATE.DATE(RecdDate).

.  NUMERIC    #NRecrds #RecdNum (F3).
.  COMPUTE    #NRecrds = TRUNC(RV.UNIFORM(1,5)).
.  LOOP       #RecdNum = 1 TO #NRecrds.
.     COMPUTE RecdDate = RecdDate + RV.EXP(1/TIME.DAYS(45)).
.     COMPUTE RecdDate=XDATE.DATE(RecdDate).
.     END CASE.
.  END LOOP.
END INPUT PROGRAM.

BEGIN DATA
A35  Aaron   Aardvark
A42  Bethany Birkinwell
C19  Charles Cubbage
C21  Dorothy Dickens
D98  Ellis   Etheridge
END DATA.

LIST.
