|
I have a co-worker who needs to identify a random sample of cases based on worker number in a file. There are many hundreds of workers, each with a varying number of cases. How could this be done?
Sydelle Raffe, Alameda County Social Services Agency
Information Services Division, Office of Data Management
e:mail: [hidden email]
phone: 510-271-9174
fax: 510-271-9107
If you have a request for information, please submit an ODM request form at: https://alamedasocialservices.org/staff/support_services/statistics_and_reports/odm/index.cfm

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
Hi Sydelle,
I'll assume that your co-worker is interested in a random selection of workers, rather than of cases in the data file, such that when a worker is selected, all of his or her records are selected. If this assumption is correct, then one way to accomplish this is as follows.

Aggregate the data on worker ID, saving only that variable to an external data file. After aggregation, you will have a separate data file of nothing but unique IDs. Then, using that aggregated file, draw a random selection of cases with the random sample option in Select Cases (under Data > Select Cases...). Be sure to click on the option to delete all unselected cases. Now create a constant variable with a value of 1, call it something like "selected_case", and save the data file. By default, that file will be in sort order by value of the ID.

Then, returning to the source file, sort it on worker ID as well. You can now merge the source file with the file just created (under Data > Merge Files > Add Variables...), using worker ID as the link. Be sure to include the "selected_case" variable. The resulting file will have the new variable (at the far right of the data file) with values of 1 and system-missing (represented by a "." in the cell).

It's necessary to change the system-missing values to zeros so that you can execute the case selection. To do so, use Recode (under Transform > Recode into Same Variables) and replace the system-missing values with zeros. Finally, run another case selection to select all cases where the "selected_case" variable has a value of 1.

HTH,

John Norton
SPSS Inc.
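For readers who prefer syntax to menus, the worker-level procedure John Norton describes could be sketched roughly as follows. This is only an illustration under stated assumptions: the variable name WORKER, the dataset names source and workers, the helper variable ncases, and the 10% sampling fraction are all hypothetical and do not come from the thread.

```spss
* Sketch only. WORKER, source, workers, ncases and the 10 percent
* fraction are assumed for illustration.
* Step 1 - one row per worker ID, in a separate dataset.
DATASET NAME source.
DATASET DECLARE workers.
AGGREGATE
  /OUTFILE='workers'
  /BREAK=WORKER
  /ncases=N.

* Step 2 - draw the random sample of workers and flag the survivors.
DATASET ACTIVATE workers.
SAMPLE .10.
COMPUTE selected_case = 1.
EXECUTE.

* Step 3 - merge the flag back onto the source file by worker ID.
DATASET ACTIVATE source.
SORT CASES BY WORKER.
MATCH FILES /FILE=* /TABLE='workers' /BY WORKER /DROP=ncases.

* Step 4 - recode system-missing to 0, then keep the flagged records.
RECODE selected_case (SYSMIS=0).
SELECT IF selected_case = 1.
EXECUTE.
```

SAMPLE with a decimal fraction draws approximately that proportion; the form SAMPLE n FROM m can be used instead when an exact number of workers is wanted.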
|
In reply to this post by Raffe, Sydelle, SSA
Yup.
-----Original Message-----
From: Norton, John [mailto:[hidden email]]
Sent: Monday, January 14, 2008 5:23 PM
To: Raffe, Sydelle, SSA
Subject: RE: drawing samples for hundreds of workers

So you want to do a random selection of cases *within worker ID*?

JN
_____

From: Raffe, Sydelle, SSA [mailto:[hidden email]]
Sent: Mon 1/14/2008 7:22 PM
To: Norton, John
Subject: RE: drawing samples for hundreds of workers

Thank you John. But, I want so much more. In my file, there are unique case records. These are apportioned to hundreds of different workers such that each worker has multiple cases. We want to make a random selection of each worker's cases. I don't think that's what I led you to understand.
|
This sounds like a job for the SPSS Complex Samples module. Not only will that draw the sample for you allowing for complex multi-stage sampling, it will account for the sample design when you analyze the results.
HTH,
Jon Peck
|
In reply to this post by Raffe, Sydelle, SSA
At 08:33 PM 1/14/2008, Raffe, Sydelle, SSA wrote:
>In my file, there are unique case records. These are apportioned to
>hundreds of different workers such that each worker has multiple cases.
>
>We want to make a random selection of each worker's cases. I don't
>think that's what I led [John Norton] to understand.

In your random selection, how many cases (records) do you want to select for each worker? It doesn't have to be a fixed number; it can be something like "1/3 of all the records." But, anyhow, you'll see that there's nothing we can do without that.

You've seen Jon Peck's advice on the Complex Samples module. That's probably the right idea. But if you don't have that, and if the file is sorted by worker ID (if not, sort it that way), I'd suggest

a) Use AGGREGATE to put the number of records for that worker in all records in the file.
b) Code some form of the 'k/n' algorithm in the transformation language, coded to restart the selection for each worker. This algorithm requires values 'k' and 'n' to work. 'n' is the number of records for the worker, from step a); 'k' is the number of records to be in the sample for that worker.

That's a quick sketch, and my description may be too brief to implement from. But if you tell us how you determine 'k', we can give you a more complete answer.

-Good luck to you,
 Richard
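As a short aside on why the 'k/n' algorithm Richard mentions gives every record the same chance: it is commonly described as taking each record in turn with probability (records still needed) / (records still remaining). The derivation below is a sketch of that idea, not something stated in the thread; the symbol t for "records already taken" is notation introduced here for illustration.

```latex
% Record i (i = 1, ..., n) of a worker is taken with probability
% (k - t_{i-1}) / (n - i + 1), where t_{i-1} is the number of that
% worker's records already taken. For the first two records:
\begin{align*}
P(\text{record 1 taken}) &= \frac{k}{n} \\
P(\text{record 2 taken}) &= \frac{k}{n}\cdot\frac{k-1}{n-1}
    + \Bigl(1-\frac{k}{n}\Bigr)\cdot\frac{k}{n-1} \\
  &= \frac{k(k-1) + (n-k)\,k}{n(n-1)}
   = \frac{k(n-1)}{n(n-1)} = \frac{k}{n}
\end{align*}
```

The same argument extends to later records. The ratio reaches 1 as soon as the records still needed equal the records remaining, and 0 once k records have been taken, so a worker with at least k records ends up with exactly k selected.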
|
Actually, we want 6 cases randomly selected for each worker. I so appreciate your help.
|
So long as your worker ID is numeric (otherwise use AUTORECODE or some such), do something like this:

* Give every record a random number between 0 and 1.
COMPUTE RANDNUM = UNIFORM(1).
* Within each worker, rank the records by that random number.
SORT CASES BY WORKER RANDNUM.
RANK RANDNUM (A) BY WORKER /RANK INTO WORKRANK.
* Keep the six lowest-ranked records for each worker.
SELECT IF WORKRANK LE 6.
EXE.
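A possible variation on King Douglas's SORT CASES and RANK approach, sketched here under assumptions: it fixes the random seed so the draw can be repeated, and it flags the chosen records instead of deleting the others. The seed value and the variable name AUDIT_PICK are hypothetical; WORKER, RANDNUM and WORKRANK follow the names in the post.

```spss
* Sketch only. The seed value and AUDIT_PICK are assumed for illustration.
* Fix the seed so that rerunning the job reproduces the same draw.
SET SEED=20080115.
COMPUTE RANDNUM = RV.UNIFORM(0,1).
* Rank the random numbers within each worker, without a summary table.
RANK VARIABLES=RANDNUM (A) BY WORKER
  /RANK INTO WORKRANK
  /PRINT=NO.
* Flag the six lowest ranks per worker and keep every record in the file.
COMPUTE AUDIT_PICK = (WORKRANK LE 6).
EXECUTE.
```

SET SEED applies to the default random number generator; if the Mersenne Twister is active (SET RNG=MT), SET MTINDEX plays the same role. Workers with six or fewer records have all of their records flagged.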
|
Well, that certainly looks easy. Will give it a try. Thanks.
|
The next question, though, is how are you going to analyze this sample, if that's the ultimate game. You may need to take account of the sample design in the analysis phase. I constructed a dataset that sounds something like yours where the number of cases per worker was a random number uniform between 6 and 16 and the analysis variable was just a normal random number.
I drew 6 cases from each worker using CSPLAN/CSSAMPLE. That produced a weight variable with weights between 1.0 and 2.5. Using this sample weight, I calculated the mean and standard deviation for a set of random normal numbers using both the DESCRIPTIVES procedure and the CSDESCRIPTIVES procedure that is part of the Complex Samples option. The means were identical, but the standard deviations differed by about 10%. With the unweighted data, the difference was larger.

So at the least you will probably want to calculate the weights associated with your sampling process, and you may want to go further. Since I used independent random numbers to generate the analysis variable, this case probably shows the minimum difference one might find. I would generally expect bigger differences with real data.

HTH,
Jon Peck
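Without the Complex Samples option, the sampling weights Jon Peck refers to could be approximated by hand as (records available) / (records drawn) for each worker. The following is a minimal sketch under assumptions, not the CSSAMPLE computation itself: the names WORKER, NRecords and SAMPWT are hypothetical, and it should be run on the full file before the sample is drawn.

```spss
* Sketch only. WORKER, NRecords and SAMPWT are assumed names.
* Put each worker's record count on every one of that worker's records.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=WORKER
  /NRecords=N.
* Weight = records available divided by records drawn, six per worker.
COMPUTE SAMPWT = NRecords / MIN(6, NRecords).
EXECUTE.
```

Under this formula a worker with 6 records gets a weight of 1.0 and a worker with 15 records gets 15/6 = 2.5, which is consistent with the range of weights reported in the post.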
|
Ah -- no analysis involved. State requires randomly selected cases for audits. But, thanks.
|
In reply to this post by Richard Ristow
This gives me much food for thought and learning. Thanks so much.
-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Wednesday, January 16, 2008 12:34 AM
To: Raffe, Sydelle, SSA; [hidden email]
Cc: King Douglas
Subject: Re: drawing samples for hundreds of workers

At 08:33 PM 1/14/2008, Raffe, Sydelle, SSA wrote:

>In my file, there are unique case records. These are apportioned to
>hundreds of different workers such that each worker has multiple cases.
>
>We want to make a random selection of each worker's cases. I don't
>think that's what I led [John Norton] to understand.

And at 01:40 PM 1/15/2008, Raffe, Sydelle, SSA wrote:

>Actually, we want 6 cases randomly selected for each worker.

King Douglas gave a nice implementation using SORT CASES and RANK. As an alternative, here's the implementation with AGGREGATE and 'k/n' logic. (It requires that the file be grouped, but not necessarily sorted, by ID.) I'm selecting three records per worker.

|-----------------------------|---------------------------|
|Output Created               |16-JAN-2008 03:32:16       |
|-----------------------------|---------------------------|
ID   Fname    Lname           RecdDate

A35  Aaron    Aardvark     18-DEC-2004
A35  Aaron    Aardvark     25-MAY-2005
A35  Aaron    Aardvark     16-JUL-2005
A42  Bethany  Birkinwell   30-OCT-2004
A42  Bethany  Birkinwell   05-DEC-2004
A42  Bethany  Birkinwell   24-DEC-2004
A42  Bethany  Birkinwell   25-DEC-2004
C19  Charles  Cubbage      25-JUL-2003
C19  Charles  Cubbage      02-SEP-2003
C21  Dorothy  Dickens      14-NOV-2002
D98  Ellis    Etheridge    19-SEP-2000

Number of cases read:  11    Number of cases listed:  11

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=ID
   /NRecords 'Number of records for employee'=NU.

NUMERIC #K #N (F3).
DO IF   $CASENUM EQ 1 OR ID NE LAG(ID).
.  COMPUTE #N = NRecords   /* Total records,    per worker */.
.  COMPUTE #K = MIN(3,#N)  /* Number to sample, per worker */.
END IF.

.  /*-- PRINT / 'Record ' ID Fname Lname RecdDate ': '  /*-*/
   /*--         'K=' #K ', N=' #N                       /*-*/.
COMPUTE #Take_It = RV.BERNOULLI(#K/#N).
COMPUTE #K       = #K - #Take_It.
COMPUTE #N       = #N - 1.
.  /*-- PRINT / '   TAKE=' #Take_It                     /*-*/.
SELECT IF #Take_It.

.  /*-- EXECUTE  /*-*/.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |16-JAN-2008 03:32:17       |
|-----------------------------|---------------------------|
ID   Fname    Lname           RecdDate NRecords

A35  Aaron    Aardvark     18-DEC-2004        3
A35  Aaron    Aardvark     25-MAY-2005        3
A35  Aaron    Aardvark     16-JUL-2005        3
A42  Bethany  Birkinwell   30-OCT-2004        4
A42  Bethany  Birkinwell   05-DEC-2004        4
A42  Bethany  Birkinwell   25-DEC-2004        4
C19  Charles  Cubbage      25-JUL-2003        2
C19  Charles  Cubbage      02-SEP-2003        2
C21  Dorothy  Dickens      14-NOV-2002        1
D98  Ellis    Etheridge    19-SEP-2000        1

Number of cases read:  10    Number of cases listed:  10

===================
APPENDIX: Test data
===================
*  ................................................................. .
*  .................   Test data               ..................... .
SET RNG     = MT     /* 'Mersenne twister' random number generator */ .
SET MTINDEX = 3605   /*  Providence, RI telephone book             */ .

INPUT PROGRAM.
.  DATA LIST LIST
      /ID Fname Lname
      (A4,A8,   A12).
.  LEAVE ID Fname Lname.

.  NUMERIC RecdDate (DATE11).
.  LEAVE   RecdDate.
.  COMPUTE RecdDate=RV.UNIFORM(DATE.MDY(01,01,2000),
                               DATE.MDY(01,01,2005)).
.  COMPUTE RecdDate=XDATE.DATE(RecdDate).

.  NUMERIC #NRecrds #RecdNum (F3).
.  COMPUTE #NRecrds = TRUNC(RV.UNIFORM(1,5)).
.  LOOP    #RecdNum = 1 TO #NRecrds.
.     COMPUTE RecdDate = RecdDate + RV.EXP(1/TIME.DAYS(45)).
.     COMPUTE RecdDate=XDATE.DATE(RecdDate).
.     END CASE.
.  END LOOP.
END INPUT PROGRAM.
BEGIN DATA
A35  Aaron    Aardvark
A42  Bethany  Birkinwell
C19  Charles  Cubbage
C21  Dorothy  Dickens
D98  Ellis    Etheridge
END DATA.
LIST.
