Dear List,
The background is that I have a list of open cases for probation officers, and I want to randomly pick 10 of each officer's cases for review. For probation officers who have fewer than 10 open cases, I would like to select all of their cases. There are 20 probation officers and 395 open cases; caseloads vary from 1 to 35 open cases.

I think I have modified the syntax in the links below correctly; however, the correct number of cases is not always returned, regardless of whether the probation officer has more or fewer than 10 cases. For example, one probation officer has 35 cases, and after I run the syntax below, sometimes I get 10 cases (correct) and other times 8 or 9, etc. (incorrect). I'm wondering whether I made a mistake in adjusting the code or whether there is some issue with it that someone could pinpoint.

The syntax that I'm adjusting/using was posted to the list here:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0803&L=spssx-l&P=R13320&m=59416
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0803&L=spssx-l&P=R13761&m=59416

The only modifications I have made are changing the variable names and changing the minimum number of records from 127 in the posted syntax to 10.

What additionally worries me is that when I run the code on the sample data below, it works correctly, but when I run the same code on my real data, it doesn't seem to work properly. I would be happy to send a copy of my 395 cases off-list to anyone who is interested in helping me figure this out.

Any help would be greatly appreciated!

-Ari

*Sample Data.
DATA LIST LIST /Patient_Number (A9) Officer_ID (A7) Program (A7).
BEGIN DATA
041949 006415 PROB
045284 006415 PROB
046107 006415 PROB
047019 006415 PROB
048501 006415 PROB
049087 006415 PROB
052716 006415 PROB
056991 006415 PROB
057073 006415 PROB
060727 006415 PROB
061118 006415 PROB
061120 006415 PROB
061207 006415 PROB
064713 007991 PROB
051234 007991 PROB
061749 007991 PROB
048163 007991 PROB
044949 011512 PROB
045274 011512 PROB
048107 011512 PROB
042019 011512 PROB
048401 011512 PROB
049187 011512 PROB
058716 011512 PROB
096991 011512 PROB
037073 011512 PROB
063627 011512 PROB
068318 011512 PROB
061310 011512 PROB
066207 011512 PROB
048451 011512 PROB
044187 011512 PROB
020716 011512 PROB
076981 011512 PROB
017073 011512 PROB
052627 011512 PROB
061318 011512 PROB
031380 011512 PROB
026237 011512 PROB
END DATA.
FREQ Officer_ID.
DATASET NAME OriginalData.
DATASET COPY ListofSampleData.
******************************************************.
DATASET ACTIVATE ListofSampleData.
SORT CASES BY Officer_ID /* if necessary */.
* Set random-number generator parameters, if desired .
SET RNG = MT /* 'Mersenne twister' random-no. generator */ .
SET MTINDEX = 7778 /* or other starting value - anything */ .
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=Officer_ID
   /NRecords 'Number of open cases for Officer'=NU.
NUMERIC #K #N (F3).
DO IF $CASENUM EQ 1 OR Officer_ID NE LAG(Officer_ID).
.  COMPUTE #N = NRecords /* Total open records, per Officer */.
.  COMPUTE #K = MIN(NRecords, 10) /* Set sample size */.
END IF.
COMPUTE #Take_It = RV.BERNOULLI(#K/#N).
COMPUTE #K = #K - #Take_It.
COMPUTE #N = #N - 1.
SELECT IF #Take_It.
FREQ Officer_ID.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
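The syntax above uses sequential "k/n" selection (often called selection sampling): within each officer's block of cases, a case is taken with probability (cases still needed)/(cases still unread), so once the number still needed equals the number left, every remaining case is taken. The Python sketch below mirrors that logic outside SPSS for illustration only; the function name `select_k_of_n`, the tuple record layout and the synthetic patient IDs are hypothetical, not part of the original posts. Note that the "new group?" test compares against the previous record *read*, whether or not it was kept.

```python
import random
from collections import Counter


def select_k_of_n(records, key, k_max=10, rng=None):
    """Keep min(group size, k_max) records per group with sequential k/n selection.

    Hypothetical helper. `records` must already be grouped by `key`,
    as after SORT CASES BY Officer_ID.
    """
    rng = rng or random.Random()
    sizes = Counter(key(rec) for rec in records)  # like AGGREGATE ... NRecords=NU

    selected = []
    prev_group = object()  # sentinel that matches no real group
    k = n = 0
    for rec in records:
        group = key(rec)
        if group != prev_group:      # first record READ for a new group
            n = sizes[group]         # records still unread in this group
            k = min(n, k_max)        # records still needed
        if rng.random() < k / n:     # Bernoulli(k/n); certain once k == n
            selected.append(rec)
            k -= 1
        n -= 1
        prev_group = group           # updated whether or not the record was kept
    return selected


# Synthetic records with the same group sizes as the sample data (13, 4, 22).
officers = ["006415"] * 13 + ["007991"] * 4 + ["011512"] * 22
records = [(f"P{i:03d}", off) for i, off in enumerate(officers)]

picked = select_k_of_n(records, key=lambda r: r[1], rng=random.Random(7778))
print(sorted(Counter(off for _, off in picked).items()))
# [('006415', 10), ('007991', 4), ('011512', 10)]
```

The seed 7778 here only makes the Python run repeatable; it does not reproduce the stream SPSS produces with SET MTINDEX = 7778. Whatever the seed, each group ends with exactly min(size, 10) records.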
Ariel,
I'm not saying your code can't be made to work, just that I'd work the problem this way (see the bottom). Note: the code is untried.

>>The background is that I have a list of open cases for probation officers and I want to randomly pick 10 of their cases for review. For the probation officers that have less than 10 cases open, I would like to select all of their cases. There are 20 probation officers and 395 open cases. The caseload varies from 1 to 35 open cases. I think I have modified the syntax in the links below correctly, however, the correct number of cases is not always returned regardless of whether the probation officer has more or less than 10 cases. For example, there is a probation officer that has 35 cases and after I run the syntax below, sometimes I have 10 cases (correct) and other times I have 8 or 9 etc.(incorrect). I'm wondering if it's possible that I either made a mistake in adjusting the code or perhaps there is some issue with it that someone could pinpoint.

* Give each case a random number and sort cases randomly within officer.
Compute rv=uniform(1).
Sort cases by officer_id rv.
* Number the cases 1, 2, 3, ... within each officer.
Do if ($casenum eq 1 or officer_id ne lag(officer_id)).
+  compute pick=1.
Else.
+  compute pick=lag(pick)+1.
End if.
Execute.
* Keep the first 10 (randomly ordered) cases per officer, or all if fewer.
Temporary.
Select if (pick le 10).
.....

Gene Maguin
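The random-sort approach can be sketched the same way: attach a uniform random number to each case, sort by officer and that number, number the cases within each officer, and keep those numbered 10 or less. This Python sketch is illustrative only; `random_sort_select` and the synthetic records are hypothetical. The counter restarts at 1 on each officer's first case and otherwise always increments, and the "keep" test is applied separately; that separation is what guarantees exactly min(n, 10) cases per officer.

```python
import random
from collections import Counter


def random_sort_select(records, key, k_max=10, rng=None):
    """Keep min(group size, k_max) random records per group via a random sort.

    Hypothetical helper; records need not be pre-sorted.
    """
    rng = rng or random.Random()
    # Like COMPUTE rv = UNIFORM(1), then SORT CASES BY officer_id rv.
    keyed = sorted(((key(rec), rng.random(), rec) for rec in records),
                   key=lambda t: (t[0], t[1]))
    selected = []
    prev_group = object()
    pick = 0
    for group, _rv, rec in keyed:
        # Restart the count on a group's first record; otherwise always add 1.
        pick = 1 if group != prev_group else pick + 1
        prev_group = group
        if pick <= k_max:            # like SELECT IF (pick LE 10)
            selected.append(rec)
    return selected


officers = ["006415"] * 13 + ["007991"] * 4 + ["011512"] * 22
records = [(f"P{i:03d}", off) for i, off in enumerate(officers)]
picked = random_sort_select(records, key=lambda r: r[1], rng=random.Random(1))
print(sorted(Counter(off for _, off in picked).items()))
# [('006415', 10), ('007991', 4), ('011512', 10)]
```

Sorting costs more than a single k/n pass, but for a few hundred cases the difference is negligible.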
At 11:33 AM 10/31/2008, Ariel Barak wrote:
>I have a list of open cases for probation officers and I want to
>randomly pick 10 of their cases for review. For the probation
>officers that have less than 10 cases open, I would like to select
>all of their cases. I think I have modified the syntax [from
>previous postings(*)] correctly, however, the correct number of
>cases is not always returned regardless of whether the probation
>officer has more or less than 10 cases.

Gene's certainly got a point in recommending random-sort logic; I probably fall in love too much with "k/n". Random sorting can be less efficient, but the difference will be marginal in these days of huge memory and adaptive sorting algorithms.

Nonetheless, the code you sent seems to work after I replaced DATASET COPY by ADD FILES. (Sometimes DATASET COPY interacts poorly with elaborate commands like MATCH or AGGREGATE in the new file.) Is there any chance you're hitting that problem?

Below is a voluminous listing with trace messages; after it comes the code.

Officer_ID
|-----|------|---------|-------|-------------|---------------|
|     |      |Frequency|Percent|Valid Percent|Cumulative     |
|     |      |         |       |             |Percent        |
|-----|------|---------|-------|-------------|---------------|
|Valid|006415|13       |33.3   |33.3         |33.3           |
|     |------|---------|-------|-------------|---------------|
|     |007991|4        |10.3   |10.3         |43.6           |
|     |------|---------|-------|-------------|---------------|
|     |011512|22       |56.4   |56.4         |100.0          |
|     |------|---------|-------|-------------|---------------|
|     |Total |39       |100.0  |100.0        |               |
|-----|------|---------|-------|-------------|---------------|

DATASET NAME OriginalData.
*...  Replace the following:
*...  DATASET COPY ListofSampleData.
******************************************************.
*...  DATASET ACTIVATE ListofSampleData.
*...  by
ADD FILES
   /FILE=OriginalData.
DATASET NAME ListofSampleData WINDOW=FRONT.
SORT CASES BY Officer_ID /* if necessary */.
* Set random-number generator parameters, if desired .
SET RNG = MT /* 'Mersenne twister' random-no. generator */ .
SET MTINDEX = 7778 /* or other starting value - anything */ .
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=Officer_ID
   /NRecords 'Number of open cases for Officer'=NU.
NUMERIC #K #N (F3)
        #Take_It (F2).
.  /**/ NUMERIC #CaseCount (F3).
DO IF $CASENUM EQ 1 OR Officer_ID NE LAG(Officer_ID).
.  /**/ COMPUTE #CaseCount = 0.
.  /**/ PRINT / 'Officer ' Officer_ID ': ' NRecords ' records.'/**/.
.  COMPUTE #N = NRecords /* Total open records, per Officer */.
.  COMPUTE #K = MIN(NRecords, 10) /* Set sample size */.
END IF.
COMPUTE #Take_It = RV.BERNOULLI(#K/#N).
.  /**/ COMPUTE #CaseCount = #CaseCount + 1.
.  /**/ PRINT / /**/
   /**/ #CaseCount ', patient ' Patient_Number ' ' /**/
   /**/ 'N:' #N ' K:' #K ' Select:' #Take_it /**/.
COMPUTE #K = #K - #Take_It.
COMPUTE #N = #N - 1.
SELECT IF #Take_It.
FREQ Officer_ID.

Officer 006415 : 13 records.
 1 , patient 041949 N: 13 K: 10 Select: 1
 2 , patient 045284 N: 12 K: 9 Select: 1
 3 , patient 046107 N: 11 K: 8 Select: 1
 4 , patient 047019 N: 10 K: 7 Select: 1
 5 , patient 048501 N: 9 K: 6 Select: 0
 6 , patient 049087 N: 8 K: 6 Select: 1
 7 , patient 052716 N: 7 K: 5 Select: 0
 8 , patient 056991 N: 6 K: 5 Select: 1
 9 , patient 057073 N: 5 K: 4 Select: 0
10 , patient 060727 N: 4 K: 4 Select: 1
11 , patient 061118 N: 3 K: 3 Select: 1
12 , patient 061120 N: 2 K: 2 Select: 1
13 , patient 061207 N: 1 K: 1 Select: 1
Officer 007991 : 4 records.
 1 , patient 064713 N: 4 K: 4 Select: 1
 2 , patient 051234 N: 3 K: 3 Select: 1
 3 , patient 061749 N: 2 K: 2 Select: 1
 4 , patient 048163 N: 1 K: 1 Select: 1
Officer 011512 : 22 records.
 1 , patient 044949 N: 22 K: 10 Select: 0
Officer 011512 : 22 records.
 1 , patient 045274 N: 22 K: 10 Select: 1
 2 , patient 048107 N: 21 K: 9 Select: 0
 3 , patient 042019 N: 20 K: 9 Select: 1
 4 , patient 048401 N: 19 K: 8 Select: 1
 5 , patient 049187 N: 18 K: 7 Select: 0
 6 , patient 058716 N: 17 K: 7 Select: 0
 7 , patient 096991 N: 16 K: 7 Select: 0
 8 , patient 037073 N: 15 K: 7 Select: 0
 9 , patient 063627 N: 14 K: 7 Select: 1
10 , patient 068318 N: 13 K: 6 Select: 1
11 , patient 061310 N: 12 K: 5 Select: 1
12 , patient 066207 N: 11 K: 4 Select: 1
13 , patient 048451 N: 10 K: 3 Select: 1
14 , patient 044187 N: 9 K: 2 Select: 1
15 , patient 020716 N: 8 K: 1 Select: 0
16 , patient 076981 N: 7 K: 1 Select: 0
17 , patient 017073 N: 6 K: 1 Select: 0
18 , patient 052627 N: 5 K: 1 Select: 0
19 , patient 061318 N: 4 K: 1 Select: 0
20 , patient 031380 N: 3 K: 1 Select: 0
21 , patient 026237 N: 2 K: 1 Select: 0

Frequencies
|-----------------------------|---------------------------|
|Output Created               |03-NOV-2008 11:44:06       |
|-----------------------------|---------------------------|
[OriginalData]

Statistics [suppressed]

Officer_ID
|-----|------|---------|-------|-------------|---------------|
|     |      |Frequency|Percent|Valid Percent|Cumulative     |
|     |      |         |       |             |Percent        |
|-----|------|---------|-------|-------------|---------------|
|Valid|006415|10       |43.5   |43.5         |43.5           |
|     |------|---------|-------|-------------|---------------|
|     |007991|4        |17.4   |17.4         |60.9           |
|     |------|---------|-------|-------------|---------------|
|     |011512|9        |39.1   |39.1         |100.0          |
|     |------|---------|-------|-------------|---------------|
|     |Total |23       |100.0  |100.0        |               |
|-----|------|---------|-------|-------------|---------------|

LIST.
List
|-----------------------------|---------------------------|
|Output Created               |03-NOV-2008 11:44:07       |
|-----------------------------|---------------------------|
[OriginalData]

Patient_Number Officer_ID Program NRecords

041949         006415     PROB          13
045284         006415     PROB          13
046107         006415     PROB          13
047019         006415     PROB          13
049087         006415     PROB          13
056991         006415     PROB          13
060727         006415     PROB          13
061118         006415     PROB          13
061120         006415     PROB          13
061207         006415     PROB          13
064713         007991     PROB           4
051234         007991     PROB           4
061749         007991     PROB           4
048163         007991     PROB           4
045274         011512     PROB          22
042019         011512     PROB          22
048401         011512     PROB          22
063627         011512     PROB          22
068318         011512     PROB          22
061310         011512     PROB          22
066207         011512     PROB          22
048451         011512     PROB          22
044187         011512     PROB          22

Number of cases read:  23    Number of cases listed:  23

=============================
APPENDIX: Test data, and code
=============================
* C:\Documents and Settings\Richard\My Documents .
* \Technical\spssx-l\Z-2008d .
* \2008-10-31 Barak - Randomly Select a Specific Number of Cases by Group.SPS.

* In response to posting .
* Date: Fri, 31 Oct 2008 10:33:46 -0500 .
* From: Ariel Barak <[hidden email]> .
* Subject: Randomly Select a Specific Number of Cases by Group .
* To: [hidden email] .
* ................................................................. .
* "I think I have modified the syntax correctly, however, the .
* correct number of cases is not always returned regardless of .
* whether the probation officer has more or less than 10 cases. .
* For example, there is a probation officer that has 35 cases and .
* after I run the syntax below, sometimes I have 10 cases .
* (correct) and other times I have 8 or 9 etc.(incorrect)."

* The syntax he's modifying is from my postings .
* Date: Tue, 11 Mar 2008 12:06:14 -0400 .
* From: Richard Ristow <[hidden email]> .
* Subject: Re: Random Cuts .
* with correction .
* Date: Tue, 11 Mar 2008 14:10:03 -0400 .
* From: Richard Ristow <[hidden email]> .
* Subject: Re: Random Cuts .
* ................................................................. .
* ................................................................. .
* ............... Data and code, as posted .................... .
*Sample Data.
DATA LIST LIST /Patient_Number (A9) Officer_ID (A7) Program (A7).
BEGIN DATA
041949 006415 PROB
045284 006415 PROB
046107 006415 PROB
047019 006415 PROB
048501 006415 PROB
049087 006415 PROB
052716 006415 PROB
056991 006415 PROB
057073 006415 PROB
060727 006415 PROB
061118 006415 PROB
061120 006415 PROB
061207 006415 PROB
064713 007991 PROB
051234 007991 PROB
061749 007991 PROB
048163 007991 PROB
044949 011512 PROB
045274 011512 PROB
048107 011512 PROB
042019 011512 PROB
048401 011512 PROB
049187 011512 PROB
058716 011512 PROB
096991 011512 PROB
037073 011512 PROB
063627 011512 PROB
068318 011512 PROB
061310 011512 PROB
066207 011512 PROB
048451 011512 PROB
044187 011512 PROB
020716 011512 PROB
076981 011512 PROB
017073 011512 PROB
052627 011512 PROB
061318 011512 PROB
031380 011512 PROB
026237 011512 PROB
END DATA.
FREQ Officer_ID.
DATASET NAME OriginalData.
*...  Replace the following:
*...  DATASET COPY ListofSampleData.
******************************************************.
*...  DATASET ACTIVATE ListofSampleData.
*...  by
ADD FILES
   /FILE=OriginalData.
DATASET NAME ListofSampleData WINDOW=FRONT.
SORT CASES BY Officer_ID /* if necessary */.
* Set random-number generator parameters, if desired .
SET RNG = MT /* 'Mersenne twister' random-no. generator */ .
SET MTINDEX = 7778 /* or other starting value - anything */ .
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=Officer_ID
   /NRecords 'Number of open cases for Officer'=NU.
NUMERIC #K #N (F3)
        #Take_It (F2).
.  /**/ NUMERIC #CaseCount (F3).
DO IF $CASENUM EQ 1 OR Officer_ID NE LAG(Officer_ID).
.  /**/ COMPUTE #CaseCount = 0.
.  /**/ PRINT / 'Officer ' Officer_ID ': ' NRecords ' records.'/**/.
.  COMPUTE #N = NRecords /* Total open records, per Officer */.
.  COMPUTE #K = MIN(NRecords, 10) /* Set sample size */.
END IF.
COMPUTE #Take_It = RV.BERNOULLI(#K/#N).
.  /**/ COMPUTE #CaseCount = #CaseCount + 1.
.  /**/ PRINT / /**/
   /**/ #CaseCount ', patient ' Patient_Number ' ' /**/
   /**/ 'N:' #N ' K:' #K ' Select:' #Take_it /**/.
COMPUTE #K = #K - #Take_It.
COMPUTE #N = #N - 1.
SELECT IF #Take_It.
FREQ Officer_ID.
LIST.
============================
(*) Date: Tue, 11 Mar 2008 12:06:14 -0400
    From: Richard Ristow <[hidden email]>
    Subject: Re: Random Cuts
    with correction posted
    From: Richard Ristow <[hidden email]>
    Subject: Re: Random Cuts
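One detail in the trace above deserves a closer look. "Officer 011512 : 22 records." is printed twice: patient 044949, the first case for that officer, was not selected, and the next case was again treated as the start of a new officer, so #N was reset to 22 although only 21 cases remained. The second FREQ table correspondingly shows 9 cases, not 10, for officer 011512. This looks consistent with LAG(Officer_ID) returning the last case kept by SELECT IF rather than the last case read. If that reading is right, one untested option is to flag each officer's first case in a separate pass (followed by EXECUTE) before the selection step, so the flag no longer depends on which cases SELECT IF keeps. The Python simulation below (hypothetical `simulate_k_of_n`, a model of the apparent behavior, not SPSS itself) compares the two ways of detecting a new group; only the version that looks at the previous *kept* record can fall short.

```python
import random


def simulate_k_of_n(sizes, k_max=10, seed=None, track_kept_only=False):
    """Simulate k/n selection over consecutive groups; return counts kept per group.

    track_kept_only=False: a new group is detected from the previous record READ.
    track_kept_only=True:  a new group is detected from the previous record KEPT,
    modelling what the trace suggests LAG() sees after SELECT IF.
    """
    rng = random.Random(seed)
    kept = [0] * len(sizes)
    prev = None                  # group used for the "new group?" test
    k = n = 0
    for group, size in enumerate(sizes):
        for _ in range(size):
            if group != prev:    # (re)initialise: may happen repeatedly if nothing kept yet
                n = size
                k = min(size, k_max)
            take = rng.random() < k / n
            if take:
                kept[group] += 1
                k -= 1
            n -= 1
            if take or not track_kept_only:
                prev = group
    return kept


sizes = [13, 4, 22]  # group sizes in the sample data
print(simulate_k_of_n(sizes, seed=1))  # [10, 4, 10]
short = sum(simulate_k_of_n(sizes, seed=s, track_kept_only=True) != [10, 4, 10]
            for s in range(1000))
print(short, "of 1000 simulated runs kept fewer cases than intended")
```

In the "kept only" mode a reset can only happen before any case of the group has been kept, so the count never exceeds the target; it can only fall short, which matches the 8-or-9-instead-of-10 symptom described in the original question.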
