Hi List,
Is it possible to sample for a defined n of cases from within an ID? For example, an ID value can occur several times within the data set. In some instances, I have 40 cases with the same ID (while in other cases I may only have one or two cases). Is it possible to take a random sample of no more than 5 cases from within each ID value? This gets trickier as there are other criteria with which I need to limit the sample, but the idea is the same. Essentially it's a nested random sampling. Any advice?

TIA,
John

John Norton
Biostatistician
Oncology Institute
Loyola University Medical Center
(708) 327-3095
[hidden email]

"Absence of evidence is not evidence of absence"
The Complex Samples option supports a lot of sampling schemes. It would not only draw the sample for you but give you methods for taking the sampling design into account in the analysis stage.
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of John Norton
Sent: Monday, January 29, 2007 2:51 PM
To: [hidden email]
Subject: [SPSSX-L] Nested sampling
At 03:50 PM 1/29/2007, John Norton wrote:
>Is it possible to sample for a defined n of cases from within an
>ID? For example, an ID value can occur several times within the data
>set. In some instances, I have 40 cases with the same ID (while in
>other cases I may only have 1 or two cases). Is it possible to take a
>random sample of no more than 5 cases from within each ID value?

Jon Peck's noted the COMPLEX SAMPLES module. In the meantime, you can use the 'k/n' method to do your sampling, though that doesn't help with the analysis. (If it's of real interest, I'll discuss the 'k/n' algorithm another time.)

This method does leave the size of each entire group in the output file ('GrpSize'), which may be useful. You can replace 'ID' by any variable or list of variables. '#k' and '#n' must either be scratch variables (as here, and recommended), or must have LEAVE specified for them.

Code not tested:

SORT CASES BY ID.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=ID
   /GrpSize 'Number of cases in ID group' = N.
FORMATS GrpSize (F4).

ADD FILES
   /FILE=*
   /BY ID
   /FIRST=NewGrp.

COMPUTE #TakeIt = 0.

DO IF NewGrp EQ 1.
.  COMPUTE #k = 5.
.  COMPUTE #n = GrpSize.
END IF.

DO IF GrpSize LE 5.
.  COMPUTE #TakeIt = 1.
ELSE.
*  Select a random integer, equi-distributed .
*  on 1 through #n                           .
.  COMPUTE #Draw = TRUNC(RV.UNIFORM(1,#n+1)).
.  DO IF #Draw LE #k.
.     COMPUTE #TakeIt = 1.
.     COMPUTE #k      = #k-1.
.  END IF.
.  COMPUTE #n = #n-1.
END IF.

SELECT IF #TakeIt EQ 1.
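For readers who want to see the 'k/n' logic outside SPSS, here is a minimal Python sketch of the same idea. It is an illustration under stated assumptions, not a translation of the untested syntax above: the function name `sample_within_groups` and the dictionaries `need` and `left` are made up for this example, and they play the roles of '#k' and '#n'. Each case is taken with probability (cases still needed) / (cases not yet examined) within its group, which yields exactly min(5, group size) cases per ID.

```python
import random
from collections import Counter


def sample_within_groups(rows, key, k=5, rng=random):
    """Keep at most k randomly chosen rows per group, in original order.

    Hypothetical helper for illustration; `key` maps a row to its group ID.
    """
    sizes = Counter(key(r) for r in rows)  # group sizes, like GrpSize
    left = dict(sizes)                     # cases not yet examined (#n)
    need = {}                              # cases still to take (#k)
    kept = []
    for r in rows:
        g = key(r)
        # First case of a group: we need k cases, or the whole group if smaller.
        need.setdefault(g, min(k, sizes[g]))
        # Draw an integer uniformly on 1..left[g]; take the case if it is
        # no larger than need[g], i.e. with probability need[g] / left[g].
        if rng.randint(1, left[g]) <= need[g]:
            kept.append(r)
            need[g] -= 1
        left[g] -= 1
    return kept


if __name__ == "__main__":
    # Toy data: ID "A" has 40 cases, "B" has 2, "C" has 1.
    data = [("A", i) for i in range(40)] + [("B", 0), ("B", 1), ("C", 0)]
    sample = sample_within_groups(data, key=lambda r: r[0], k=5)
    print(Counter(r[0] for r in sample))
```

Unlike the SPSS version, this sketch does not need the data sorted by ID, because the counters are kept per group in dictionaries. Groups with 5 or fewer cases are kept whole automatically: when the number still needed equals the number left, every draw succeeds.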