Nested sampling

John Norton
Hi List,
 
Is it possible to sample a defined n of cases from within an ID?  For example, an ID value can occur several times within the data set.  In some instances, I have 40 cases with the same ID (while in others I may have only one or two cases).  Is it possible to take a random sample of no more than 5 cases from within each ID value?
 
This gets trickier as there are other criteria with which I need to limit the sample, but the idea is the same.  Essentially it's a nested random sampling.  Any advice?
 
TIA,
 
John
 
John Norton
Biostatistician
Oncology Institute
Loyola University Medical Center
 
(708) 327-3095
[hidden email]
 
"Absence of evidence
      is not evidence of absence"

Re: Nested sampling

Peck, Jon
The Complex Samples option supports a lot of sampling schemes.  It would not only draw the sample for you but give you methods for taking the sampling design into account in the analysis stage.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of John Norton
Sent: Monday, January 29, 2007 2:51 PM
To: [hidden email]
Subject: [SPSSX-L] Nested sampling


Re: Nested sampling

Richard Ristow
In reply to this post by John Norton
At 03:50 PM 1/29/2007, John Norton wrote:

>Is it possible to sample for a defined n of cases from within an
>ID?  For example, an ID value can occur several times within the data
>set.  In some instances, I have 40 cases with the same ID (while in
>other cases I may only have 1 or two cases).  Is it possible to take a
>random sample of no more than 5 cases from within each ID value?

Jon Peck has noted the COMPLEX SAMPLES module. In the meantime, you can
use the 'k/n' method to draw the sample, though that doesn't help with
the analysis. (If it's of real interest, I'll discuss the 'k/n'
algorithm another time.) The method leaves the size of each entire
group in the output file ('GrpSize'), which may be useful. You can
replace 'ID' with any variable or list of variables. '#k' and '#n' must
either be scratch variables (as here, and recommended) or have
LEAVE specified for them, so that their values carry over from one
case to the next. Code not tested:

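* Sort so that the cases for each ID are contiguous.
SORT CASES BY ID.
* Attach the size of its ID group to every case.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=ID
   /GrpSize 'Number of cases in ID group' = N.
FORMATS GrpSize (F4).
* Flag the first case of each ID group.
ADD FILES
   /FILE=*
   /BY ID
   /FIRST=NewGrp.
COMPUTE #TakeIt = 0.
* At the start of a group, reset the counters.
* #k is the number still to take and #n the number not yet examined.
DO IF   NewGrp EQ 1.
.  COMPUTE #k   = 5.
.  COMPUTE #n   = GrpSize.
END IF.
* Groups of 5 or fewer cases are taken whole.
DO IF   GrpSize LE 5.
.  COMPUTE #TakeIt = 1.
ELSE.
*  Select a random integer, equi-distributed .
*  on 1 through #n                           .
.  COMPUTE #Draw = TRUNC(RV.UNIFORM(1,#n+1)).
*  Take this case with probability #k/#n.
.  DO IF   #Draw LE #k.
.     COMPUTE #TakeIt = 1.
.     COMPUTE #k      = #k-1.
.  END IF.
.  COMPUTE    #n      = #n-1.
END IF.
* Keep only the selected cases.
SELECT IF #TakeIt EQ 1.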
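
For anyone who wants to see the 'k/n' logic outside SPSS, here is a minimal Python sketch of the same selection rule as I read it from the syntax above. It is an illustration only, not the syntax itself: the function name sample_up_to_k, the rng parameter, and the toy ID list are all made up for the example, and it assumes the IDs are already sorted so that each group is contiguous.

```python
import random
from collections import Counter


def sample_up_to_k(ids, k=5, rng=None):
    """Return the positions of selected cases, at most k per ID.

    Hypothetical helper for illustration. `ids` must be sorted (grouped)
    by ID, mirroring the SORT CASES step in the SPSS syntax.
    """
    rng = rng or random.Random()
    group_size = Counter(ids)      # plays the role of GrpSize
    selected = []
    prev_id = object()             # sentinel that matches no real ID
    k_left = n_left = 0            # play the roles of #k and #n
    for pos, case_id in enumerate(ids):
        if case_id != prev_id:     # first case of a new group
            k_left = k
            n_left = group_size[case_id]
            prev_id = case_id
        if group_size[case_id] <= k:
            selected.append(pos)   # small groups are taken whole
        else:
            draw = rng.randint(1, n_left)   # integer on 1..n_left
            if draw <= k_left:              # happens with prob. k_left/n_left
                selected.append(pos)
                k_left -= 1
            n_left -= 1
    return selected


# Toy data: 40 cases for ID 1, 2 cases for ID 2, 7 cases for ID 3.
ids = [1] * 40 + [2] * 2 + [3] * 7
picked = sample_up_to_k(ids, k=5, rng=random.Random(0))
per_id = Counter(ids[pos] for pos in picked)
print(sorted(per_id.items()))
```

Each case is taken with probability (cases still needed)/(cases still unseen). Once the two counts are equal every remaining case is taken, and once none are needed no more are taken, so a group larger than k always yields exactly k cases.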