SPSSX Discussion

Nested sampling

John Norton

Nested sampling

Hi List,

Is it possible to sample a defined number of cases from within an ID? For example, an ID value can occur several times within the data set. In some instances, I have 40 cases with the same ID (while in others I may have only one or two cases). Is it possible to take a random sample of no more than 5 cases from within each ID value?

This gets trickier as there are other criteria with which I need to limit the sample, but the idea is the same. Essentially it's nested random sampling. Any advice?

TIA,

John

John Norton
Biostatistician
Oncology Institute
Loyola University Medical Center

(708) 327-3095
[hidden email]

"Absence of evidence
is not evidence of absence"

Peck, Jon

Re: Nested sampling

The Complex Samples option supports a lot of sampling schemes. It would not only draw the sample for you but give you methods for taking the sampling design into account in the analysis stage.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of John Norton
Sent: Monday, January 29, 2007 2:51 PM
To: [hidden email]
Subject: [SPSSX-L] Nested sampling

Richard Ristow

Re: Nested sampling

In reply to this post by John Norton

At 03:50 PM 1/29/2007, John Norton wrote:

>Is it possible to sample for a defined n of cases from within an
>ID? For example, an ID value can occur several times within the data
>set. In some instances, I have 40 cases with the same ID (while in
>other cases I may only have 1 or two cases). Is it possible to take a
>random sample of no more than 5 cases from within each ID value?

Jon Peck has noted the Complex Samples module. In the meantime, you can
use the 'k/n' method to do your sampling, though that doesn't help with
the analysis. (If it's of real interest, I'll discuss the 'k/n'
algorithm another time.) The code below also leaves the size of each
entire group in the output file ('GrpSize'), which may be useful. You
can replace 'ID' with any variable or list of variables. '#k' and '#n'
must either be scratch variables (as here, and recommended), or must
have LEAVE specified for them, so that their values carry over from one
case to the next instead of being reset. Code not tested:

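*  Sort so that all cases with the same ID are adjacent.
SORT CASES BY ID.
*  Add the size of each ID group to every case.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=ID
   /GrpSize 'Number of cases in ID group' = N.
FORMATS GrpSize (F4).
*  Flag the first case of each ID group.
ADD FILES
   /FILE=*
   /BY ID
   /FIRST=NewGrp.
COMPUTE #TakeIt = 0.
*  At the start of a group, reset the counters:     .
*  #k = cases still to take from the group,         .
*  #n = cases in the group not yet examined         .
DO IF NewGrp EQ 1.
.  COMPUTE #k = 5.
.  COMPUTE #n = GrpSize.
END IF.
*  Groups of 5 or fewer cases are kept whole.
DO IF GrpSize LE 5.
.  COMPUTE #TakeIt = 1.
ELSE.
*  Select a random integer, equi-distributed        .
*  on 1 through #n                                  .
.  COMPUTE #Draw = TRUNC(RV.UNIFORM(1,#n+1)).
*  Take this case with probability #k/#n            .
.  DO IF #Draw LE #k.
.     COMPUTE #TakeIt = 1.
.     COMPUTE #k = #k-1.
.  END IF.
.  COMPUTE #n = #n-1.
END IF.
SELECT IF #TakeIt EQ 1.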
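
For anyone who wants to see the 'k/n' idea outside SPSS, here is a minimal Python sketch of the same logic. It is an illustration only, not a translation checked against the syntax above. The function name `sample_within_groups` and its parameters (`rows`, `key`, `k`, `rng`) are made up for this example. Each row is taken with probability need/left, where `need` plays the role of '#k' and `left` the role of '#n'. Groups with `k` or fewer rows come through whole, because `need` is then never smaller than `left`.

```python
import random
from collections import Counter
from itertools import groupby
from operator import itemgetter


def sample_within_groups(rows, key, k=5, rng=random):
    """Keep at most k randomly chosen rows for each key value."""
    rows = sorted(rows, key=key)            # like SORT CASES BY ID
    sizes = Counter(key(r) for r in rows)   # like the AGGREGATE step (GrpSize)
    kept = []
    for group_id, group in groupby(rows, key=key):
        need = k                 # rows still to take from this group (#k)
        left = sizes[group_id]   # rows in this group not yet examined (#n)
        for row in group:
            # randrange(left) is uniform on 0..left-1, so this test
            # succeeds with probability need/left (always, if need >= left).
            if rng.randrange(left) < need:
                kept.append(row)
                need -= 1
            left -= 1
    return kept


if __name__ == "__main__":
    # Hypothetical data: ID "A" has 40 cases, "B" has 2, "C" has 1.
    data = [("A", i) for i in range(40)] + [("B", 0), ("B", 1), ("C", 0)]
    sample = sample_within_groups(data, key=itemgetter(0), k=5)
    print(sorted(Counter(r[0] for r in sample).items()))
    # prints [('A', 5), ('B', 2), ('C', 1)]
```

Which five "A" rows are kept varies from run to run, but the per-ID counts do not: once `need` equals `left`, every remaining row in the group is taken, so a group larger than `k` always yields exactly `k` rows.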