random sampling question

random sampling question

Juliana-7
I have a question I am hoping to get some help with:

I have a data set with N = 447 clients. 4 clusters of clients have been
identified. I want to randomly select 15 cases from each cluster so that the
sample is representative of the age & sex distribution of that cluster.
(The age range is 4-11 years. To simplify we are using age groups 4-7
and 8-11).

For example, for cluster 1 (N = 220), if there are 22 females aged 4-7
(10%),  then I'd like the sample of 15 to reflect this proportion (2
clients, rounded up from 1.5). If there are 110 males aged 8-11 (50%), I'd
like the sample of 15 to reflect this proportion (7 or 8 clients).

I am looking for some type of syntax to randomly identify these clients,
with the proportions in the population reflected in the sample.

Thanks

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: random sampling question

Maguin, Eugene
Juliana,

I'm going to assume that you have a cluster id variable, clusid, an age
range variable, agecat, with values of 0=4-7 and 1=8-11, and that sex is
coded 0 and 1. I also assume you have no missing data.

Conceptually,
1a) aggregate the file breaking on clusid and adding a cluster count
variable, cluscount, to each record.

1b) aggregate the file breaking on clusid, agecat, and sex and adding a layer
count variable, layercount, to each record.

2) create a cases per sample variable, maxcase=rnd(15*layercount/cluscount),
to compute the permitted number of cases to be selected per
clusid-agecat-sex combination.

3) create a random number variable, rannum=uniform(1).

4) sort cases by clusid, agecat, sex, rannum.

5) compute a case index variable, caseindex, as follows to index the number
of cases within each combination of clusid, agecat, sex.

Compute caseindex=1.
If (clusid eq lag(clusid) and agecat eq lag(agecat) and
   sex eq lag(sex)) caseindex=lag(caseindex)+1.

6) run frequencies on caseindex at this point to verify that you have no
missing values. You could also crosstab caseindex by maxcase as another
check. Then, when you are satisfied that caseindex is working correctly, run
the following syntax to 'knock out' cases with caseindex values greater than
maxcase.

If (caseindex gt maxcase) caseindex=-9.

7) select cases such that caseindex gt 0.
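
The steps above can be put together as one block of syntax. This is only a
sketch under the same assumptions (variables named clusid, agecat and sex, no
missing data). It assumes a version of SPSS in which AGGREGATE supports
MODE=ADDVARIABLES, and it uses RV.UNIFORM(0,1) for the random number, which
should serve the same purpose as uniform(1). It selects with caseindex LE
maxcase directly rather than recoding to -9 first.

```spss
* Add the cluster size to every record.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=clusid
  /cluscount=N.

* Add the size of each clusid-agecat-sex layer to every record.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=clusid agecat sex
  /layercount=N.

* Number of cases allowed per layer, out of 15 per cluster.
COMPUTE maxcase=RND(15*layercount/cluscount).

* Random sort key, then put each layer into random order.
COMPUTE rannum=RV.UNIFORM(0,1).
SORT CASES BY clusid agecat sex rannum.

* Running index of cases within each layer.
COMPUTE caseindex=1.
IF (clusid EQ LAG(clusid) AND agecat EQ LAG(agecat) AND
    sex EQ LAG(sex)) caseindex=LAG(caseindex)+1.
EXECUTE.

* Check the index before dropping anything.
FREQUENCIES VARIABLES=caseindex.

* Keep only the first maxcase cases in each layer.
SELECT IF (caseindex LE maxcase).
EXECUTE.

* Check how many cases were kept per cluster and layer.
CROSSTABS /TABLES=agecat BY sex BY clusid.
```

Because each layer's count is rounded separately, the total kept for a
cluster need not be exactly 15, so the final crosstab is worth a look.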


Gene Maguin
