|
I have a question I am hoping to get some help with:
I have a data set with N = 447 clients, in which 4 clusters of clients have been identified. I want to randomly select 15 cases from each cluster so that the sample is representative of the age and sex distribution of that cluster. (The age range is 4-11 years; to simplify, we are using age groups 4-7 and 8-11.) For example, in cluster 1 (N = 220), if there are 22 females aged 4-7 (10%), then I'd like the sample of 15 to reflect this proportion (2 clients, rounded up from 1.5). If there are 110 males aged 8-11 (50%), I'd like the sample of 15 to reflect this proportion (7 or 8 clients). I am looking for some type of syntax to randomly identify these clients, with the proportions in the population reflected in the sample.
Thanks
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
|
Juliana,
I'm going to assume that you have a cluster id variable, clusid; an age range variable, agecat, with values 0 = 4-7 and 1 = 8-11; and sex coded 0 and 1. I'm also assuming you have no missing data. Conceptually:
1) Aggregate the file, breaking on clusid, and add a cluster count variable, cluscount, to each record.
2) Aggregate the file, breaking on clusid, agecat, and sex, and add a layer count variable, layercount, to each record.
3) Create a cases-per-sample variable, maxcase=rnd(15*layercount/cluscount), to compute the permitted number of cases to be selected per clusid-agecat-sex combination.
4) Create a random number variable, rannum=uniform(1).
5) Sort cases by clusid, agecat, sex, rannum.
6) Compute a case index variable, caseindex, as follows, to number the cases within each combination of clusid, agecat, and sex.
Compute caseindex=1.
If (clusid eq lag(clusid) and agecat eq lag(agecat) and sex eq lag(sex)) caseindex=lag(caseindex)+1.
7) Run a frequencies on caseindex at this point to verify that you have no missing values. You could also crosstab caseindex by maxcase as another check. Then, when you are satisfied that caseindex is working correctly, run the following syntax to 'knock out' cases with caseindex values greater than maxcase.
If (caseindex gt maxcase) caseindex=-9.
8) Select cases such that caseindex gt 0.
Gene Maguin
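Put together, the steps above might look something like the following. This is only a sketch, not tested against the actual file: it assumes the variable names from the post (clusid, agecat, sex) and no missing data, it uses the name maxcase throughout for the per-combination quota, and the SET SEED line with its arbitrary value is my addition so that the draw can be reproduced.

```spss
* Sketch only. Assumes variables clusid, agecat, sex exist with no missing data.
* The seed value is arbitrary and is here only to make the draw reproducible.
SET SEED=12345.

* Cluster size added to every record.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=clusid
  /cluscount=N.

* Size of each clusid-agecat-sex combination added to every record.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=clusid agecat sex
  /layercount=N.

* Number of cases to draw from each combination, out of 15 per cluster.
COMPUTE maxcase=RND(15*layercount/cluscount).

* Random order within each combination.
COMPUTE rannum=RV.UNIFORM(0,1).
SORT CASES BY clusid agecat sex rannum.

* Running index of cases within each combination.
COMPUTE caseindex=1.
IF (clusid EQ LAG(clusid) AND agecat EQ LAG(agecat) AND sex EQ LAG(sex)) caseindex=LAG(caseindex)+1.

* Checks before anything is dropped.
FREQUENCIES VARIABLES=caseindex.
CROSSTABS /TABLES=caseindex BY maxcase.

* Knock out cases beyond the quota, then keep the rest.
IF (caseindex GT maxcase) caseindex=-9.
SELECT IF (caseindex GT 0).
EXECUTE.
```

Because maxcase is rounded separately for each combination, the quotas within a cluster can add up to 14 or 16 rather than exactly 15, so it is worth checking the totals per cluster before the SELECT IF. SELECT IF also deletes the unselected cases from the active file, so save a copy first or use FILTER if the full file is still needed.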
