Hello Everybody,
I have a file with 50,000 customer IDs. The customers belong to 30 different cities and buy products from 50 different retailers. We want to draw a sample of only 5,000 customers (10%) for a survey, but the sample should reflect the same proportions of cities and retailers as the population. For example, if city A has a 5% share of the population (50,000), then the random sample (5,000) should also have 5% of its customers from city A. Also, within city A, the customers of all retailers should be included in the same proportions as in the population. Some retailers may not operate in some cities.

I learnt that "Complex Samples" could be used. I would appreciate it if somebody could tell me how this simple sample could be drawn using the "Complex Samples" module.

ID       Cities   Retailers
1        1        1
2        1        1
3        1        2
4        1        2
5        1        3
6        1        3
7        1        3
8        2        1
9        2        1
10       2        2
...
399998   29       1
399999   30       49
400000   30       50

Any pointers would be highly appreciated. I have reviewed the tutorial but could not find any useful example. It appears that the "cities" variable could be used as "strata" and "retailers" as "clusters", but how do I specify that 10% (only 5,000) be selected from the total population (50,000) while keeping the population proportions for cities and for retailers within cities?

I look forward to hearing from some experts soon.

Best regards,
Sasa

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
Sanjay,
>>I have a file that has 50,000 customer IDs, who belong to 30 different cities and buy products from 50 different retailers. We want to draw a sample of only 5,000 customers (10%) for a survey, but this sample should reflect the same proportions of cities and retailers as the population. Suppose city A has a 5% share of the population (50,000); then the random sample (5,000) should also have 5% of its customers from city A. Also, within city A, the customers of all retailers should be included in the same proportions as in the population. It is possible that some retailers do not operate in some cities.

I'll assume that cities are numbered 1-30 and retailers are numbered 1-50, in variables named city and retailer.

* One code per city-by-retailer combination.
Compute citystore=city*100+retailer.
Sort cases by citystore.
* Add the number of customers in each combination to every case.
Aggregate outfile=* mode=addvariables/break=citystore/count=nu.
* Put the cases in random order within each combination.
Compute ranvar=uniform(1).
Sort cases by citystore ranvar.
* Number the cases 1, 2, 3, ... within each combination.
Compute seq=1.
If (citystore eq lag(citystore)) seq=lag(seq)+1.
* Run the pending transformations so that seq is final before cases are dropped.
Execute.
* Keep the first 10% of cases in each combination.
Select if (seq le .10*count).
Execute.

* This should give you a sample of nearly exactly 5,000, with nearly exactly a 10% sample of every city and store combination.

See if this is what you need.

Gene Maguin
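For readers who want to check the selection logic outside SPSS, the same idea (shuffle within each city-by-retailer cell, then keep the first 10% of each cell) can be sketched in Python. This is only an illustration under stated assumptions: the function name `stratified_sample`, the `(customer_id, city, retailer)` tuple layout, and the default seed are hypothetical and are not part of the syntax above.

```python
import random
from collections import defaultdict


def stratified_sample(rows, rate=0.10, seed=12345):
    """Draw a proportional sample within each (city, retailer) cell.

    rows: iterable of (customer_id, city, retailer) tuples (assumed layout).
    rate: sampling fraction applied separately to every cell.
    seed: fixed seed so the draw is reproducible (hypothetical default).
    """
    rng = random.Random(seed)

    # Group customers by cell, like the citystore break variable.
    cells = defaultdict(list)
    for row in rows:
        _customer_id, city, retailer = row
        cells[(city, retailer)].append(row)

    sample = []
    for key in sorted(cells):
        members = cells[key]
        # Whole number of cases at or below rate * cell size,
        # which mirrors keeping cases with seq <= .10 * count.
        n_take = int(rate * len(members))
        # Simple random sample without replacement inside the cell.
        sample.extend(rng.sample(members, n_take))
    return sample
```

Calling `stratified_sample(rows)` on a list of such tuples returns the selected rows. Because the cell quota is rounded down, a cell with fewer than 10 customers contributes nobody at a 10% rate, which is one reason the total can fall slightly short of exactly 10% of the file.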
