I have a sampling problem that's probably easier to solve than I'm making it out to be, but I'm relatively new to SPSS. I'm sure there's something silly that I'm missing.

Using SPSS 14, I need to sample a certain number of cases within multiple categories of a variable. I have a data file of apartment complexes in a particular city (each case = 1 complex). One variable in this file is neighborhood. I need to randomly select exactly 30 cases from within each neighborhood. That is, for each value of the neighborhood variable, I need a random sample of 30 cases.

I could use something like this:

select if (neighborhood=1).
sample 30 from 164.

where 164 is the N of cases within neighborhood 1, and simply create a new data file for each neighborhood. But there are 50+ neighborhoods and I'll need all of the sampled cases in one file in the end. Sure, I could merge all of the files back together, but that seems awfully laborious.

What's the smart way to do this?

Thanks,

Troy Payne
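For readers outside SPSS, the target result can be stated as a short Python sketch. This is not SPSS and not part of the original question. It groups the cases by neighborhood, draws exactly 30 without replacement from each group, and collects everything in one list. The record layout, the function name `sample_per_group`, and the group sizes are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_group(cases, group_key, k, seed=None):
    """Return exactly k randomly chosen cases from each group, all in one list.

    cases: list of dicts, one per apartment complex (hypothetical layout).
    group_key: name of the grouping field, e.g. "neighborhood".
    k: cases to draw per group; a group with fewer than k cases raises ValueError.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for case in cases:
        groups[case[group_key]].append(case)
    result = []
    for value in sorted(groups):
        members = groups[value]
        if len(members) < k:
            raise ValueError(f"group {value!r} has only {len(members)} cases")
        # Simple random sample without replacement within this group.
        result.extend(rng.sample(members, k))
    return result


# Hypothetical data: 3 neighborhoods with 164, 80 and 45 complexes.
sizes = {1: 164, 2: 80, 3: 45}
complexes = [{"id": f"{n}-{i}", "neighborhood": n}
             for n, size in sizes.items() for i in range(size)]
picked = sample_per_group(complexes, "neighborhood", 30, seed=1)
print(len(picked))  # 90
```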
Hi Troy,
I use this trick in similar cases (here is an example using a standard SPSS file delivered on the installation CD):

GET FILE='C:\Program Files\SPSS14\GSS93 subset.sav'.
* Let us select 30 cases from each astrological sign (variable zodiac).
fre zodiac.
* Sort the file and count cases in each group (=> variable N_astrolog).
sort cases by zodiac.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=zodiac /N_astrolog=N.
* Create the standard variable filter_$ containing the selection.
* Remark: the following syntax is a clone of the standard SPSS syntax for "n of the first m".
* #s_$_1 = cases still needed in this group, #s_$_2 = cases not yet examined in this group.
do if $casenum = 1 or zodiac ne lag(zodiac).
compute #s_$_1=30.
compute #s_$_2=N_astrolog.
end if.
do if #s_$_2 > 0.
compute filter_$ = uniform(1)* #s_$_2 < #s_$_1.
compute #s_$_1 = #s_$_1 - filter_$.
compute #s_$_2 = #s_$_2 - 1.
else.
compute filter_$ = 0.
end if.
VARIABLE LABEL filter_$ '30 from each zodiac group (SAMPLE)'.
FORMAT filter_$ (f1.0).
if missing(zodiac) filter_$ = 0.
FILTER BY filter_$.
fre zodiac.
*** END ***.

The sample is encoded in the variable filter_$ (1=selected, 0=not selected). If you wish to delete the rest of the cases, you can run this:

select if filter_$.
execute.

Greetings
Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Troy Payne
Sent: Friday, January 26, 2007 8:23 PM
To: [hidden email]
Subject: Sampling question
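Jan's filter logic is the classic sequential selection-sampling rule (Knuth's Algorithm S). Within a group, each case is selected with probability needed/remaining, where needed is #s_$_1 and remaining is #s_$_2. When needed equals remaining, every later case is taken, and once needed reaches 0 none are, so exactly 30 cases per group come out (or the whole group, if it has fewer than 30). The Python sketch below models that loop outside SPSS. The function name, the plain list of group codes, and the test data are assumptions for illustration, and the missing-value handling from the syntax is left out.

```python
import random


def select_k_of_each_group(groups, k, seed=None):
    """Model of the per-group "k of the first m" filter.

    groups: group codes, already sorted so equal codes are adjacent
            (the SPSS syntax runs SORT CASES BY zodiac first).
    Returns one 0/1 flag per case, playing the role of filter_$.
    """
    rng = random.Random(seed)
    # Cases per group: what AGGREGATE ... /N_astrolog=N adds to every case.
    sizes = {}
    for g in groups:
        sizes[g] = sizes.get(g, 0) + 1

    flags = []
    needed = remaining = 0
    for i, g in enumerate(groups):
        if i == 0 or g != groups[i - 1]:
            # First case of a group: reset #s_$_1 (needed) and #s_$_2 (remaining).
            needed, remaining = k, sizes[g]
        # Select with probability needed / remaining.
        chosen = 1 if rng.random() * remaining < needed else 0
        needed -= chosen
        remaining -= 1
        flags.append(chosen)
    return flags


# Hypothetical data: groups of 164, 80 and 12 cases.
codes = sorted([1] * 164 + [2] * 80 + [3] * 12)
flags = select_k_of_each_group(codes, 30, seed=7)
```

Whatever the seed, groups 1 and 2 get exactly 30 flags each, and group 3, with only 12 cases, has all 12 flagged.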
Shalom
There is more than one way of doing the job. Here is a very efficient way to do it:

title sampling sub groups.
input program.
loop #j =1 to 3.
loop #i =1 to 40.
compute groups =#j.
compute recno =#i.
end case.
end loop.
end loop.
end file.
end input program.
execute.
* >> information on how much to sample from each group <<.
recode groups(1=15)(2=23)(3=8) into gsample.
* the sub groups should be more than 10 cases.
compute unif=uniform(100).
sort cases by groups unif.
add files file=* / by groups / first=startgroup.
numeric groupseq(f4).
leave groupseq.
if startgroup eq 1 groupseq=0.
compute groupseq=sum(groupseq,1).
* insample is 1 for selected cases and system-missing for the rest.
if groupseq le gsample insample= 1.
execute.

Hillel Vardi
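Hillel's approach works differently: attach a random number to every case, sort by group and then by that number (a random shuffle within each group), count the cases within each group, and keep the first gsample of them. Because the order within a group is random, the first n cases are a simple random sample of that group. A Python model of those steps follows. The function name and the tuple layout are hypothetical, and the comments point to the SPSS commands each line imitates.

```python
import random


def first_n_after_shuffle(cases, quota, seed=None):
    """Keep the first quota[g] cases of each group after sorting by (group, random key).

    cases: list of (group, recno) tuples.
    quota: dict mapping group -> number of cases to keep (like gsample).
    Returns the selected (group, recno) tuples.
    """
    rng = random.Random(seed)
    # compute unif=uniform(100), then SORT CASES BY groups unif.
    keyed = sorted(((g, rng.uniform(0, 100), rec) for g, rec in cases),
                   key=lambda t: (t[0], t[1]))
    selected = []
    seq = 0
    prev = None
    for g, _, rec in keyed:
        # FIRST=startgroup marks a group's first case; LEAVE keeps the counter otherwise.
        seq = 1 if g != prev else seq + 1
        prev = g
        if seq <= quota[g]:  # IF groupseq LE gsample insample=1.
            selected.append((g, rec))
    return selected


# Same shape as the INPUT PROGRAM test data: 3 groups of 40 records each.
cases = [(j, i) for j in range(1, 4) for i in range(1, 41)]
picked = first_n_after_shuffle(cases, {1: 15, 2: 23, 3: 8}, seed=3)
```

Unlike a fixed 30, the per-group quota here comes from a lookup (the RECODE in the syntax), so different groups can receive different sample sizes; a quota larger than its group simply keeps the whole group.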