Sampling question


Sampling question

Troy Payne-2
I have a sampling problem that's probably easier to solve than I'm
making it out to be, but I'm relatively new to SPSS.  I'm sure there's
something silly that I'm missing.

Using SPSS 14, I need to sample a certain number of cases within
multiple categories of a variable.  I have a data file of apartment
complexes in a particular city (each case = 1 complex).  One variable
in this file is neighborhood.  I need to randomly select exactly 30
cases from within each neighborhood.  That is, for each value of the
neighborhood variable, I need a random sample of 30 cases.

I could use something like this:
    select if (neighborhood=1).
    sample 30 from 164.

where 164 is the N of cases within neighborhood 1, and simply create a
new data file for each neighborhood.  But there are 50+ neighborhoods
and I'll need all of the sampled cases in one file in the end.  Sure, I
could merge all of the files back together, but that seems awfully
laborious.
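For comparison, here is the split-sample-merge route described above as a minimal Python sketch, for illustration only. The function `split_sample_merge` and the dict-based case records are hypothetical. Each group gets a simple random sample without replacement of exactly `n` cases (or all of its cases if it is smaller), and the samples are concatenated into one list.

```python
import random
from collections import defaultdict


def split_sample_merge(cases, group_key, n, seed=None):
    """Sample exactly n cases per group (all cases if the group is smaller)
    and return every selected case in one combined list."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for case in cases:                      # "select if (neighborhood = g)" for every g
        groups[case[group_key]].append(case)
    merged = []
    for key in sorted(groups):              # one "file" per group...
        members = groups[key]
        k = min(n, len(members))            # "sample 30 from <N of group>"
        merged.extend(rng.sample(members, k))
    return merged                           # ...merged back together
```

This works, but it mirrors the "one file per neighborhood" bookkeeping Troy wants to avoid in SPSS.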

What's the smart way to do this?

Thanks,

Troy Payne

Re: Sampling question

Spousta Jan
Hi Troy,

I use this trick in similar cases. Here is an example using a sample data
file that is delivered on the SPSS installation CD:

GET FILE='C:\Program Files\SPSS14\GSS93 subset.sav'.
* Let us select 30 cases from each astrological sign (variable zodiac).

fre zodiac.

* Sort the file and count cases in each group (=> variable N_astrolog).
sort cases by zodiac.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=zodiac /N_astrolog=N.

* Create the standard variable filter_$ containing the selection .
* Remark: the following syntax is a clone of the syntax SPSS pastes for
  "exactly n cases from the first m", restarted at the first case of each zodiac group.

do if $casenum = 1 or zodiac ne lag(zodiac).
  compute #s_$_1=30.
  compute #s_$_2=N_astrolog.
end if.
do if #s_$_2 > 0.
  compute filter_$ = uniform(1)* #s_$_2 < #s_$_1.
  compute #s_$_1 = #s_$_1 - filter_$.
  compute #s_$_2 = #s_$_2 - 1.
else.
  compute filter_$ = 0.
end if.
VARIABLE LABEL filter_$ '30 from each zodiac group (SAMPLE)'.
FORMAT filter_$ (f1.0).

if missing(zodiac) filter_$ = 0.

FILTER BY filter_$.

fre zodiac.

*** END ***.


The sample is encoded in the variable filter_$ (1 = selected, 0 = not
selected). If you wish to delete the remaining cases, you can run this:

select if filter_$.
execute.
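The key line is `compute filter_$ = uniform(1)* #s_$_2 < #s_$_1.`: within a group, each case is kept with probability (cases still needed) / (cases not yet seen). This is the classic sequential "selection sampling" scheme, and it always yields exactly min(30, group size) cases per group. Below is a minimal Python sketch of the same logic, for illustration only. The function `n_of_first_m_by_group` and the dict-based records are hypothetical, and the sketch does not reproduce the `if missing(zodiac)` step.

```python
import random
from itertools import groupby


def n_of_first_m_by_group(cases, group_key, n, seed=None):
    """Return (case, flag) pairs; flag = 1 for exactly min(n, group size)
    randomly chosen cases in each group, 0 otherwise."""
    rng = random.Random(seed)
    ordered = sorted(cases, key=lambda c: c[group_key])      # sort cases by zodiac
    flags = []
    for _, members in groupby(ordered, key=lambda c: c[group_key]):
        members = list(members)
        needed = n                 # #s_$_1: cases still to select in this group
        remaining = len(members)   # #s_$_2: cases not yet seen (N_astrolog at start)
        for case in members:
            # Keep with probability needed / remaining (uniform(1) * #s_$_2 < #s_$_1).
            selected = 1 if rng.random() * remaining < needed else 0
            flags.append((case, selected))
            needed -= selected
            remaining -= 1
    return flags
```

Because the selection probability becomes 1 once the number of cases still needed equals the number left, and 0 once the quota is filled, the count per group is exact rather than approximate.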

Greetings

Jan


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Troy Payne
Sent: Friday, January 26, 2007 8:23 PM
To: [hidden email]
Subject: Sampling question


Re: Sampling question

hillel vardi
In reply to this post by Troy Payne-2
Shalom


There is more than one way to do the job. Here is a very efficient one:
sort the cases randomly within each group and keep the first n cases of each group.



title    sampling sub groups .
* Build a test file: 3 groups (groups = 1 to 3) of 40 cases each.
input program .
loop      #j =1 to 3 .
loop      #i =1 to 40 .
compute   groups =#j.
compute   recno =#i.
end case .
end loop .
end loop .
end file .
end input program .
execute .

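* >>  information on how much to sample from each group << .
recode   groups(1=15)(2=23)(3=8) into gsample .
*   the subgroups should have more than 10 cases .
* Shuffle the cases randomly within each group.
compute  unif=uniform(100) .
sort cases by groups unif .
* Flag the first case of each group (startgroup = 1).
add files  file=* / by groups / first=startgroup .
* Number the cases within each group: groupseq restarts at 1 in every group.
numeric   groupseq(f4) .
leave     groupseq.
if        startgroup eq 1    groupseq=0.
compute   groupseq=sum(groupseq,1) .
* Keep the first gsample cases of each group (insample = 1, other cases stay missing).
if        groupseq le gsample  insample= 1.
execute .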
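The same shuffle-and-take-first idea as a minimal Python sketch, for illustration only. Each case gets a random key, the cases are sorted by group and key, a counter restarts at the first case of each group, and the first quota[g] cases are kept. The function `shuffle_and_take_first` and the `quota` dict (standing in for `gsample`) are hypothetical. Unlike the syntax above, where `insample` is left system-missing for unselected cases, the sketch uses 0.

```python
import random


def shuffle_and_take_first(cases, group_key, quota, seed=None):
    """Return (case, insample) pairs, keeping the first quota[g] cases of
    each group g after a random sort within groups."""
    rng = random.Random(seed)
    # compute unif = uniform(...), then sort cases by groups unif.
    keyed = [(case[group_key], rng.random(), case) for case in cases]
    keyed.sort(key=lambda t: (t[0], t[1]))
    result = []
    prev_group = object()   # sentinel: the first case always starts a new group
    seq = 0
    for group, _, case in keyed:
        if group != prev_group:   # first=startgroup
            seq = 0
            prev_group = group
        seq += 1                  # groupseq = sum(groupseq, 1)
        insample = 1 if seq <= quota[group] else 0
        result.append((case, insample))
    return result
```

For Troy's case, every neighborhood would simply get a quota of 30.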


Hillel Vardi



