SPSSX Discussion

Sampling question

Troy Payne-2

Sampling question

I have a sampling problem that's probably easier to solve than I'm
making it out to be, but I'm relatively new to SPSS. I'm sure there's
something silly that I'm missing.

Using SPSS 14, I need to sample a certain number of cases within
multiple categories of a variable. I have a data file of apartment
complexes in a particular city (each case = 1 complex). One variable
in this file is neighborhood. I need to randomly select exactly 30
cases from within each neighborhood. That is, for each value of the
neighborhood variable, I need a random sample of 30 cases.

I could use something like this:
select if (neighborhood=1).
sample 30 from 164.

where 164 is the N of cases within neighborhood 1, and simply create a
new data file for each neighborhood. But there are 50+ neighborhoods
and I'll need all of the sampled cases in one file in the end. Sure, I
could merge all of the files back together, but that seems awfully
laborious.

What's the smart way to do this?

Thanks,

Troy Payne

Spousta Jan

Re: Sampling question

Hi Troy,

I use this trick in similar cases (the example uses a standard SPSS file
delivered on the installation CD):

GET FILE='C:\Program Files\SPSS14\GSS93 subset.sav'.
* Let us select 30 cases from each astrological sign (variable zodiac).

fre zodiac.

* Sort the file and count cases in each group (=> variable N_astrolog).
sort cases by zodiac.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=zodiac /N_astrolog=N.

* Create the standard variable filter_$ containing the selection .
* Remark: the following syntax is a clone of the standard SPSS syntax
for "n of the first m".

do if $casenum = 1 or zodiac ne lag(zodiac).
compute #s_$_1=30.
compute #s_$_2=N_astrolog.
end if.
do if #s_$_2 > 0.
compute filter_$ = uniform(1)* #s_$_2 < #s_$_1.
compute #s_$_1 = #s_$_1 - filter_$.
compute #s_$_2 = #s_$_2 - 1.
else.
compute filter_$ = 0.
end if.
VARIABLE LABEL filter_$ '30 from each zodiac group (SAMPLE)'.
FORMAT filter_$ (f1.0).

if missing(zodiac) filter_$ = 0.

FILTER BY filter_$.

fre zodiac.

*** END ***.

The sample is encoded in the variable filter_$ (1=selected, 0=not
selected). If you wish to delete the rest of the cases, you can run this:

select if filter_$.
execute.
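
Applied to the apartment file from the question, the same logic might look roughly like the sketch below. It assumes the grouping variable is called neighborhood (as in the original post) and has no missing values. The names n_nbhd, insample, #need and #left are made up for illustration, and the seed value is arbitrary.

```spss
* Sketch only: 30 random cases per neighborhood (variable name assumed).
* An arbitrary seed makes the draw repeatable.
SET SEED=20070126.

* Sort and attach the group size to every case (hypothetical name n_nbhd).
SORT CASES BY neighborhood.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=neighborhood /n_nbhd=N.

* At the first case of each group, reset the two scratch counters.
DO IF $CASENUM = 1 OR neighborhood NE LAG(neighborhood).
COMPUTE #need = 30.
COMPUTE #left = n_nbhd.
END IF.

* Select each case with probability (still needed) / (still left in group).
COMPUTE insample = (UNIFORM(1) * #left < #need).
COMPUTE #need = #need - insample.
COMPUTE #left = #left - 1.
EXECUTE.

* Keep only the sampled cases.
SELECT IF insample = 1.
EXECUTE.
```

A neighborhood with fewer than 30 complexes simply has all of its cases selected, because the selection probability reaches 1 as soon as the cases left equal the cases still needed.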

Greetings

Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Troy Payne
Sent: Friday, January 26, 2007 8:23 PM
To: [hidden email]
Subject: Sampling question

hillel vardi

Re: Sampling question

Shalom

There is more than one way of doing the job. Here is a very efficient
way to do it.

title sampling sub groups .
input program .
loop #j =1 to 3 .
loop #i =1 to 40 .
compute groups =#j.
compute recno =#i.
end case .
end loop .
end loop .
end file .
end input program .
execute .

* >> information on how much to sample from each group << .
recode groups(1=15)(2=23)(3=8) into gsample .
* the subgroups should have more than 10 cases .
compute unif=uniform(100) .
sort cases by groups unif .
add files file=* / by groups / first=startgroup .
numeric groupseq(f4) .
leave groupseq.
if startgroup eq 1 groupseq=0.
compute groupseq=sum(groupseq,1) .
if groupseq le gsample insample= 1.
execute .
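
One possible way to check the result, as a sketch: count the flagged cases per group without deleting anything. It reuses the variables groups and insample created above. TEMPORARY limits the selection to the next procedure only.

```spss
* Sketch: confirm how many cases were sampled in each group.
* The selection applies only to the FREQUENCIES command that follows.
TEMPORARY.
SELECT IF insample = 1.
FREQUENCIES VARIABLES=groups.
```

With the generated test data (three groups of 40 cases, sample sizes 15, 23 and 8), the table should show exactly those three counts.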

Hillel Vardi