SPSSX Discussion

FW: random sample of cases by groups

mils

FW: random sample of cases by groups

Hi Everyone,

I was wondering if someone could help me. I’m trying to select a random sample based on three different variables (at the same time). So far I have syntax that does it based on one variable (see syntax below). The variables and the proportions I need are described next:

                                                           total
                                                           1
                                                           Count   Column N %
Gender        Female                                        1000       100.0%
              Total                                         1000       100.0%
Age           18-24                                          400        40.0%
              25-34                                          290        29.0%
              35-44                                          180        18.0%
              45-54                                          100        10.0%
              55-64                                           20         2.0%
              65+                                             10         1.0%
              Total                                         1000       100.0%
Region        East Anglia                                      -
              East Midlands & West Midlands & East Anglia    100        10.0%
              Northern & Yorkshire/Humberside                150        15.0%
              Northern Ireland                                 -
              Northwest                                      120        12.0%
              Scotland                                       140        14.0%
              Southeast                                      400        40.0%
              Southwest                                       60         6.0%
              Wales                                           30         3.0%
              West Midlands                                    -
              Yorkshire/Humberside                             -
              Total                                         1000       100.0%
Social Grade  A                                               60         6.0%
              B                                              270        27.0%
              C1                                             380        38.0%
              C2                                             170        17.0%
              D                                              100        10.0%
              E                                               20         2.0%
              Total                                         1000       100.0%

Any suggestions would be really appreciated.

Thanks in advance!!!!

Joan

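* Give each case a random number, then put the cases in random order within each age group.
COMPUTE SCRAMBLE=UNIFORM(1).
SORT CASES BY AGE SCRAMBLE.
* Number the cases consecutively within each age group.
IF $CASENUM=1 OR (LAG(age) NE age) Counter=1.
IF MISSING(Counter) Counter=LAG(Counter)+1.
* Keeper holds the number of cases to keep from each age group.
COMPUTE Keeper=Age.
RECODE Keeper (1=48)(2=100)(3=150)(4=125)(5=86)(6=15).
SELECT IF (Counter LE Keeper).
FREQ Age.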

Maguin, Eugene

Re: random sample of cases by groups

Joan,

I don’t quite see what your three variables are. Age and social grade, yes. But, what’s the third? Your sample is 100% female. Next, how big is your sample supposed to be? 100? 500? And, do you want your sample to reproduce as closely as possible the age distribution and the social grade distribution OR the age by social grade crosstabulation?

Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Joan Casellas
Sent: Monday, March 19, 2012 1:57 PM
To: [hidden email]
Subject: FW: random sample of cases by groups


David Marso

Re: FW: random sample of cases by groups

Administrator

In reply to this post by mils

Joan,
Please clarify your question. Have you attempted to apply the code you quoted from my previous posting? What was the outcome? How many cases do you have to draw from? What are the joint distributions? Do you need to replicate the three-way joint distribution of some hypothetical population?
HTH, D


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

MacGillivary Heather L

Automatic reply: FW: random sample of cases by groups

I am out of the office until March 28 and will reply to emails at that time.

If you need immediate assistance, please contact Kay Gates 303-982-6565 or [hidden email]

Heather

John F Hall

Re: random sample of cases by groups

In reply to this post by mils

Joan

Not sure this is a question about SPSS. Looks like you’re trying to generate a sampling frame, but is this for a real project or is it just an exercise in SPSS? Is it for drawing a sample from an existing data set?

Normally this kind of thing is done using a table in which categories are nested. Sampling by government and research companies is done all the time using Postcode Address files, Electoral Registers, or other divisions (eg Polling Districts, sometimes ranked with Mosaic): government and independent surveys tend to use some form of probability sampling, market research will often use quota sampling (usually within small areas selected by probability). These days the whole process is automated.

You have too many categories in each section: 6 x 6 x 7 = 252 cells, an average of 4 cases per cell. If you use your categories to generate a sampling table, some of the cells will have very few cases, or even none. No one is going to incur the expense of sending someone to interview one 65-year-old in Wales! In practice a much smaller table would be needed.
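To put numbers on the sparse-cell point, here is a rough sketch in Python (not SPSS, purely as an illustration). It uses the marginal percentages from the table in the original post and assumes the three variables are independent, which the thread has not established; the variable names are made up for the example.

```python
from itertools import product

# Marginal percentages copied from the table in the original post.
# REGION holds only the seven rows that have counts.
AGE = [40, 29, 18, 10, 2, 1]
REGION = [10, 15, 12, 14, 40, 6, 3]
SG = [6, 27, 38, 17, 10, 2]
N = 1000

n_cells = len(AGE) * len(SG) * len(REGION)
print(n_cells, round(N / n_cells, 1))  # 252 4.0

# Expected count per cell IF the three variables were independent (an assumption).
expected = [N * a * r * s / 100**3 for a, r, s in product(AGE, REGION, SG)]
print(min(expected))  # 0.006 (65+, Wales, social grade E)

# Number of cells where fewer than one case is expected.
print(sum(1 for e in expected if e < 1))
```

Under that assumption the thinnest cell expects far less than one case, so many cells in the full table would end up empty.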

The figures you give don’t look proportional to the UK population, but if you reduce the age groups to three (18-34, 35-44, 45+), your regions to two (Southeast and Other), and social grade to three (AB, C1-C2, DE), you get a more manageable 18 cells (about 55 cases per cell). A final decision would require judgment about the needs of the research, the practicalities of fieldwork, and the budget.
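The collapsed design can be checked the same way. This is a small Python sketch (again an illustration, not SPSS) that sums the percentages from the original table into the suggested groups and, assuming independence between the three variables, lists the thinnest and fullest cells. The dictionary names are hypothetical.

```python
from itertools import product

# Collapsed marginal percentages, summed from the table in the original post.
AGE3 = {"18-34": 40 + 29, "35-44": 18, "45+": 10 + 2 + 1}
REGION2 = {"Southeast": 40, "Other": 100 - 40}
SG3 = {"AB": 6 + 27, "C1-C2": 38 + 17, "DE": 10 + 2}
N = 1000

n_cells = len(AGE3) * len(REGION2) * len(SG3)
print(n_cells, round(N / n_cells, 1))  # 18 55.6

# Expected count per cell IF the three variables were independent (an assumption).
expected = {
    (a, r, s): N * AGE3[a] * REGION2[r] * SG3[s] / 100**3
    for a, r, s in product(AGE3, REGION2, SG3)
}
smallest = min(expected, key=expected.get)
largest = max(expected, key=expected.get)
print(smallest, expected[smallest])  # ('45+', 'Southeast', 'DE') 6.24
print(largest, expected[largest])    # ('18-34', 'Other', 'C1-C2') 227.7
```

So the average of about 55 hides a wide spread, but even the smallest cell now expects several cases rather than a fraction of one.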

Just seen your mail to Gene, so will reply to that instead. If you need help off-line, it’s easier if I have the data editor: you have 3000 cases, but how many variables are there? Check me out on my site: I’m safe to deal with!

John Hall

Email: [hidden email]

Website: www.surveyresearch.weebly.com

Skype: surveyresearcher1

Phone: (+33) (0) 2.33.45.91.47


Maguin, Eugene

Re: random sample of cases by groups

In reply to this post by mils

John, I did miss seeing that region was a dimension also. Thank you.

Joan,

Have you reviewed the SAMPLE command and rejected it? If so, then:

There may well be better ways of doing this, but here is my untested ‘first cut’. So, you have a three-variable crosstab of age, region, and social grade (which I’ll hereafter call ‘sg’). If the three variables are independent, the expected proportion in a cell of that crosstab is the product of the marginal proportions. For example, given age=18-24, region=Wales, sg=A, the expected cell proportion would be 0.40*0.03*0.06=0.00072. With an N of 1000 that’s a cell count of 0.72, call it 1.0.
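The worked example can be reproduced with a few lines of Python (a sketch for illustration only, not part of the SPSS solution). The proportions come from the table in the original post; the function name `expected_count` is made up here, and the product rule holds only if the three variables are independent.

```python
# Marginal proportions taken from the table in the original post.
age_p = {"18-24": 0.40, "25-34": 0.29, "35-44": 0.18,
         "45-54": 0.10, "55-64": 0.02, "65+": 0.01}
region_p = {"East Midlands & West Midlands & East Anglia": 0.10,
            "Northern & Yorkshire/Humberside": 0.15,
            "Northwest": 0.12, "Scotland": 0.14, "Southeast": 0.40,
            "Southwest": 0.06, "Wales": 0.03}
sg_p = {"A": 0.06, "B": 0.27, "C1": 0.38, "C2": 0.17, "D": 0.10, "E": 0.02}

def expected_count(age, region, sg, n=1000):
    """Expected cell count, assuming the three variables are independent."""
    return n * age_p[age] * region_p[region] * sg_p[sg]

print(round(expected_count("18-24", "Wales", "A"), 2))  # 0.72
```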

So, in overview: for each case you need to figure out which cell of the crosstab it belongs to and then assign the cell the number of cases to be drawn for that cell (‘cell target’). Then draw a number (‘draw’) from a uniform distribution for each case, sort cases by cell id and draw, number the cases within cell id (‘cell case number’ = ‘ccn’), and keep cases such that cell case number is less than or equal to cell target.
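For readers who find the logic easier to follow outside SPSS, the steps above can be sketched in Python. This is only an illustration under assumptions: `stratified_sample`, `cell_of`, and `cell_target` are hypothetical names, cases are plain dictionaries, and shuffling within a cell stands in for sorting on a uniform draw.

```python
import random
from collections import defaultdict

def stratified_sample(cases, cell_of, cell_target, seed=None):
    """Keep up to cell_target[cell] randomly chosen cases from each cell."""
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for case in cases:
        by_cell[cell_of(case)].append(case)      # which cell does the case belong to
    kept = []
    for cell, members in by_cell.items():
        rng.shuffle(members)                     # random order within the cell
        target = cell_target.get(cell, 0)        # cells without a target keep nothing
        kept.extend(members[:target])            # cell case number <= cell target
    return kept

# Toy data: 60 cases spread evenly over 6 cells (10 cases per cell).
cases = [{"id": i, "age": 1 + i % 2, "region": 1, "sg": 1 + i % 3} for i in range(60)]
targets = {(1, 1, 1): 3, (2, 1, 2): 5}
sample = stratified_sample(cases, lambda c: (c["age"], c["region"], c["sg"]), targets, seed=1)
print(len(sample))  # 8
```

If a cell has fewer cases than its target, every case in it is kept and the sample comes up short for that cell; the same would be true of the sort-and-count approach.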

I assume that age, region, and sg are numeric variables and there are no missing values.

String cellid(a4).
Compute cellid=concat(string(age,f1.0),string(region,f2.0),string(sg,f1.0)).
Recode age(1=0.40)… into agep. /* these are the variable's marginal proportions.
Recode region … into regionp.
Recode sg … into sgp.
Compute celltarget=rnd(1000*agep*regionp*sgp). /* you may want trunc instead of rnd.
* Do a frequencies at this point to check that celltarget sums to 1000, which it may not due to rounding errors.
* Adjust as needed. When done, continue.
Compute draw=uniform(1).
Sort cases by cellid draw.
* Number the cases within each cell in random order.
Do if ($casenum eq 1 or cellid ne lag(cellid)).
+ compute ccn=1.
Else.
+ compute ccn=lag(ccn)+1.
End if.
* Keep the first celltarget cases in each cell.
Compute pick=0.
If (ccn le celltarget) pick=1.
Select if (pick eq 1).
Frequencies cellid.

Let me know if you have any syntax or logic errors and I'll respond to them.
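On the rounding caveat in the syntax comments: a quick Python sketch (illustration only, not SPSS) shows how far the cell targets drift from 1000 under rounding and truncation, and one possible way to adjust them. The largest-remainder adjustment is my own suggestion, not something proposed in the thread. The sketch assumes RND rounds halves away from zero and TRUNC drops the fraction, and it uses exact fractions to avoid floating-point noise.

```python
from fractions import Fraction
from itertools import product
from math import floor

# Marginal percentages from the original post (region: the seven rows with counts).
AGE = [40, 29, 18, 10, 2, 1]
REGION = [10, 15, 12, 14, 40, 6, 3]
SG = [6, 27, 38, 17, 10, 2]
N = 1000

# Exact expected count per cell under independence: N * (a/100) * (r/100) * (s/100).
expected = [Fraction(N * a * r * s, 100**3) for a, r, s in product(AGE, REGION, SG)]

rounded = [floor(e + Fraction(1, 2)) for e in expected]  # round half up (assumed RND behaviour)
truncated = [floor(e) for e in expected]                 # drop the fraction (assumed TRUNC behaviour)

def largest_remainder(expected, total):
    """Floor every target, then give the leftover units to the largest fractional parts."""
    base = [floor(e) for e in expected]
    leftover = total - sum(base)
    order = sorted(range(len(expected)), key=lambda i: expected[i] - base[i], reverse=True)
    for i in order[:leftover]:
        base[i] += 1
    return base

adjusted = largest_remainder(expected, N)
print(sum(rounded), sum(truncated), sum(adjusted))
```

The adjusted targets always sum to exactly 1000 and each stays within one case of its expected value, at the cost of some cells being bumped from 0 to 1.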

Gene Maguin
