Hi Everyone,
I was wondering if someone could help me. I’m trying to select a random sample based on three different variables (at the same time). So far I’ve got syntax that does it based on one variable (see syntax below). The variables and the proportions I would need to use are described next:
Any suggestions will be really appreciated.
Thanks in advance!!!!
Joan
COMPUTE SCRAMBLE=UNIFORM(1).
SORT CASES BY AGE SCRAMBLE.
IF $CASENUM=1 OR (LAG(age) NE age) Counter=1.
IF MISSING(Counter) Counter=LAG(Counter)+1.
COMPUTE Keeper=Age.
RECODE Keeper (1=48)(2=100)(3=150)(4=125)(5=86)(6=15).
SELECT IF (Counter LE Keeper).
FREQ Age.
mils
Joan,
I don’t quite see what your three variables are. Age and social grade, yes. But what’s the third? Your sample is 100% female. Next, how big is your sample supposed to be? 100? 500? And do you want your sample to reproduce as closely as possible the age distribution and the social grade distribution, OR the age by social grade crosstabulation?
Gene Maguin
Administrator
In reply to this post by mils
Joan,
Please clarify your question. Have you attempted to apply the code you quoted from my previous posting? What was the outcome? How many cases do you have to draw from? What are the joint distributions? Do you need to replicate the 3-way joint distribution of some hypothetical population? HTH, D
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
In reply to this post by mils
Joan,
Not sure this is a question about SPSS. Looks like you’re trying to generate a sampling frame, but is this for a real project or is it just an exercise in SPSS? Is it for drawing a sample from an existing data set?
Normally this kind of thing is done using a table in which categories are nested. Sampling by government and research companies is done all the time using Postcode Address Files, Electoral Registers, or other divisions (e.g. Polling Districts, sometimes ranked with Mosaic): government and independent surveys tend to use some form of probability sampling; market research will often use quota sampling (usually within small areas selected by probability). These days the whole process is automated.
You have too many categories in each section: 6 x 6 x 7 = 252 cells, an average of about 4 cases per cell. If you use your categories to generate a sampling table, some of the cells will have very few cases, even none. No one is going to incur the expense of sending someone to interview one 65-year-old in Wales! In practice a much smaller table would be needed.
The figures you give don’t look proportional to the UK population, but if you reduce the age groups to three (18-34, 35-44, 45+), your regions to two (Southeast and Other) and social grade to three (AB, C1-C2, DE), you get a more manageable 18 cells (about 55 cases per cell). A final decision would require judgment about the needs of the research, the practicalities of fieldwork, and the budget.
Just seen your mail to Gene, so will reply to that instead. If you need help off-line, it’s easier if I have the data editor: you have 3000 cases, but how many variables are there? Check me out on my site: I’m safe to deal with!
John Hall
Email: [hidden email]
Website: www.surveyresearch.weebly.com
Skype: surveyresearcher1
Phone: (+33) (0) 2.33.45.91.47
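For anyone who wants to check the cell arithmetic above, here is a minimal Python sketch (not SPSS). It assumes a target sample of 1,000, which is an assumption on my part chosen because it matches the quoted averages; the function name is hypothetical.

```python
from math import prod

# ASSUMPTION: target sample size of 1,000 (consistent with the quoted averages).
SAMPLE_SIZE = 1000

def cells_and_average(levels, sample_size):
    """Return (number of cells, average cases per cell) for a full crosstab."""
    n_cells = prod(levels)
    return n_cells, sample_size / n_cells

# Full design: 6 x 6 x 7 categories.
full = cells_and_average([6, 6, 7], SAMPLE_SIZE)
# Reduced design: 3 age groups x 2 regions x 3 social grades.
reduced = cells_and_average([3, 2, 3], SAMPLE_SIZE)

print(f"full design:    {full[0]} cells, {full[1]:.1f} cases per cell on average")
print(f"reduced design: {reduced[0]} cells, {reduced[1]:.1f} cases per cell on average")
```

The averages hide the real problem: with unequal marginal proportions, many cells in the full design will fall well below the average, and some will round to zero.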
In reply to this post by mils
John, I did miss seeing that region was a dimension also. Thank you.
Joan, have you reviewed the SAMPLE command and rejected it? If so, then: there may well be better ways of doing this problem, but here is my untested ‘first cut’.
So, you have a three-variable crosstab of age, region, and social grade (which I’ll hereafter call ‘sg’). The expected proportion in a cell of that crosstab is the product of the marginal proportions, and the expected count is that proportion times N. For example: given age=18-24, region=wales, sg=A, the expected cell proportion would be 0.40*0.03*0.06=0.00072. With an N of 1000 that’s a cell count of 0.72, call it 1.0.
So, in overview: for each case you need to figure out which cell of the crosstab it belongs to and then assign the cell the number of cases to be drawn for that cell (‘cell target’). Then draw a number (‘draw’) from a uniform distribution for each case, sort cases by cell id and draw, number the cases within cell id (‘cell case number’='ccn'), and keep cases such that cell case number is less than or equal to cell target.
I assume that age, region, and sg are numeric variables and there are no missing values.
String cellid(a4).
Compute cellid=concat(string(age,f1.0),string(region,f2.0),string(sg,f1.0)).
Recode age(1=0.40)… into agep. /* these are the variable's marginal proportions.
Recode region … into regionp.
Recode sg … into sgp.
Compute celltarget=rnd(1000*agep*regionp*sgp). /* you may want trunc instead of rnd.
* do a frequencies at this point to check that celltarget sums to 1000, which it may not due to rounding errors.
* Adjust as needed. When done.
Compute draw=uniform(1).
Sort cases by cellid draw.
Do if ($casenum eq 1 or cellid ne lag(cellid)).
+ compute ccn=1.
Else.
+ compute ccn=lag(ccn)+1.
End if.
Compute pick=0.
If (ccn le celltarget) pick=1.
Select if (pick eq 1).
Frequencies cellid.
Let me know if you have any syntax or logic errors and I'll respond to them.
Gene Maguin
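The same logic can be sketched outside SPSS. The Python below is only an illustration of the approach described above (cell targets from the product of marginal proportions, then a random draw within each cell), not a translation of the syntax. The marginal proportions, the population, and the function names are all made up for the example, and Python's round() breaks exact .5 ties differently from SPSS RND.

```python
import random
from collections import Counter, defaultdict
from itertools import product

def cell_targets(n, age_p, region_p, sg_p):
    """Rounded target count for every (age, region, sg) cell.

    Target = n * product of the three marginal proportions.
    NOTE: round() uses round-half-to-even, unlike SPSS RND.
    """
    return {
        (a, r, s): round(n * age_p[a] * region_p[r] * sg_p[s])
        for a, r, s in product(age_p, region_p, sg_p)
    }

def draw_sample(cases, targets, seed=None):
    """Keep at most targets[cell] randomly chosen cases from each cell."""
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for case in cases:
        by_cell[(case["age"], case["region"], case["sg"])].append(case)
    picked = []
    for cell, members in by_cell.items():
        rng.shuffle(members)  # plays the role of sorting by a uniform draw
        picked.extend(members[: targets.get(cell, 0)])
    return picked

# HYPOTHETICAL marginal proportions (two levels each, for illustration only).
age_p = {1: 0.4, 2: 0.6}
region_p = {1: 0.5, 2: 0.5}
sg_p = {1: 0.25, 2: 0.75}

targets = cell_targets(40, age_p, region_p, sg_p)

# HYPOTHETICAL population: 20 cases in every cell, 160 cases in total.
population = [
    {"id": i, "age": a, "region": r, "sg": s}
    for i, (a, r, s, _) in enumerate(product(age_p, region_p, sg_p, range(20)))
]

sample = draw_sample(population, targets, seed=1)
achieved = Counter((c["age"], c["region"], c["sg"]) for c in sample)

print("sum of targets:", sum(targets.values()))
print("sample size:   ", len(sample))
```

As in the SPSS version, it is worth comparing the sum of the targets with the intended N before drawing, and checking that no cell's target exceeds the number of cases actually available in that cell.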