SPSSX Discussion

extracting Best quota fit

GauravSrivastava

Jan 10, 2012; 12:19pm

extracting Best quota fit

I need to select my final sample so that it satisfies quotas on several variables at once. In other words, I want the best quota fit from the total sample. For example, I have a total sample of 1100, out of which I have to select 1000 cases based on three variables: Age, Gender, and Region.

Gender: 500:500

Age: 250:300:300:150

Region: 334:333:333

Can anyone help me with this? Thanks in advance.

Gaurav

Art Kendall

Jan 10, 2012; 3:56pm

Re: extracting Best quota fit

If there are 1100 in the population why are you taking a sample of 1000? The amount of work would not be that different. Are you dropping cases to balance a design?

If you crosstab gender by age by region do you have 42 in each of the 24 cells?
On the margins of the crosstab do you have enough cases? e.g., 250 in the first age group, 300 in the second, etc.
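
To make the cell question concrete, here is a minimal Python sketch (not from the thread) of the cell counts the quotas would imply if gender, age, and region were independent in the target population. The independence assumption and all variable names are mine; only the margins come from the original post.

```python
from itertools import product

# Quota margins from the original post (counts out of 1000).
gender = [500, 500]
age = [250, 300, 300, 150]
region = [334, 333, 333]
total = 1000

# Assumption: the three variables are independent, so each cell target is
# the product of the three marginal shares times the total.
cell_targets = {
    (g, a, r): total * (gender[g] / total) * (age[a] / total) * (region[r] / total)
    for g, a, r in product(range(2), range(4), range(3))
}

print(len(cell_targets))                     # number of cells
print(round(min(cell_targets.values()), 3))  # smallest cell target
print(round(max(cell_targets.values()), 3))  # largest cell target
print(round(sum(cell_targets.values()), 3))  # should equal the total
```

Under that assumption the targets run from roughly 25 to roughly 50 per cell rather than a flat 42, so the comparison with the observed crosstab has to be made cell by cell.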

Art Kendall
Social Research Consultants

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Jan 10, 2012; 9:51pm

Re: extracting Best quota fit

On 1/10/2012 11:20 AM, Javier Figueroa wrote (translated from Spanish):
> Hello! I'm new to this list, but I would like to help with this
> because I had a similar case. What you would need to do is build a
> table with the variables nested, see what frequency each combination
> gives, and then assign each of those groups a weighting factor that
> gives you the base you want.
>
> Regards, and I hope this helps.

Yes. Depending on the difference between the obtained crosstab cell
count and the desired cell count, one could up-weight or down-weight to
get the desired counts in the cells.
The question remains: what is the purpose of the weighting? Where did
the desired cell counts come from?

Art Kendall
Social Research Consultants

Art Kendall

Jan 11, 2012; 5:00pm

Re: extracting Best quota fit

On 1/11/2012 11:35 AM, Javier Figueroa wrote (translated from Spanish):
> Hi, the purpose of the weighting is what you are asking for. The information you gave is that you have 1100 cases and want to have 1000; for gender I assume you need 500 males and 500 females, and so on.
>
> With the nested frequency distribution of the variables you need to combine, you will know what your real universe is (that is, what your data file actually contains). You then assign a factor to each combination so that SPSS runs all calculations on the sample you need and your analysis is representative.
>
> Regards.
> Javier Figueroa

If I am reading correctly, what you are calling a "nested frequency distribution" is another name for a crosstab. Is that so?

Two major reasons to weight are 1) (as you say) to make the sample more representative of the population from which it was drawn or 2) (rarely) to have a balanced design so that the effects are balanced in some comparative analysis.

I would like to hear from the OP why (s)he wanted to sample 1000 from a population of 1100 with those marginals on the crosstab. Since the sample is such a large proportion of the population, why bother with sampling? The cost of data gathering and processing is only part of the total cost of a study. Is it possible that a sample of 1100 was drawn from a larger population and the OP wants to rake to those marginal totals?

Art Kendall
Social Research Consultants

GauravSrivastava

Jan 11, 2012; 5:26pm

Re: extracting Best quota fit

Hi,

Thanks for your reply. The 1100 was only a rough figure. I usually have 2-3% more sample than the exact target, and I have to drop the extra cases after balancing the quotas. But I am struggling to balance the quotas on multiple variables at the same time. So far I have always done it manually in Excel using crosstabs.
Can anyone suggest some syntax with which I can balance it in a single shot?

Regards,
Gaurav

Art Kendall

Jan 11, 2012; 6:30pm

Re: extracting Best quota fit

It sounds like you have population info that tells you the proportion of the population that should be on the margins of your weighted crosstab. (To turn your counts out of 1000 into proportions, simply slide the decimal point three places to the left.) Is this what you are trying to do?

Your goal would be to up-weight or down-weight the cases in your sample of 1100 so that the marginals on the weighted crosstab are what you desire. If I understand your situation, there would be no reason to drop cases, just re-weight them.

If I recall correctly, there is an extension command to rake to totals. Check the archives of this list for "rake" or "raking".
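
For readers unfamiliar with the term, raking (iterative proportional fitting) can be sketched in a few lines of Python. This is only an illustration of the idea, not the extension command mentioned above: the simulated cases, the function names, and the fixed number of iterations are all assumptions of mine, and only the target margins come from the original post.

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical sample of 1100 cases, each coded as (gender, age, region).
cases = [(random.randrange(2), random.randrange(4), random.randrange(3))
         for _ in range(1100)]

# Target margins from the original post, one dict per variable.
targets = [
    {0: 500, 1: 500},                  # gender
    {0: 250, 1: 300, 2: 300, 3: 150},  # age
    {0: 334, 1: 333, 2: 333},          # region
]

def weighted_margin(cases, weights, dim):
    """Weighted total for each category of the variable at position dim."""
    totals = Counter()
    for case, w in zip(cases, weights):
        totals[case[dim]] += w
    return totals

def rake(cases, targets, iterations=50):
    """Iterative proportional fitting: returns one weight per case."""
    weights = [1.0] * len(cases)
    for _ in range(iterations):
        for dim, target in enumerate(targets):
            current = weighted_margin(cases, weights, dim)
            # Rescale so this variable's weighted margin hits its target.
            weights = [w * target[case[dim]] / current[case[dim]]
                       for case, w in zip(cases, weights)]
    return weights

weights = rake(cases, targets)
for dim in range(3):
    print(sorted(weighted_margin(cases, weights, dim).items()))
```

Each pass fixes one margin at a time, which slightly disturbs the others; repeating the passes brings all three margins to their targets together. No case is dropped, which matches the advice above to re-weight rather than delete.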

How far are the marginal proportions in the 1100 cases from those you desire?

Art Kendall
Social Research Consultants

David Marso

Jan 20, 2012; 7:12pm

Re: extracting Best quota fit

Administrator

In reply to this post by GauravSrivastava

Just tossing this into the embers of the fire.
YMMV. Probably best to WEIGHT rather than delete cases.
This code down-weights the 1100 cases to 1000 (WHY??).
Delete the (1000/N_TOT) factor in the code if you opt not to down-weight.
*Simulation of your data*.
NEW FILE.
INPUT PROGRAM.
LOOP CASE=1 TO 1100.
COMPUTE G=TRUNC(UNIFORM(2)+1).
COMPUTE A=TRUNC(UNIFORM(4)+1).
COMPUTE R=TRUNC(UNIFORM(3)+1).
LEAVE G A R CASE.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
**** GET YOUR RAW DATA AND GO for it.
*I have used G A R for your variables gender, age and region, so substitute as needed *.

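* Sort the raw cases by cell and save them for the final merge.
SORT CASES BY G A R.
SAVE OUTFILE "SortedRawData.sav".
* Collapse to one row per G by A by R cell with its observed count.
AGGREGATE OUTFILE */ BREAK G A R / N_Obs=N.
* Running total of N_OBS, so the largest N_TOT is the grand total.
COMPUTE N_TOT=SUM(LAG(N_TOT),N_OBS).
* Bring the grand total to the top and copy it down to every row.
SORT CASES BY N_TOT(D).
IF $CASENUM > 1 N_TOT=LAG(N_TOT).
SORT CASES BY G A R.
SAVE OUTFILE "Obs_table.sav" .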

* Create a table of desired proportions/counts *.
NEW FILE.
INPUT PROGRAM.
LOOP G=1 TO 2.
LOOP A= 1 TO 4.
LOOP R= 1 TO 3.
COMPUTE CELL=1.
LEAVE G A R.
END CASE.
END LOOP.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.

RECODE G (1=.5)(2=.5) INTO P_G
/ A (1=.25)(2,3=.3)(4=.15) INTO P_A
/ R (1=.334)(2,3=.333) INTO P_R .
COMPUTE P_C_DES = P_G * P_A * P_R .
COMPUTE F_C_DES=P_C_DES*1000.
SAVE OUTFILE "BaseTableProb.sav".
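* Attach the desired cell proportions to the observed cell counts.
MATCH FILES / FILE "Obs_table.sav" / FILE "BaseTableProb.sav" / BY G A R.
COMPUTE P_OBS=N_OBS /N_TOT.
* Cell weight is desired share over observed share, rescaled from N_TOT to 1000 cases.
COMPUTE C_WEIGHT=(P_C_DES /P_OBS) * (1000/N_TOT).
* Spread the cell weights back onto the individual cases and check the result.
MATCH FILES / FILE "SortedRawData.sav" /TABLE * / BY G A R.
WEIGHT BY C_WEIGHT.
FREQ G A R.
CROSS / TABLE G BY A BY R /CELLS= ALL.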
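
For anyone who wants to check the arithmetic outside SPSS, the same cell weighting can be sketched in Python. This is a rough parallel under my own assumptions, not a translation of the syntax: the simulated cases and all names are hypothetical, the desired shares are taken from the quotas in the original post, and the desired cell share is the product of the three marginal shares, as in the syntax.

```python
import random
from collections import Counter

random.seed(2012)

# Hypothetical stand-in for the simulated data: 1100 cases coded (G, A, R).
cases = [(random.randint(1, 2), random.randint(1, 4), random.randint(1, 3))
         for _ in range(1100)]

# Desired marginal shares, taken from the quotas in the original post.
p_g = {1: 0.5, 2: 0.5}
p_a = {1: 0.25, 2: 0.3, 3: 0.3, 4: 0.15}
p_r = {1: 0.334, 2: 0.333, 3: 0.333}
desired_n = 1000

n_obs = Counter(cases)  # observed count per G x A x R cell

def cell_weight(g, a, r):
    """Desired cell count divided by observed cell count."""
    return p_g[g] * p_a[a] * p_r[r] * desired_n / n_obs[(g, a, r)]

weights = [cell_weight(*case) for case in cases]

# Weighted gender totals should reproduce the 500:500 quota.
gender_totals = Counter()
for (g, a, r), w in zip(cases, weights):
    gender_totals[g] += w
print(sorted(gender_totals.items()))
print(sum(weights))
```

The weighted totals only add up to the desired N if every one of the 24 cells contains at least one case; an empty cell cannot be weighted up.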

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

David Marso

Jan 20, 2012; 11:41pm

Re: extracting Best quota fit

Administrator

Upon a bit of diddling I deduce that the expression for C_WEIGHT can be simplified to:
COMPUTE C_WEIGHT=(P_G * P_A * P_R /N_OBS )*<Desired_N>.
Plug in whatever you like for <Desired_N>; I would suggest your original sample size!
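
The simplification can be checked in one line. The symbols are my own shorthand, not from the syntax: w_c is C_WEIGHT for a cell, n_c is N_OBS, N_tot is N_TOT, N_des is <Desired_N> (1000 in the earlier code), and p_c^des is P_C_DES.

```latex
\[
w_c \;=\; \frac{p_c^{\mathrm{des}}}{p_c^{\mathrm{obs}}}\cdot\frac{N_{\mathrm{des}}}{N_{\mathrm{tot}}}
\;=\; \frac{p_c^{\mathrm{des}}}{n_c / N_{\mathrm{tot}}}\cdot\frac{N_{\mathrm{des}}}{N_{\mathrm{tot}}}
\;=\; \frac{p_c^{\mathrm{des}}\, N_{\mathrm{des}}}{n_c},
\qquad p_c^{\mathrm{des}} = P_G \, P_A \, P_R .
\]
```

N_tot cancels, so the weight for a cell is just its desired count divided by its observed count.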

emma78

Apr 21, 2015; 7:15pm

Re: extracting Best quota fit

Hi,
I know it's a rather old topic, but it is nevertheless very interesting.

I tried the syntax and wondered how I can mark the cases that can be erased from the dataset instead of down-weighting them (in the old example, n=100).

David wrote: "This code downweights the 1100 to 1000 (WHY??) (delete the 1000/N_TOT) in the code if you opt to not down weight." But I don't quite get it...

Many thanks for your help.

David Marso

Apr 21, 2015; 8:19pm

Re: extracting Best quota fit

Administrator

Probably best to open a new thread and carefully describe the specifics of your situation.
You replied to a 3 year old thread and didn't contextualize your question in any way.