I have multiple variables over which I have to select my total sample. In other words, I need the best quota fit from the total sample. For example, I have a total sample of 1100, out of which I have to select 1000 cases based on 3 different variables: Age, Gender, and Region.
Gender: 500:500
Age: 250:300:300:150
Region: 334:333:333
Can anyone help me with this? Thanks in advance,
Gaurav
If there are 1100 in the population, why are you
taking a sample of 1000? The amount of work would not be that
different. Are you dropping cases to balance a design?
If you crosstab gender by age by region, do you have 42 in each of the 24 cells? On the margins of the crosstab, do you have enough cases, e.g., 250 in the first age group, 300 in the second, etc.?
Art Kendall
Social Research Consultants
(In reply to GauravSrivastava, 1/10/2012 7:19 AM.)
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
> Hello! I'm new to this server, but I'd like to help with this, because I had the same kind of case. What you would need to do is generate a table with the variables nested, look at the frequency for each combination, and then assign a weighting factor to those groups so that you get the base you want.
>
> Regards, and I hope this helps.

Yes. Depending on the difference between the obtained crosstab cell count and the desired cell count, one could up-weight or down-weight to get the desired counts in the cells. The question remains: what is the purpose of the weighting? Where did the desired cell counts come from?
Art Kendall
Social Research Consultants
(In reply to Javier Figueroa, 1/10/2012 11:20 AM.)
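The up-/down-weighting Art describes reduces to one ratio per crosstab cell: weight = desired count / obtained count, applied to every case in that cell. A minimal Python sketch, assuming the desired count for each gender × age × region cell is already known; the function name `ratio_weights` and the two-cell counts are hypothetical, for illustration only.

```python
def ratio_weights(obtained, desired):
    """Per-cell weight = desired count / obtained count.

    obtained, desired: dicts mapping a cell key such as (gender, age, region)
    to a case count. A weight > 1 up-weights a cell, < 1 down-weights it.
    Cells with no obtained cases cannot be weighted and are listed separately.
    """
    weights, unfillable = {}, []
    for cell, want in desired.items():
        have = obtained.get(cell, 0)
        if have == 0:
            unfillable.append(cell)  # nothing in the sample to up-weight
        else:
            weights[cell] = want / have
    return weights, unfillable


# Hypothetical counts for two cells only.
obtained = {("M", 1, 1): 50, ("F", 1, 1): 40}
desired = {("M", 1, 1): 42, ("F", 1, 1): 42}
weights, unfillable = ratio_weights(obtained, desired)
print(weights)     # {('M', 1, 1): 0.84, ('F', 1, 1): 1.05}
print(unfillable)  # []
```

Multiplying each cell's obtained count by its weight gives back the desired count, so the weights are only as sound as the desired counts themselves, which is the point of Art's question.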
> Hi. The purpose of the weighting is what you are asking for: the information you gave is that you have 1100 cases and want to have 1000; for gender, I assume you need 500 males and 500 females, etc.

If I am reading correctly, what you are calling a "nested frequency distribution" is another name for a crosstab. Is that so?

Two major reasons to weight are 1) (as you say) to make the sample more representative of the population from which it was drawn, or 2) (rarely) to have a balanced design so that the effects are balanced in some comparative analysis.

I would like to hear from the OP why (s)he wanted to sample 1000 from a population of 1100 with those marginals on the crosstab. Since the sample is such a large proportion of the population, why bother with sampling? The cost of data gathering and processing is only part of the total cost of a study. Is it possible that a sample of 1100 was drawn from a larger population and the OP wants those marginal counts to rake to totals?
Art Kendall
Social Research Consultants
(In reply to Javier Figueroa, 1/11/2012 11:35 AM.)
Hi,
Thanks for your reply. The 1100 I gave is a rough figure. I generally collect 2-3% extra sample beyond the exact target, and I have to drop the extra cases after balancing the quotas. But I am struggling to balance quotas on multiple variables at once. Every time, I have done it manually in Excel using crosstabs. Can anyone suggest some syntax with which I can balance it in a single shot? Regards, Gaurav
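The manual routine Gaurav describes (drop the 2-3% surplus until the gender, age, and region quotas all hold at once) can be automated with a simple greedy heuristic: repeatedly drop the case that sits in the most over-quota categories. A minimal Python sketch; the function names, the 1..k category codes, and the simulated data are assumptions for illustration, and this is not a method proposed by anyone in the thread.

```python
import random
from collections import Counter

# Quotas from the thread; categories coded 1..k as in the SPSS simulation.
QUOTAS = {
    "gender": {1: 500, 2: 500},
    "age": {1: 250, 2: 300, 3: 300, 4: 150},
    "region": {1: 334, 2: 333, 3: 333},
}


def greedy_quota_drop(cases, quotas, target_n):
    """Return indices of the cases to keep (a heuristic, not an optimum).

    Repeatedly drops the case whose categories are, summed over all quota
    variables, furthest over quota, until target_n cases remain. Dropping
    cannot raise a category that is already under quota.
    """
    kept = list(range(len(cases)))
    while len(kept) > target_n:
        # Current count per category of each variable among the kept cases.
        counts = {v: Counter(cases[i][v] for i in kept) for v in quotas}

        def excess(i):
            return sum(counts[v][cases[i][v]] - quotas[v][cases[i][v]]
                       for v in quotas)

        kept.remove(max(kept, key=excess))
    return kept


def total_deviation(cases, idx, quotas):
    """Sum of |count - quota| over every category of every variable."""
    dev = 0
    for v, q in quotas.items():
        c = Counter(cases[i][v] for i in idx)
        dev += sum(abs(c[k] - q[k]) for k in q)
    return dev


# Simulated stand-in for the 1100 collected cases.
rng = random.Random(42)
cases = [{"gender": rng.randint(1, 2), "age": rng.randint(1, 4),
          "region": rng.randint(1, 3)} for _ in range(1100)]
kept = greedy_quota_drop(cases, QUOTAS, 1000)
print(len(kept), total_deviation(cases, range(len(cases)), QUOTAS),
      total_deviation(cases, kept, QUOTAS))
```

Because dropping can only lower counts, a category that is short in the collected sample stays short; comparing the deviation before and after shows how close the heuristic gets.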
It sounds like you have population info that tells
you the proportion of the population that should be on the margins
of your weighted crosstab (simply slide the decimal point of your target
counts three places to the left, e.g., 500 of 1000 becomes .500). Is this what you are trying to do?
Your goal would be to up-weight or down-weight the cases in your sample of 1100 so that the marginals on the weighted crosstab are what you desire. If I understand your situation, there would be no reason to drop cases; just re-weight them. If I recall correctly, there is an extension command to rake to totals; check the archives of this list for "rake" or "raking". How discrepant are the marginal proportions of the 1100 cases from those you desire?
Art Kendall
Social Research Consultants
(In reply to GauravSrivastava, 1/11/2012 12:26 PM.)
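Raking (iterative proportional fitting) is the usual way to "rake to totals" when only marginal targets are known: scale the weights so gender matches, then age, then region, and repeat until all three margins hold at once. A minimal Python sketch of the technique itself, not of the SPSS extension command mentioned above; the function names and the simulated 1100 cases are assumptions.

```python
import random


def rake(cases, targets, max_iter=100, tol=1e-9):
    """Raking (iterative proportional fitting) of case weights to marginal totals.

    cases:   list of dicts, one per respondent.
    targets: {variable: {category: target total}}; every variable's targets
             should sum to the same grand total, and every category present
             in the data must appear in the targets.
    """
    w = [1.0] * len(cases)
    for _ in range(max_iter):
        worst = 0.0
        for var, tgt in targets.items():
            # Current weighted total for each category of this variable.
            tot = {}
            for c, wi in zip(cases, w):
                tot[c[var]] = tot.get(c[var], 0.0) + wi
            worst = max(worst, max(abs(tgt[k] - tot.get(k, 0.0)) for k in tgt))
            # Scale every case so this variable's margins hit the targets exactly.
            w = [wi * tgt[c[var]] / tot[c[var]] for c, wi in zip(cases, w)]
        if worst < tol:  # all margins already matched before this pass
            break
    return w


def weighted_margin(cases, w, var):
    """Weighted count per category of one variable."""
    out = {}
    for c, wi in zip(cases, w):
        out[c[var]] = out.get(c[var], 0.0) + wi
    return out


TARGETS = {
    "gender": {1: 500, 2: 500},
    "age": {1: 250, 2: 300, 3: 300, 4: 150},
    "region": {1: 334, 2: 333, 3: 333},
}

rng = random.Random(2012)
cases = [{"gender": rng.randint(1, 2), "age": rng.randint(1, 4),
          "region": rng.randint(1, 3)} for _ in range(1100)]
w = rake(cases, TARGETS)
for var in TARGETS:
    print(var, {k: round(v, 2) for k, v in sorted(weighted_margin(cases, w, var).items())})
```

Unlike a cell-by-cell weight, raking needs targets only for each variable's margins, not for every gender × age × region cell, and it keeps all 1100 cases.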
In reply to this post by GauravSrivastava
Just tossing this into the embers of the fire. YMMV. Probably best to WEIGHT rather than delete cases. This code downweights the 1100 to 1000 (WHY??); delete the 1000/N_TOT in the code if you opt not to downweight.

*Simulation of your data*.
NEW FILE.
INPUT PROGRAM.
LOOP CASE=1 TO 1100.
COMPUTE G=TRUNC(UNIFORM(2)+1).
COMPUTE A=TRUNC(UNIFORM(4)+1).
COMPUTE R=TRUNC(UNIFORM(3)+1).
LEAVE G A R CASE.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

**** GET YOUR RAW DATA AND GO for it.
*I have used G A R for your variables gender, age, region, so substitute as needed*.
SORT CASES BY G A R.
SAVE OUTFILE="SortedRawData.sav".
* Count cases per G x A x R cell (N_OBS) and carry the grand total (N_TOT) onto every row.
AGGREGATE OUTFILE=* /BREAK=G A R /N_OBS=N.
COMPUTE N_TOT=SUM(LAG(N_TOT),N_OBS).
SORT CASES BY N_TOT(D).
IF $CASENUM > 1 N_TOT=LAG(N_TOT).
SORT CASES BY G A R.
SAVE OUTFILE="Obs_table.sav".

* Create a table of desired proportions/counts *.
NEW FILE.
INPUT PROGRAM.
LOOP G=1 TO 2.
LOOP A=1 TO 4.
LOOP R=1 TO 3.
COMPUTE CELL=1.
LEAVE G A R.
END CASE.
END LOOP.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
* Region quota 334:333:333 corresponds to .334/.333/.333.
RECODE G (1=.5)(2=.5) INTO P_G
  /A (1=.25)(2,3=.3)(4=.15) INTO P_A
  /R (1=.334)(2,3=.333) INTO P_R.
COMPUTE P_C_DES=P_G*P_A*P_R.
COMPUTE F_C_DES=P_C_DES*1000.
SAVE OUTFILE="BaseTableProb.sav".

MATCH FILES /FILE="Obs_table.sav" /FILE="BaseTableProb.sav" /BY G A R.
COMPUTE P_OBS=N_OBS/N_TOT.
COMPUTE C_WEIGHT=(P_C_DES/P_OBS)*(1000/N_TOT).
MATCH FILES /FILE="SortedRawData.sav" /TABLE=* /BY G A R.
WEIGHT BY C_WEIGHT.
FREQUENCIES G A R.
CROSSTABS /TABLES=G BY A BY R /CELLS=ALL.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
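For readers without SPSS at hand, the same arithmetic can be sketched in Python: the desired cell proportion is the product of the three marginal proportions (which assumes the quotas are independent across gender, age, and region), and each case gets C_WEIGHT = (P_C_DES / P_OBS) * (1000 / N_TOT). The function names and simulated data are assumptions; the comments map back to the SPSS variable names, and the region proportions are taken from the stated 334:333:333 quota.

```python
import random
from collections import Counter
from itertools import product

# Desired marginal proportions (quota / 1000), categories coded as in the SPSS code.
P_G = {1: 0.5, 2: 0.5}
P_A = {1: 0.25, 2: 0.3, 3: 0.3, 4: 0.15}
P_R = {1: 0.334, 2: 0.333, 3: 0.333}


def cell_weights(cases, desired_n):
    """Mirror of the SPSS steps: one weight per observed (G, A, R) cell."""
    n_tot = len(cases)                 # N_TOT
    n_obs = Counter(cases)             # N_OBS per cell
    weights = {}
    for g, a, r in product(P_G, P_A, P_R):
        cell = (g, a, r)
        if n_obs[cell] == 0:
            continue                   # empty cell: no cases to carry its share
        p_c_des = P_G[g] * P_A[a] * P_R[r]                       # P_C_DES
        p_obs = n_obs[cell] / n_tot                              # P_OBS
        weights[cell] = (p_c_des / p_obs) * (desired_n / n_tot)  # C_WEIGHT
    return weights


def weighted_margin(cases, weights, pos):
    """Weighted count per category of the variable at tuple position pos."""
    out = Counter()
    for case in cases:
        out[case[pos]] += weights[case]
    return dict(out)


rng = random.Random(1)
cases = [(rng.randint(1, 2), rng.randint(1, 4), rng.randint(1, 3))
         for _ in range(1100)]
w = cell_weights(cases, 1000)
print(weighted_margin(cases, w, 0))  # gender
print(weighted_margin(cases, w, 1))  # age
print(weighted_margin(cases, w, 2))  # region
```

Because every cell is forced to P_C_DES × 1000, all three weighted margins land on the quotas; if a cell is empty in the real data, its share is lost and the margins fall short.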
Upon a bit of diddling, I deduce that the expression for C_WEIGHT can be simplified to:

COMPUTE C_WEIGHT=(P_G*P_A*P_R/N_OBS)*<Desired_N>.

Plug in whatever you like for <Desired_N>; I would suggest your original sample size!
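The simplification follows from substituting P_OBS = N_OBS / N_TOT into the original expression (writing D for <Desired_N>, which was 1000 in the original code):

```latex
\begin{aligned}
\text{C\_WEIGHT}
  &= \frac{P_{C,\mathrm{des}}}{P_{\mathrm{obs}}}\cdot\frac{D}{N_{\mathrm{tot}}},
  \qquad P_{\mathrm{obs}} = \frac{N_{\mathrm{obs}}}{N_{\mathrm{tot}}},
  \quad P_{C,\mathrm{des}} = P_G\,P_A\,P_R \\
  &= P_G\,P_A\,P_R\cdot\frac{N_{\mathrm{tot}}}{N_{\mathrm{obs}}}\cdot\frac{D}{N_{\mathrm{tot}}}
   = \frac{P_G\,P_A\,P_R}{N_{\mathrm{obs}}}\cdot D
\end{aligned}
```

Summing N_OBS × C_WEIGHT over the 24 cells gives D × Σ P_G P_A P_R = D (provided no cell is empty), so the weighted total is whatever you plug in for <Desired_N>; choosing the original sample size makes the weights average 1.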
Hi,
I know it's a somewhat old topic, but it's nevertheless very interesting. I tried the syntax and wondered how I can flag the cases that could be erased from the dataset instead of down-weighting them; in the old example, that would be n = 100. The post says "This code downweights the 1100 to 1000 (WHY??) (delete the 1000/N_TOT) in the code if you opt to not down weight," but I don't quite get it... Many thanks for your help.
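If cases really must be removed rather than down-weighted, one hedged way to flag them is to trim each gender × age × region cell to its target count, P_G × P_A × P_R × 1000 as in the thread's code, choosing at random which surplus cases go. A minimal Python sketch; the function name, the random selection, and the simulated data are assumptions, not part of the posted syntax.

```python
import random
from collections import defaultdict

P_G = {1: 0.5, 2: 0.5}
P_A = {1: 0.25, 2: 0.3, 3: 0.3, 4: 0.15}
P_R = {1: 0.334, 2: 0.333, 3: 0.333}


def flag_for_deletion(cases, desired_n, seed=0):
    """Return a 0/1 flag per case: 1 = candidate for deletion.

    Each (G, A, R) cell keeps at most round(P_G*P_A*P_R*desired_n) cases,
    chosen at random; a cell already at or below its target keeps everything,
    because deleting cannot add cases.
    """
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for i, case in enumerate(cases):
        by_cell[case].append(i)
    flags = [0] * len(cases)
    for (g, a, r), idx in by_cell.items():
        keep_n = round(P_G[g] * P_A[a] * P_R[r] * desired_n)
        if len(idx) > keep_n:
            for i in rng.sample(idx, len(idx) - keep_n):
                flags[i] = 1
    return flags


rng = random.Random(3)
cases = [(rng.randint(1, 2), rng.randint(1, 4), rng.randint(1, 3))
         for _ in range(1100)]
flags = flag_for_deletion(cases, 1000)
print(sum(flags), "cases flagged for deletion")
```

Rounding the 24 cell targets need not give exactly 1000, and under-filled cells cannot be topped up, so the number flagged will generally not be exactly 100; that shortfall is one reason weighting was preferred earlier in the thread.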
Probably best to open a new thread and carefully describe the specifics of your situation.
You replied to a three-year-old thread and didn't contextualize your question in any way.