SPSSX Discussion

Weighting


Kettlitz, Robert

Weighting

Good Morning,
I have a question about weighting a demographic variable and the effect it will have on accuracy (95% confidence level). I have a data set composed of 83 men and 160 women, N=244. Would you recommend weighting by gender? If so, how would I do that? When I used the weighting function, it doubled the responses for women. Bob

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

jmdpulido

Re: Weighting

You can compute weights that do not double the total N. For example, compute a vector of weights equal to:

men = (0.5*243)/83
women = (0.5*243)/160

The syntax would be:

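* Assumes sex is a string variable with the values men and women. Adjust to match your own coding.
compute weight1 = 0.
if (sex = 'men') weight1 = (0.5*243)/83.
if (sex = 'women') weight1 = (0.5*243)/160.
execute.
* Turn the weighting on.
weight by weight1.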

Of course, keep in mind that this weight vector assumes that the proportions of males and females in your real population are equal (50% each). If that is not true, change the weights accordingly. For example, if the proportion of males in the population you are studying is only 40%, compute weight1 for men as 0.4*243/83 and for women as 0.6*243/160.
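As a quick check on the arithmetic, here is a small Python sketch (not SPSS syntax) that computes weights of this kind for any assumed population shares. The function name `group_weights` and the dictionary layout are illustrative choices only. The counts 83 and 160 come from the original question, and the 40/60 split is the hypothetical example above.

```python
def group_weights(sample_counts, population_shares):
    """Return one weight per group so weighted counts match the target shares.

    weight = (population share * total sample size) / group sample count
    """
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] * total / count
        for group, count in sample_counts.items()
    }


counts = {"men": 83, "women": 160}

# Equal split in the population (the 50/50 assumption).
equal = group_weights(counts, {"men": 0.5, "women": 0.5})

# Hypothetical population with 40% men and 60% women.
skewed = group_weights(counts, {"men": 0.4, "women": 0.6})

for label, weights in (("50/50", equal), ("40/60", skewed)):
    # The weighted total should stay at the original sample size of 243.
    weighted_n = sum(weights[g] * counts[g] for g in counts)
    print(label, {g: round(w, 3) for g, w in weights.items()},
          "weighted N =", round(weighted_n, 1))
```

In both cases the weighted total stays at 243, so the weighting rebalances the groups without inflating the sample.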

Note also that the width of the confidence intervals is sensitive to the weighting. The weighting I propose keeps your total sample at 243, so for proportions computed on the whole sample your confidence intervals and standard errors will be OK.

However, if you want standard errors for conditional proportions, given that someone is male (e.g. the percentage of men who smoke), the reported standard errors will be smaller than they should be, because they use the weighted n for men in the denominator, which is larger than the amount of information you actually have in the sample.

Recall that the standard error of a proportion is sqrt(p*(1-p)/n(j)), so it depends on the sample size. If you are making inferences about a percentage within a subsample (e.g. % of males who visited a doctor), the relevant n(j) for the standard error is the number of males you really have, not the weighted number of males.
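To make the difference concrete, the following Python sketch compares the standard error computed with the real number of men (83) against the one computed with the weighted number of men (0.5*243 = 121.5 under the 50/50 weights). The proportion p = 0.30 is a made-up value used only for illustration, and `se_proportion` is just a helper name chosen here.

```python
import math


def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


p = 0.30                # hypothetical share of men who smoke
n_real = 83             # men actually in the sample
n_weighted = 0.5 * 243  # weighted count of men under the 50/50 weights

se_real = se_proportion(p, n_real)
se_weighted = se_proportion(p, n_weighted)

print(f"SE with real n:     {se_real:.4f}")
print(f"SE with weighted n: {se_weighted:.4f}")
```

The figure based on the weighted n comes out smaller, which overstates the precision you really have for the male subsample.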

Hope this helps.
J. Pulido
PhD Student