SPSSX Discussion - Re: How to use weighed data for a generalized linear model (GzLM) analysis?

Posted by Hector Maletta on Jul 09, 2012; 2:35pm
URL: http://spssx-discussion.165.s1.nabble.com/How-to-use-weighed-data-for-a-generalized-linear-model-GzLM-analysis-tp5714060p5714087.html

Melissa,
In fact the Complex Samples facility in SPSS has not yet been adapted to
handle Generalized Linear Models or Mixed Models. I hope they come up with
a solution for this soon, either in a new version or through the
Development Central.
The problem this creates is not that correct weights, reproducing the actual
population share of each group, cannot be applied: ordinarily they can, as
you mention for your case.
The problem with not being able to use the Complex Samples facility arises
when the sampling procedure involves clustering.

Clustering is a most common device because it saves money. Instead of
selecting 1000 people from the whole population of the US (say, from a
master list of US inhabitants, should there be one), you ordinarily go in
stages: first stratify your population, say by State or Region; then select
a sample of counties in each State or Region (from a list of counties), then
select some communities within each selected county (from a list of
communities, which may be first stratified by size), then perhaps select
some ZIP codes within selected communities, and finally devise a procedure
to choose specific households within every selected ZIP code area. Each
stage (except stratification) is a clustering stage. Stratification reduces
standard error, clustering increases it.

If your sampling ratios have been variable (say, you sampled 10% of the ZIP
codes in some places, 1% in others, 20% in yet others), you should apply
weights that correct for this disproportion. These weights can be
inflationary (of the type Ni/ni, where the frequencies in each group are
inflated to actual population size) or merely proportional (where the former
are multiplied by n/N, with N the estimated size of the TOTAL population
covered by the study, so that the weighted sample size is still n and the
average weight equals 1, with some weights below and some above 1).
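
A minimal Python sketch of the two kinds of weights may help. The strata
names and the population and sample counts are made up for illustration;
the only things taken from the text are the inflationary form Ni/ni and the
requirement that proportional weights keep the weighted sample size at n.

```python
# Hypothetical strata with population size N_i and sample size n_i.
# All figures are invented for illustration.
strata = {
    "A": {"N": 50_000, "n": 500},    # sampled at 1%
    "B": {"N": 10_000, "n": 1_000},  # sampled at 10%
    "C": {"N": 2_500, "n": 500},     # sampled at 20%
}

N = sum(s["N"] for s in strata.values())  # total population covered
n = sum(s["n"] for s in strata.values())  # total sample size

# Inflationary weights: each sampled case stands for N_i / n_i people.
inflationary = {k: s["N"] / s["n"] for k, s in strata.items()}

# Proportional weights: rescale by n / N so the weights average 1.
proportional = {k: w * n / N for k, w in inflationary.items()}

# Weighted sample sizes under each scheme.
weighted_n_inflationary = sum(inflationary[k] * s["n"] for k, s in strata.items())
weighted_n_proportional = sum(proportional[k] * s["n"] for k, s in strata.items())

print(inflationary)             # weights of 100, 10 and 5
print(proportional)             # roughly 3.2, 0.32 and 0.16
print(weighted_n_inflationary)  # equals N
print(weighted_n_proportional)  # equals n, up to floating-point error
```

Note that the most heavily sampled strata end up with proportional weights
well below 1.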

Inflationary weights are good for tables, because they show the actual size
of the population involved, but not good for obtaining significance measures
(because SPSS computes significance based on the WEIGHTED sample size). So,
non-inflationary weights should be used in order to compute significance
tests.
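
To see why the weighted sample size matters, here is a rough sketch using
the textbook standard error of a proportion under simple random sampling.
It assumes, as described above, that the software treats the weighted count
as the sample size; the proportion and the two sizes are invented figures,
not results from any real survey.

```python
import math

def se_proportion(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

p = 0.30             # hypothetical observed proportion
actual_n = 2_000     # cases actually interviewed
inflated_n = 62_500  # weighted count under inflationary weights

se_actual = se_proportion(p, actual_n)
se_inflated = se_proportion(p, inflated_n)

# The apparent standard error shrinks by a factor of sqrt(actual_n / inflated_n).
ratio = se_inflated / se_actual

print(se_actual, se_inflated, ratio)
```

With these numbers the standard error computed from the inflated count is
less than a fifth of the one based on the real sample size, so tests look
far more significant than they should.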

If you apply proportional weights that reproduce the relative population
weight of states, counties, communities and ZIP codes, you are correcting
for the different sampling ratios applied in the various stages. However,
this does not account for the extra margin of error created by clustering.
Once you selected county X, and did not select county Z, you were betting
your money on the results to be obtained from X, and "losing" any
interesting facts that may exist in Z but not in X. To use an extreme
example: suppose you do not sample all the States, but only some: instead of
drawing samples (of counties, communities, ZIP codes, households) in all
States (as would be recommendable), you randomly choose 2 States out of the
50. Of course your results would be very different if you happen to select,
say, California and NY versus another sample where you select Alaska and
Idaho. Even if your weights reflect the idea that the sample in Idaho and
Alaska and your other choices "represent" the population of the 50 states,
you are actually projecting the features of Alaskans and Idahoans onto the
citizens of the Big Apple and CA (and all other States). The same happens
when you select some counties in a State, and then project the results onto
all the counties of that state.

To correct for that "clustering error", you need to take account of the mean
and variance within and between clusters, and the sampling ratio of
clusters, and thus introduce a "design effect" into your standard errors.
Complex Samples does this properly.
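
As a rough illustration of what a design effect does, the sketch below uses
Kish's textbook approximation, deff = 1 + (m - 1) * rho, for clusters of
equal size m and intraclass correlation rho. This is a swapped-in
simplification, not the computation Complex Samples performs, and the
cluster size, correlation and sample size are invented.

```python
import math

def design_effect(cluster_size, icc):
    """Kish approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def clustered_se(srs_se, cluster_size, icc):
    """Inflate a simple-random-sampling standard error by sqrt(deff)."""
    return srs_se * math.sqrt(design_effect(cluster_size, icc))

# Hypothetical design: 20 households per cluster, intraclass correlation 0.05.
deff = design_effect(20, 0.05)

# Effective sample size: the simple random sample with the same precision.
effective_n = 2_000 / deff

print(deff)                           # about 1.95
print(effective_n)                    # about 1026 out of 2000 cases
print(clustered_se(0.010, 20, 0.05))  # larger than the SRS value 0.010
```

Even a modest correlation within clusters roughly halves the effective
sample size in this example, which is the extra margin of error that
weights alone do not capture.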

The trouble is that you do not yet have a facility to apply SPSS Complex
Samples to GzLM. Other software packages can do that, but this is a forum
concerning SPSS.

Besides the problems of using adequately proportional weights and correcting
for clustering, a third problem arises with the Generalized Linear Model
procedures in SPSS: they round off fractional weights to the nearest integer
BEFORE using them in the computations. If your weights are non-inflationary,
they are usually fractional, some below and some above the average weight=1.

This problem is INDEPENDENT of the problems surrounding non-proportional
sampling ratios and clustering. It is a mere computational device that
distorts the weighting structure and eliminates all cases from relatively
oversampled areas (where the proportional weight is less than 0.5); if you
oversampled precisely those areas you were most interested in, you may be in
big trouble.
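
The effect of rounding each case weight before use can be sketched in a few
lines of Python. The weights are invented (they average 1, as proportional
weights do), and rounding halves upward is an assumption made for the
illustration; the exact tie rule SPSS applies is not stated here.

```python
import math

def round_half_up(w):
    """Round to the nearest integer, with halves going up (assumed rule)."""
    return math.floor(w + 0.5)

# Hypothetical proportional case weights with a mean of 1.
weights = [0.2, 0.4, 0.6, 1.0, 1.4, 2.4]

rounded = [round_half_up(w) for w in weights]

# Cases whose weight rounds to zero drop out of the analysis entirely.
dropped = sum(1 for r in rounded if r == 0)

print(rounded)       # [0, 0, 1, 1, 1, 2]
print(dropped)       # 2
print(sum(rounded))  # 5, while the original weights sum to 6
```

The two cases from the most oversampled groups vanish, and the cases
weighted 0.6, 1.0 and 1.4 all count the same.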

To sum up: one is in a triple fix in the current state of affairs.
1. You cannot use inflationary weights (which may be integer, or for which
the rounding may not matter) because SPSS would then underestimate the
sampling error.
2. You cannot use non-inflationary weights either, because they are
rounded off before they are actually used, thus rendering them useless.
3. In either case, you are still not correcting for the effect of
clustering, because a Complex Samples application for GzLM is not yet
available. Ignoring clustering understates your sampling error, in addition
to errors introduced by the other two problems.

Hope this clarifies the issue.

Hector
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Melissa Ives
Sent: Monday, July 09, 2012 10:34
To: [hidden email]
Subject: Re: How to use weighed data for a generalized linear model (GzLM)
analysis?

Hector, can you say a bit more about #4, using the Complex Samples facility?
We have version 20 and frequently compare a small group to a larger group
that is both propensity weighted and proportionally weighted, so that the
larger group is weighted to be like the smaller group in terms of the
propensity items and in terms of N size. However, it is impossible to
compare the weighted groups using GLM due to the issue you mention in #3
below: weights are rounded first, resulting in values of either 0 or 1
(since the weights we use range from 0 to 1).

Thanks,
Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Hector Maletta
Sent: Friday, July 06, 2012 9:54 PM
To: [hidden email]
Subject: Re: [SPSSX-L] How to use weighed data for a generalized linear
model (GzLM) analysis?

There are several different questions or problems involved here.
1. Are weights appropriate for estimating generalized linear models? Some
think they are not. I'm undecided on that. With a simple random sample I'd
say OK, do not apply any weights; with the usual case of disproportionate
(random) sampling using stratification and (worse still) clustering, I am
not sure.
2. If you use inflationary or frequency weights, SPSS would think your
sample size is the weighted sample size, which is larger than your actual
sample size, thus underestimating the standard error of your estimates.
3. If you use just proportional weighting, such that the weighted sample
size equals the unweighted sample size, which is an approximate solution
(solving for disproportionate sampling but not for clustering) you'd still
have a problem with SPSS generalized linear models (apart from the problem
for computing standard errors if your sample involves clustering). The
problem you'll have with SPSS is that Generalized linear models in SPSS have
the nasty habit of rounding the weights BEFORE using them (unlike other
procedures that apply rounding to the final result, i.e. the weighted
frequencies, not to each particular case weight). Proportional weights mean
that some weights are greater than 1 and others are lower than 1, with an
overall mean weight of 1 (because the weights do not alter sample size).
Thus any weight below 0.5 will be rounded to zero (causing you to "lose"
cases), weights between 0.5 and 1.5 will all be rounded to 1, those from 1.5
to 2.5 will be rounded to 2, and so on, thus defeating a large part of the
purpose of weighting.
4. You may think of an apparently clever solution: not using weights at all,
and applying instead the SPSS Complex Samples facility on an unweighted
dataset in order to compute standard errors. But in that case you'll have
another problem: up to the current version (v.20) SPSS Complex Samples does
not cover Generalized Linear Models or Mixed Models.
So you're in a fix, I guess. Sorry for not being more helpful.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Poes, Matthew Joseph
Sent: Friday, July 06, 2012 17:45
To: [hidden email]
Subject: Re: How to use weighed data for a generalized linear model (GzLM)
analysis?

I have not personally done this, but from what I have recently been told by
one of the IBM techs that frequents this forum, if you use the weight
variables for the scale weight, and the stratification variables for the
offset variable, this should effectively allow the use of weighted data.

Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois
510 Devonshire Dr.
Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Sylvia
Sent: Friday, July 06, 2012 3:42 PM
To: [hidden email]
Subject: How to use weighed data for a generalized linear model (GzLM)
analysis?

I am working with a data set that uses a geographically stratified sample
design and therefore needs weighted data to generate accurate standard
errors.
I was wondering if any of you have used weighted data for a generalized
linear model in SPSS and could help me with the know-how.
Thanks a ton!

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
