SPSSX Discussion

Logistic regression, few "respondents", and weighting in SPSS


Marc Feuerstein

Dear all,
I have a question concerning logistic regression and SPSS.

We're facing a situation where we have very few "respondents" (cases coded 1
in the field to predict) in a logistic regression: only 1700 on 60000. I
think this situation is called "sparsity", isn't it?

When we run a logistic regression, we get a poor fit. As I see it, this is
because of the sparse dataset. I was told that one way to solve this kind of
problem in LR is to weight the responding cases, to "artificially" raise
their representation in the dataset.

I've looked this up in the classic "bibles" of logistic regression
(Menard, Lemeshow, Jaccard), but haven't found any discussion of sparsity
or of situations with few respondents.

In SPSS, I know there is a weight feature. Does it work with logistic
regression? Is it really a "technique" to (artificially) have a better fit?

What modelling techniques are better suited for sparse datasets, in your
opinion?

Thank you so much for helping out!

Marc.

Spousta Jan

Hi Marc,

>In SPSS, I know there is a weight feature. Does it work with logistic
>regression?
Yes, they work well together.

>Is it really a "technique" to (artificially) have a better fit?
Yes, you get a "better" fit, but in a sense it is rather self-deception.
In reality, the fit is still bad and you cannot rely on the results.

>What modelling techniques are better suited for sparse datasets, in
>your opinion?
SPSS has exact tests, which are designed for sparse data. Of course,
they cannot create significant results where there is nothing
significant.

Moreover, I do not understand your phrase "1700 on 60000" (sorry for my
bad English). If it means that you have 60,000 respondents and that 1,700
of them have 1 on the dependent variable and the rest have 0, then
this is not a case of sparsity. 1,700 is enough for most practical
purposes, and you can use logistic regression without despair. If its
result is not significant, then it simply means that your "dependent"
variable does not depend on the selected predictors.

Hope this helps

Jan
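To make the "self-deception" point about weighting concrete, here is a minimal Python sketch (standard library only). It fits a logistic regression with one binary predictor by Newton-Raphson twice: once unweighted, and once with every event case given a weight of 10, treating weights as frequency weights, which is roughly how a case weight enters the point estimates. The grouped counts, the weight of 10, and the function names are hypothetical, chosen only so that 1,700 of 60,000 cases are events. Because a single binary predictor makes the model saturated, weighting shifts the intercept by exactly log(10) and leaves the slope unchanged: the predicted probabilities go up, but the estimated relationship between predictor and outcome is no better.

```python
import math

def fit_logistic(rows, iters=100, tol=1e-10):
    """Weighted MLE for logit(p) = b0 + b1*x via Newton-Raphson.

    rows: list of (x, y, weight); each weight acts as a frequency weight.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = w * (y - p)          # weighted score contribution
            g0 += r
            g1 += r * x
            v = w * p * (1.0 - p)    # weighted information contribution
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * step = score for the 2x2 case
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1

def grouped_rows(groups, event_weight=1.0):
    """Expand hypothetical (x, n_cases, n_events) counts into weighted rows."""
    rows = []
    for x, n, events in groups:
        rows.append((x, 1, events * event_weight))  # event cases, possibly up-weighted
        rows.append((x, 0, n - events))             # non-event cases, weight 1
    return rows

# Hypothetical counts: 1,700 events out of 60,000 cases.
GROUPS = [(0, 40000, 800), (1, 20000, 900)]

b0, b1 = fit_logistic(grouped_rows(GROUPS))
b0_w, b1_w = fit_logistic(grouped_rows(GROUPS, event_weight=10.0))

print(f"unweighted: intercept={b0:.4f} slope={b1:.4f}")
print(f"weighted:   intercept={b0_w:.4f} slope={b1_w:.4f}")
print(f"intercept shift={b0_w - b0:.4f}, log(10)={math.log(10):.4f}")
```

With a continuous predictor the slope would also move a little under weighting, but the main effect is the same: the weight mostly relabels the baseline event rate rather than adding information.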

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Marc
Sent: Wednesday, July 26, 2006 8:39 AM
To: [hidden email]
Subject: Logistic regression, few "respondents", and weighting in SPSS

Hector Maletta

In reply to this post by Marc Feuerstein

You can of course inflate the weight of your cases, but it is not a good idea
at all. When the probability of an event is low (1700 on 60000), that's tough
luck, but you cannot change it without distorting your data.

About lack of fit: one thing is lack of fit itself (Nagelkerke R Square too
low, etc.); another is that the classification table does not predict most of
the events. The latter is because, by default, SPSS predicts an event when its
predicted probability from the logistic regression is over 0.50, which seldom
happens when the event is rare. It will probably predict "no event" (0) in all
cases, missing all the cases where the event actually happened.

On the other hand, 1700 cases (or 60000, to be precise) are numerous enough
for the results to be statistically significant. That is, whatever you find
will not be a sample fluke but (with 95% confidence) a true representation
of what happens at the population level.

About weighting, see my paper in the tutorials section of www.spsstools.net
(go to macros or syntax and then to tutorials).
Hector
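The classification-table point can be illustrated with a small Python sketch. It uses hypothetical grouped counts (1,700 events in 60,000 cases, split across one binary predictor); for a model whose only predictor is that binary variable, the fitted probability in each group equals the group's observed event rate, so no fitting step is needed here. Predicting an event only when the fitted probability is over 0.50 classifies every case as "no event", while a cutoff at the overall event rate (an illustrative choice, not a recommendation) picks up some events at the cost of many false positives. As far as I know, SPSS exposes this cutoff as the CUT keyword on the /CRITERIA subcommand of LOGISTIC REGRESSION, with a default of .5.

```python
def classify(groups, cutoff):
    """Classification table for grouped data.

    groups: list of (n_cases, n_events, fitted_probability).
    A whole group is predicted as 'event' when its fitted probability exceeds cutoff.
    Returns (true_pos, false_neg, false_pos, true_neg).
    """
    tp = fn = fp = tn = 0
    for n, events, p_hat in groups:
        if p_hat > cutoff:
            tp += events
            fp += n - events
        else:
            fn += events
            tn += n - events
    return tp, fn, fp, tn

# Hypothetical grouped data; each fitted probability is the group event rate.
GROUPS = [(40000, 800, 800 / 40000), (20000, 900, 900 / 20000)]
BASE_RATE = 1700 / 60000

for cutoff in (0.5, BASE_RATE):
    tp, fn, fp, tn = classify(GROUPS, cutoff)
    sensitivity = tp / (tp + fn)              # share of actual events caught
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    print(f"cutoff={cutoff:.4f}  TP={tp} FN={fn} FP={fp} TN={tn}  "
          f"sensitivity={sensitivity:.3f}  accuracy={accuracy:.3f}")
```

With the 0.50 cutoff the overall accuracy looks high (about 97%) only because nearly every case is a non-event; the sensitivity is zero.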

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marc
Sent: Wednesday, July 26, 2006 3:39 AM
To: [hidden email]
Subject: Logistic regression, few "respondents", and weighting in SPSS

Jason Burke

In reply to this post by Spousta Jan

Have you considered splitting the data into training and test partitions,
then combining all of the respondents in your training partition with a
random sample of the non-respondents in the same partition? Fit the model
on that combined set, then apply it to the test partition.

Jason
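The train/test approach above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: cases are identified by their position in a list of 0/1 labels, and the 30% test share, the one-to-one ratio of sampled non-respondents to respondents, the seed, and the function name are all hypothetical choices. The model fitting itself is left out. The test partition is not resampled, so it keeps the original, rare event rate for evaluation.

```python
import random

def balanced_train_test(labels, test_frac=0.3, neg_per_pos=1, seed=2006):
    """Split case indices into train/test, then undersample training non-respondents.

    Returns (balanced_train_indices, test_indices, sampling_rate), where
    sampling_rate is the share of training non-respondents that was kept.
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]

    pos = [i for i in train if labels[i] == 1]   # all training respondents
    neg = [i for i in train if labels[i] == 0]   # training non-respondents
    k = min(len(neg), neg_per_pos * len(pos))
    sampled_neg = rng.sample(neg, k)             # random sample without replacement

    balanced = pos + sampled_neg
    rng.shuffle(balanced)
    return balanced, test, k / len(neg)

# Hypothetical data with the thread's proportions: 1,700 ones among 60,000 cases.
labels = [1] * 1700 + [0] * 58300
train, test, rate = balanced_train_test(labels)
n_pos = sum(labels[i] for i in train)
print(f"train: {len(train)} cases ({n_pos} respondents), test: {len(test)} cases")
print(f"share of training non-respondents kept: {rate:.4f}")
```

Because the balanced training set overstates the event rate, probabilities from a model fitted to it are inflated. For a correctly specified logistic model, the usual correction is to add log(rate) to the fitted intercept, where rate is the returned share of non-respondents kept; the slopes need no adjustment.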

On 7/26/06, Spousta Jan <[hidden email]> wrote: