logistic regression with zero values

logistic regression with zero values

Greg
Hi everyone,

I have two income variables: the first excludes the zero values, and the other includes them.

When I run a logistic regression, all the dummies for the variable without the zeros are significant, whereas the dummies for the variable with the zeros included are all non-significant.

Why is this the case? Should I exclude the zero values then? I'd greatly appreciate any advice and/or suggestions.

Regards,
Grigoris Argeros
PhD Candidate in Sociology
Fordham University

Re: logistic regression with zero values

Hector Maletta
Grigoris, your explanation is not completely clear. I presume you use income
as a predictor for some dichotomous outcome, but your income variable is not
an interval variable; you may have a variable defined as income brackets,
converted into a series of dummies. This series of dummies may or may not
include cases with zero income.
The decision about using or not using zero income cases depends on what
these cases mean. They may represent authentic cases of zero income, or
simply cases where the information about income is missing.
Similar to the case of the missing information is the case in which the
definition of income is too narrow. For instance, in US household surveys
"income" used to be defined in such a way that remittances or family help
did not count as income, and therefore a parent-supported student living
alone (say the young G.W. Bush during his Yale years) appeared as a
one-person household without an income, and was thus classified as below the
poverty line. Likewise for people living off savings, student loans or
remittances.
If a zero income represents cases of missing information (there is an income
but it is not reported), or it reflects a definition of income that is too
narrow, then those cases should be excluded from the sample, because you do
not know the actual income. If, instead, those cases genuinely have no
income, they should be kept in the study.
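
A minimal Python sketch of the two treatments described above, offered as an illustration rather than as what Grigoris actually ran: zero income is coded either as a category of its own (genuine zeros) or as missing (unreported or too narrowly defined income), and the dummies are built afterwards. The bracket cutoffs, the labels and the names `bracket` and `code_income` are all assumptions; the thread gives none of them.

```python
# Hypothetical bracket cutoffs; the thread does not say which ones were used.
BRACKETS = [("low", 0, 20000), ("middle", 20000, 60000), ("high", 60000, float("inf"))]
ORDER = ["zero", "low", "middle", "high"]


def bracket(income, zero_is_genuine):
    """Return a bracket label, or None when the case is treated as missing."""
    if income is None or income < 0:
        return None  # unreported or negative: treated as missing in this sketch
    if income == 0:
        # Genuine zero gets its own category; otherwise the case is dropped.
        return "zero" if zero_is_genuine else None
    for label, lower, upper in BRACKETS:
        if lower < income <= upper:
            return label
    return None


def code_income(incomes, zero_is_genuine, reference):
    """Drop missing cases, then build 0/1 dummies, omitting the reference level."""
    labels = (bracket(x, zero_is_genuine) for x in incomes)
    kept = [lab for lab in labels if lab is not None]
    levels = [lv for lv in ORDER if lv in kept and lv != reference]
    return [{"inc_" + lv: int(lab == lv) for lv in levels} for lab in kept]


incomes = [0, 15000, 45000, 90000, 0]
with_zeros = code_income(incomes, zero_is_genuine=True, reference="zero")
without_zeros = code_income(incomes, zero_is_genuine=False, reference="low")
print(len(with_zeros), sorted(with_zeros[0]))        # cases kept, dummy names
print(len(without_zeros), sorted(without_zeros[0]))
```

Note that the two codings differ in the cases kept and, in this sketch, in the reference category as well, so each dummy is a contrast against a different baseline. That alone can change which dummies come out significant, so the two sets of results are not directly comparable.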
Who may really be without an income? Hardly a household, if "income" is
properly defined, but if the study is about individual persons, workers and
non-workers alike, there would of course be people not earning any income.
Among workers, it is perfectly possible to have unpaid workers (e.g. family
help) who get no monetary income for their efforts. Even in this case,
however, there is an indirect income. It can be estimated from the
production side, since unpaid family members contribute to the revenue of
the family farm or shop, and some imputation might be used to figure out how
much revenue is generated by that unpaid work. Calculated the other way,
from the income side, there is also an income to the worker in the form of a
share in family consumption (shelter, food, clothing, etc.), the value of
which could be estimated. (A usual shortcut is to assign unpaid workers the
going wage rate for their kind of work.)
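
The wage-rate shortcut can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wage table `GOING_WAGE`, its figures, and the function `imputed_income` are hypothetical, and a real study would take the rates from an external wage source.

```python
# Hypothetical going hourly wage by kind of work (illustrative figures only).
GOING_WAGE = {"farm": 8.0, "shop": 9.5}


def imputed_income(reported_income, unpaid, kind_of_work, hours_worked):
    """Give unpaid workers with zero reported income the going wage for their work."""
    if unpaid and reported_income == 0:
        return GOING_WAGE[kind_of_work] * hours_worked
    # Paid workers, and anyone reporting an income, keep the reported figure.
    return reported_income


print(imputed_income(0, True, "farm", 100))    # unpaid family worker: imputed
print(imputed_income(500, False, "shop", 10))  # paid worker: unchanged
```

The imputed value replaces the zero before the income brackets are formed, so the unpaid worker lands in an ordinary bracket instead of the zero category.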
Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
grigoris
Sent: 01 July 2009 14:39
To: [hidden email]
Subject: logistic regression with zero values

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: logistic regression with zero values

David C-4
To add to Hector's comments: I've seen zero (or negative) income
reported by survey respondents in secondary data, but... of course
that's not accurate (in the sense that they really have no income). Looking
at other parts of the data, the person has assets and other indicators
that imply they aren't, ummm, "the idle rich". What can it
be? In the U.S., it's probably net income after deductions (including
losses) reported to the Federal government. This also includes some
people whom we would measure as below the poverty line on income alone,
but who are clearly not in that category based on a look at the other
responses in their case.
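
A minimal Python sketch of that kind of cross-check, flagging cases whose reported income is zero or negative while other answers suggest otherwise. The field names (`income`, `assets`), the threshold and the function `flag_suspect_income` are assumptions made for illustration; they do not come from any particular survey.

```python
def flag_suspect_income(case, asset_threshold=50000):
    """True when zero or negative reported income sits next to sizeable assets."""
    return case["income"] <= 0 and case["assets"] >= asset_threshold


cases = [
    {"id": 1, "income": 0, "assets": 250000},    # zero income, large assets
    {"id": 2, "income": -12000, "assets": 90000},  # net loss reported
    {"id": 3, "income": 0, "assets": 500},       # plausibly a genuine zero
    {"id": 4, "income": 38000, "assets": 20000},
]
suspect_ids = [c["id"] for c in cases if flag_suspect_income(c)]
print(suspect_ids)
```

Flagged cases are candidates for review or for treatment as missing income, not automatic exclusions.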

[Of course, why they would report it like this in a survey is a
discussion for another time/place...]

David Chapman, PhD
[hidden email]




On Wed, Jul 1, 2009 at 2:31 PM, Hector Maletta<[hidden email]> wrote:
