Problem about modeling multinomial target

Chichi Shu
Dear Listers,
 
I’m trying to predict a multinomial target with three values: 0 (no churn), 1 (voluntary churn), and 2 (involuntary churn).

The problem is that the predictors (drivers, independent variables) that lead an individual to score 1 will probably be very different from those that lead to a score of 2.

If I use logistic regression with a stepwise method, will having potentially different key independent variables for each outcome cause any problems?

E.g., payment over the last three months is a strong predictor of involuntary churn, but it’s not correlated with voluntary churn at all. If I bring this payment information into the model as a variable to predict 1 or 2, will there be a problem?
 
Thanks!
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Problem about modeling multinomial target

Art Kendall
Depending on the nature of your predictors, in addition to multinomial regression you might consider the following procedures
to answer a question about what distinguishes the three groups:
DISCRIMINANT
CATREG
TREES

More than one perspective on your data may produce useful information.

It may be the same variables but different values of those variables that differentially predict group membership.

Art Kendall
Social Research Consultants

Re: Problem about modeling multinomial target

Bruce Weaver
Administrator
In reply to this post by Chichi Shu
You wrote:  "e.g. payment of last three month is a strong predictor for involuntary churn but it’s not correlated with voluntary churn at all. If I bring this payment information into the model as a variable to predict 1 or 2, will there be a problem?"

I take it no churn is the reference category for your outcome variable, and that where one scores on payment affects the odds ratio for involuntary relative to none, but not for voluntary relative to none.  I see no problem with that.  Why would you expect predictor variables to change the odds of both voluntary and involuntary in the same way?  If a model showed that for all predictor variables, I might wonder if some other predictors (that distinguish between voluntary and involuntary) were missing.
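
To make the point about separate odds ratios concrete, here is a minimal sketch of a baseline-category (multinomial) logit with no churn as the reference outcome. The coefficient values, the sign of the payment effect, and the function names are all made up for illustration; nothing here is estimated from real data.

```python
import math

# Hypothetical coefficients, one set per non-reference outcome.
# Reference outcome: 0 (no churn). The values are assumptions for illustration.
COEFS = {
    1: {"intercept": -2.0, "pmt": 0.0},   # voluntary vs. none: payment has no effect
    2: {"intercept": -1.0, "pmt": -0.8},  # involuntary vs. none: payment matters
}

def churn_probabilities(pmt):
    """Return P(outcome = 0, 1, 2) for a given payment score."""
    # Linear predictor per non-reference outcome; the reference is fixed at 0.
    eta = {k: c["intercept"] + c["pmt"] * pmt for k, c in COEFS.items()}
    denom = 1.0 + sum(math.exp(v) for v in eta.values())
    probs = {0: 1.0 / denom}
    probs.update({k: math.exp(v) / denom for k, v in eta.items()})
    return probs

def odds_vs_none(pmt, outcome):
    """Odds of `outcome` relative to the reference category (no churn)."""
    p = churn_probabilities(pmt)
    return p[outcome] / p[0]

# Odds ratio for a one-unit increase in payment, computed for each contrast.
or_voluntary = odds_vs_none(1.0, 1) / odds_vs_none(0.0, 1)
or_involuntary = odds_vs_none(1.0, 2) / odds_vs_none(0.0, 2)
print(or_voluntary, or_involuntary)
```

In this sketch the payment odds ratio is 1 for voluntary vs. none and exp(-0.8) for involuntary vs. none, so a single predictor can matter for one contrast and not the other within the same model.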

You also mentioned using stepwise selection.  Please look at Mike Babyak's nice article on over-fitting for some good discussion of why stepwise (& other similar algorithmic methods) is usually not a good idea.  Babyak also talks about the number of "events-per-variable" needed to avoid over-fitting in binary logistic regression.  I don't think he discusses multinomial logistic regression specifically, but you should be able to generalize to some extent from what he says about binomial logistic.  

http://www.cs.vu.nl/~eliens/sg/local/theory/overfitting.pdf
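
As a rough way to carry the events-per-variable idea over to a three-level outcome, one could check something like the sketch below. The adaptation is an assumption of this sketch rather than something taken from Babyak's article: it treats the smallest outcome category as the limiting "events" count and charges each predictor term one parameter per non-reference outcome. The counts are made up.

```python
def events_per_parameter(outcome_counts, n_predictor_terms):
    """Rough events-per-variable figure for a multinomial outcome.

    Assumptions of this sketch: the smallest outcome category is the limiting
    sample size, and each predictor term costs one parameter for every
    non-reference outcome (intercepts are not counted).
    """
    smallest = min(outcome_counts.values())
    n_contrasts = len(outcome_counts) - 1
    n_parameters = n_predictor_terms * n_contrasts
    return smallest / n_parameters

# Made-up counts: 0 = no churn, 1 = voluntary, 2 = involuntary.
counts = {0: 9000, 1: 600, 2: 400}
epv = events_per_parameter(counts, n_predictor_terms=12)
print(round(epv, 2))
```

With 12 predictor terms and two contrasts there are 24 slope parameters, so the 400 involuntary churners, not the 10,000 total cases, set the limit on how many candidate predictors the data can support.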


HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Problem about modeling multinomial target

Chichi Shu
Thanks Art and Bruce.

I want the results to be very interpretable so regression methods are
preferred.

But say I bring payment of the last three months in as an IV (Pmt), and a
coefficient is computed for it.

This coefficient should be statistically significant only for target "2"
relative to reference value 0. It shouldn't be statistically significant for
target value "1" relative to reference value 0.

So wouldn't fitting a regression that comes up with a single coefficient for
a variable be problematic in this case?

If so, is there any regression methodology that overcomes it?

Thanks!

-----Original Message-----
From: Bruce Weaver
Sent: Sunday, December 07, 2014 8:42 AM
To: [hidden email]
Subject: Re: Problem about modeling multinomial target


Re: Problem about modeling multinomial target

Bruce Weaver
Administrator
Your DV has 3 levels.  So there should be two coefficients for Pmt.  See the example here:

  http://www.ats.ucla.edu/stat/spss/output/mlogit.htm
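
A small sketch of why there are two coefficients: a multinomial logit estimates one set of terms for each non-reference level of the DV. The helper name `coefficient_layout` is hypothetical and only enumerates the parameter table; it does not fit anything.

```python
def coefficient_layout(outcome_levels, reference, predictors):
    """List the (contrast, term) pairs that a multinomial logit estimates."""
    contrasts = [lvl for lvl in outcome_levels if lvl != reference]
    terms = ["Intercept"] + list(predictors)
    return [(f"{lvl} vs {reference}", term) for lvl in contrasts for term in terms]

layout = coefficient_layout([0, 1, 2], reference=0, predictors=["Pmt"])
for contrast, term in layout:
    print(contrast, term)

# Pmt appears once per contrast, i.e. twice for a three-level DV.
pmt_rows = [row for row in layout if row[1] == "Pmt"]
```

Each Pmt row gets its own estimate, standard error, and significance test, so Pmt can be significant for "2 vs 0" and not for "1 vs 0".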

HTH.


Re: Problem about modeling multinomial target

Art Kendall
In reply to this post by Chichi Shu
DISCRIMINANT is a specific instance of the General Linear Model (GLM), as is ordinary regression.
The weights for the function that distinguishes group 1 vs 2 CAN be different from those for the function that distinguishes group 1 vs 3. One strength of this "flipped sides" regression is that it gives a nice crosstab of predicted groups by actual groups.
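
For readers who have not seen that crosstab, here is a minimal sketch of a predicted-by-actual table. The group labels and predictions are made up for illustration and do not come from any fitted model.

```python
from collections import Counter

def crosstab(actual, predicted, levels):
    """Return counts with one row per actual group and one column per predicted group."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in levels] for a in levels]

# Made-up group memberships: 0 = no churn, 1 = voluntary, 2 = involuntary.
actual = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 0, 1, 1, 1, 0, 2, 2, 0]
levels = [0, 1, 2]

table = crosstab(actual, predicted, levels)
for level, row in zip(levels, table):
    print("actual", level, row)

# Share of cases on the diagonal, i.e. classified into their actual group.
hit_rate = sum(table[i][i] for i in range(len(levels))) / len(actual)
print(hit_rate)
```

The off-diagonal cells show which groups get confused with each other, which is often more informative than the overall hit rate.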

To use David's term, my eSPSS gives me the impression that you are likely asking
"What differentiates among (discriminates among, distinguishes among) the groups?"

Which instance of a General Linear Model or a Generalized Linear Model to use depends on what questions you are asking and what the levels of measurement are for your variables.

CATREG (Categorical Regression) is designed to predict categorical values from mixes of nominal, ordinal, and scale variables. It also allows you to see if it makes a difference which level of measurement you use for a predictor.

It is also possible that you will want to include interactions among predictors.


As Bruce mentioned, stepwise (as opposed to stepped) approaches are very questionable. They are in the same class of frequent abuses as the invidious median split.
Art Kendall
Social Research Consultants

Re: Problem about modeling multinomial target

Chichi Shu
In reply to this post by Bruce Weaver
Oh, okay, I see that's what multinomial logistic regression is for. Thanks!

-----Original Message-----
From: Bruce Weaver
Sent: Monday, December 08, 2014 10:58 AM
To: [hidden email]
Subject: Re: Problem about modeling multinomial target

Re: Problem about modeling multinomial target

Ryan
In reply to this post by Bruce Weaver
Just saw this post. It's worth noting that the OP could also have fit two separate binary logistic regressions (voluntary vs. none and involuntary vs. none) and obtained very similar, though generally not identical, parameter estimates, standard errors, and p-values. Of course, an overall GOF statistic for the generalized logit (multinomial logistic regression) model would not be available if two separate models were employed.
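
A minimal sketch of the link between the two approaches, for the special case of a single binary predictor. In that saturated case the maximum-likelihood slopes have closed forms (log odds ratios from the crosstab), and the multinomial and separate-binary versions coincide exactly; with continuous predictors the two approaches typically give close but not identical estimates. The counts and the meaning of the predictor are made up.

```python
import math

# Made-up 2 x 3 crosstab. Rows: binary predictor (0 = paid, 1 = missed a payment).
# Columns: outcome (0 = none, 1 = voluntary churn, 2 = involuntary churn).
counts = {
    0: {0: 800, 1: 60, 2: 20},
    1: {0: 100, 1: 8, 2: 40},
}

def multinomial_slope(counts, outcome, reference=0):
    """ML slope for `outcome` vs. `reference` in the saturated multinomial logit."""
    odds_x1 = counts[1][outcome] / counts[1][reference]
    odds_x0 = counts[0][outcome] / counts[0][reference]
    return math.log(odds_x1 / odds_x0)

def binary_slope(counts, outcome, reference=0):
    """ML slope from a binary logit fitted only to cases in `outcome` or `reference`."""
    # Keep only the two outcome columns, as the separate-model approach does.
    sub = {x: {reference: row[reference], outcome: row[outcome]}
           for x, row in counts.items()}
    # Saturated binary logit: the slope is the log odds ratio of the 2 x 2 subtable.
    return math.log((sub[1][outcome] * sub[0][reference]) /
                    (sub[0][outcome] * sub[1][reference]))

for k in (1, 2):
    print(k, multinomial_slope(counts, k), binary_slope(counts, k))
```

Here the slope for involuntary vs. none is log(16) under both approaches, while the slope for voluntary vs. none is close to zero.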

Ryan
