Dear Listers,
I’m trying to predict a multinomial target with three values:
0 = no churn, 1 = voluntary churn, 2 = involuntary churn.
The problem is that the predictors (drivers, independent variables) that lead an
individual to score 1 or 2 will probably be very different.
If I use logistic regression with the stepwise method, will it cause any problems
that the key independent variables potentially differ between the two outcomes?
E.g. payment of last three months is a strong predictor for involuntary
churn but it’s not correlated with voluntary churn at all. If I bring this
payment information into the model as a variable to predict 1 or 2, will there
be a problem?
Thanks! |
Depending on the nature of your predictors, in addition to multinomial regression you might consider the following to answer a question about what distinguishes the three groups:
DISCRIMINANT
CATREG
TREES
More than one perspective on your data may produce useful information. It may be the same variables, but different values of those variables, that differentially predict group membership.
Art Kendall
Social Research Consultants |
In reply to this post by Chichi Shu
You wrote: "e.g. payment of last three months is a strong predictor for involuntary churn but it’s not correlated with voluntary churn at all. If I bring this payment information into the model as a variable to predict 1 or 2, will there be a problem?"
I take it no churn is the reference category for your outcome variable, and that where one scores on payment affects the odds ratio for involuntary relative to none, but not for voluntary relative to none. I see no problem with that. Why would you expect predictor variables to change the odds of both voluntary and involuntary in the same way? If a model showed that for all predictor variables, I might wonder if some other predictors (that distinguish between voluntary and involuntary) were missing.
You also mentioned using stepwise selection. Please look at Mike Babyak's nice article on over-fitting for some good discussion of why stepwise (& other similar algorithmic methods) is usually not a good idea. Babyak also talks about the number of "events-per-variable" needed to avoid over-fitting in binary logistic regression. I don't think he discusses multinomial logistic regression specifically, but you should be able to generalize to some extent from what he says about binomial logistic.
http://www.cs.vu.nl/~eliens/sg/local/theory/overfitting.pdf
HTH.
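The events-per-variable idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the category counts and the number of candidate predictors are made up, and treating the smallest outcome category as the count of "events" is one possible way to carry the binary-logistic notion over to a three-level outcome, not something taken from the article.

```python
def events_per_variable(category_counts, n_candidate_predictors):
    """Rough EPV: cases in the smallest outcome category per candidate predictor.

    Assumption: the smallest category plays the role of the "events" count
    from binary logistic regression.
    """
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    limiting = min(category_counts.values())
    return limiting / n_candidate_predictors


# Hypothetical counts, for illustration only (not from this thread).
counts = {"no churn": 9000, "voluntary": 600, "involuntary": 400}

# Count every variable offered to the selection procedure,
# not only the ones that end up in the final model.
epv = events_per_variable(counts, n_candidate_predictors=25)
print(f"EPV = {epv:.1f}")
```

The lower this ratio, the more reason to worry about over-fitting, which is one more argument for limiting the pool of candidate predictors up front rather than letting a stepwise routine search through many of them.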
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Thanks Art and Bruce.
I want the results to be very interpretable, so regression methods are preferred. But say I bring payment of last three months in as an IV (Pmt), and a coefficient is computed for it. This coefficient should be statistically significant only for target value 2 relative to the reference value 0. It shouldn't be statistically significant for target value 1 relative to the reference value 0. So wouldn't fitting a regression that comes up with a single coefficient for a variable be problematic in this case? If so, is there any regression methodology that overcomes it? Thanks! |
Your DV has 3 levels. So there should be two coefficients for Pmt. See the example here:
http://www.ats.ucla.edu/stat/spss/output/mlogit.htm HTH.
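To make the "two coefficients for Pmt" point concrete, here is a minimal sketch of how a multinomial logit with category 0 as the reference turns its two sets of coefficients into probabilities. The intercepts and slopes are hypothetical values chosen for illustration (they are not estimates from any data), and Pmt is treated as a simple numeric predictor.

```python
import math


def mnl_probs(pmt, coefs):
    """Category probabilities for a multinomial logit with category 0 as reference."""
    # One linear predictor per non-reference category; the reference is fixed at 0.
    eta = {k: b0 + b1 * pmt for k, (b0, b1) in coefs.items()}
    denom = 1.0 + sum(math.exp(v) for v in eta.values())
    probs = {0: 1.0 / denom}
    probs.update({k: math.exp(v) / denom for k, v in eta.items()})
    return probs


# Hypothetical (intercept, Pmt slope) pairs, for illustration only.
coefs = {
    1: (-2.0, 0.0),   # voluntary vs. none: Pmt slope of zero
    2: (-1.0, -0.8),  # involuntary vs. none: non-zero Pmt slope
}

for pmt in (0, 1, 2, 3):
    p = mnl_probs(pmt, coefs)
    # Odds of each churn type relative to no churn.
    print(pmt, round(p[1] / p[0], 4), round(p[2] / p[0], 4))
```

With these made-up values, the odds of voluntary churn relative to none stay constant as Pmt changes, while the odds of involuntary churn relative to none shift with it. A single predictor can therefore matter for one contrast and not the other within the same model.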
--
Bruce Weaver |
In reply to this post by Chichi Shu
DISCRIMINANT is a specific instance of the General Linear Model (GLM), as is ordinary regression.
The weights for the function that distinguishes 1 vs 2 CAN be different from the weights for the function that distinguishes 1 vs 3. One strength of this "flipped sides" regression is that it gives a nice crosstab of predicted groups by actual groups.
To use David's term, my eSPSS gives the impression that you are likely asking "What differentiates among (discriminates among, distinguishes among) the groups?" Which instance of a General Linear Model or a Generalized Linear Model to use depends on what questions you are asking and what the levels of measurement are for your variables.
CATREG (Categorical Regression) is designed to predict categorical values from mixes of nominal, ordinal, and scale variables. It also allows you to see whether it makes a difference which level of measurement you use for a predictor. It is also possible that you will want to include interactions among predictors.
As Bruce mentioned, stepwise (as opposed to stepped) approaches are very questionable. They are in the same class of frequent abuses as the invidious median split.
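The crosstab of predicted groups by actual groups mentioned above can be sketched as follows. This is a generic illustration rather than output from DISCRIMINANT, and the actual and predicted group codes are invented for the example.

```python
from collections import Counter


def classification_table(actual, predicted, labels):
    """Rows are actual groups, columns are predicted groups."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical group codes, for illustration only:
# 0 = no churn, 1 = voluntary churn, 2 = involuntary churn.
actual    = [0, 0, 0, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 0, 2, 2, 0]
labels = [0, 1, 2]

table = classification_table(actual, predicted, labels)
for label, row in zip(labels, table):
    print(label, row)

# Share of cases on the diagonal, i.e. classified into their actual group.
hit_rate = sum(table[i][i] for i in range(len(labels))) / len(actual)
print("hit rate:", hit_rate)
```

Reading across a row shows where the cases of one actual group end up, which makes it easy to see whether, say, voluntary churners are mostly being confused with non-churners or with involuntary churners.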
Art Kendall
Social Research Consultants |
In reply to this post by Bruce Weaver
Oh, okay, I see that's what multinomial logistic regression is for. Thanks! |
In reply to this post by Bruce Weaver
Just saw this post. It's worth noting that the OP could also have fit two separate binary logistic regressions (1 vs. 0 and 2 vs. 0) and obtained essentially the same parameter estimates, standard errors, and p-values; the separate fits are not exactly identical to the simultaneous multinomial fit and are typically slightly less efficient. Of course, an overall GOF statistic for the generalized logit (multinomial logistic regression) model would not be available if two separate models were employed.
Ryan |