Hello Folks,
I have basic hands-on knowledge of statistics but am a beginner in predictive modelling. I am working on an assignment for a company that, each year, contacts a long list of prospects to try to convert them into paying customers. Some of these prospects become clients. We have all the historical data we need about these prospects, including which ones became clients. The firm has asked me to use logistic regression to estimate whether its prospects are likely to convert to customers. We are discussing which data fields to use as predictors in the regression model. Their sales planning department says it has a good idea of which data fields will be good predictors. However, I would like to challenge this claim. Is there a statistical test that would indicate which variables are "likely" to be good predictors? Something that would indicate whether there is a good "interaction" (is the term correctly used?) between the predictor variables and the predicted variable? I know it's a wide topic, but if someone could give me some pointers, it would help me out! Thanks, Marc.
Typically, techniques such as CART or CHAID are used for this kind of analysis. The Tree option for SPSS will do it and give you an "importance" rating for the items in the model.
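As a rough illustration of the idea behind tree-based importance, here is a minimal sketch that scores each categorical predictor by how much a single split on it reduces Gini impurity, which is the kind of criterion CART uses when choosing a split. This is not the importance measure SPSS reports; the field names (`region`, `contacted_by_phone`) and the records are made up for the example.

```python
from collections import defaultdict

def gini(labels):
    """Gini impurity of a list of 0/1 outcomes."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def gini_gain(values, labels):
    """Impurity reduction from splitting once on the levels of one predictor."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted

# Hypothetical prospect records: (region, contacted_by_phone, converted)
rows = [
    ("north", "yes", 1), ("north", "yes", 1), ("north", "no", 1), ("north", "no", 0),
    ("south", "yes", 0), ("south", "yes", 0), ("south", "no", 0), ("south", "no", 1),
]
converted = [r[2] for r in rows]
gains = {
    "region": gini_gain([r[0] for r in rows], converted),
    "contacted_by_phone": gini_gain([r[1] for r in rows], converted),
}
# Predictors ordered from most to least impurity reduction
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

A full tree repeats this search within each resulting subgroup, so a predictor that looks weak at the first split can still matter further down.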
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marc Sent: Thursday, June 29, 2006 1:27 PM To: [hidden email] Subject: Logistic regression - choosing predictor variables
In reply to this post by Marc Feuerstein
Re Viann's suggestion to use CHAID or CART: If you have many potential predictors, in some sense the "important" predictors are those that enter the tree, while the "unimportant" ones do not. For those that enter the tree, there are some established ways of reckoning individual predictor importance. Within the logistic regression framework, there are ways of assessing predictor importance. For this and the related topic of variable selection, see Hosmer and Lemeshow's "Applied Logistic Regression, 2nd edition." In particular, Section 4.2 presents a good approach to model-building in logistic regression. While the examples are biomedical, Hosmer and Lemeshow can be profitably used by those outside that field.
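One common way to assess a single predictor within the logistic regression framework is a likelihood-ratio test: fit the model with only that predictor and compare its log-likelihood with that of the intercept-only model. The sketch below does this for one numeric predictor in plain Python. It is an illustration under stated assumptions, not a reproduction of any procedure from the book: the predictor `previous_contacts` and the data are invented, and the Newton-Raphson fit assumes the data are not perfectly separated.

```python
import math

def fit_logistic(x, y, iters=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1, loglik)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient w.r.t. b0
            g1 += (yi - p) * xi       # gradient w.r.t. b1
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return b0, b1, loglik

# Hypothetical data: number of previous contacts and whether the prospect converted
previous_contacts = [1, 2, 3, 4, 5, 6, 7, 8]
converted = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1, loglik = fit_logistic(previous_contacts, converted)

# Intercept-only model: every prospect gets the overall conversion rate
n, n1 = len(converted), sum(converted)
loglik_null = n1 * math.log(n1 / n) + (n - n1) * math.log(1.0 - n1 / n)

g_stat = 2.0 * (loglik - loglik_null)          # likelihood-ratio statistic, 1 df
p_value = math.erfc(math.sqrt(g_stat / 2.0))   # chi-square upper tail for 1 df
print(b1, g_stat, p_value)
```

With several candidate predictors, the same comparison can be repeated one variable at a time as a first screening pass before building a multivariable model.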
In reply to this post by Marc Feuerstein
There are problems with CHAID: because it looks at the interactions between the predictor variables, it is restricted in which predictors it ends up with. There could be many other significant predictors besides those that CHAID pulls out. Simply looking at the cross-tabulations of all your possible predictors against the dependent variable (prospect becomes a client vs. prospect does not become a client) will give you an idea of which predictors are associated with the dependent. You could then enter these into a stepwise procedure available in logistic regression, which will pick out the strongest predictors.
Thanks
Jamie Burnett
Senior Statistician
Ipsos MORI
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Anthony Babinec Sent: 29 June 2006 20:19 To: [hidden email] Subject: Re: Logistic regression - choosing predictor variables
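The cross-tabulation screening suggested in Jamie Burnett's reply can be sketched as follows: for each candidate predictor, cross-tabulate its levels against the outcome, compute the Pearson chi-square statistic, and rank the predictors by Cramér's V. This is only an illustrative sketch; the predictors `sector` and `newsletter` and all counts are invented, and the function assumes every row and column total is positive. A p-value for the statistic would come from a chi-square table or a statistics package.

```python
import math

def chi_square(table):
    """Pearson chi-square, degrees of freedom and Cramér's V for a cross-tab.

    `table` maps each predictor level to (n_not_converted, n_converted).
    Assumes every row and column total is positive.
    """
    rows = list(table.values())
    n_cols = len(rows[0])
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(r[j] for r in rows) for j in range(n_cols)]
    n = sum(row_totals)
    stat = 0.0
    for r, rt in zip(rows, row_totals):
        for obs, ct in zip(r, col_totals):
            expected = rt * ct / n    # count expected if predictor and outcome were independent
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (n_cols - 1)
    v = math.sqrt(stat / (n * min(len(rows) - 1, n_cols - 1)))
    return stat, df, v

# Hypothetical cross-tabs: level -> (did not become client, became client)
crosstabs = {
    "sector": {"retail": (40, 10), "finance": (25, 25)},
    "newsletter": {"yes": (33, 17), "no": (32, 18)},
}
results = {name: chi_square(t) for name, t in crosstabs.items()}
# Strongest association with the outcome first
ranking = sorted(results, key=lambda name: results[name][2], reverse=True)
for name in ranking:
    stat, df, v = results[name]
    print(f"{name}: chi-square={stat:.3f}, df={df}, Cramer's V={v:.3f}")
```

Predictors that show little association here can still matter in combination with others, so a screen like this is a starting point rather than a final selection.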