Logistic regression - choosing predictor variables

Logistic regression - choosing predictor variables

Marc Feuerstein
Hello folks,
I have basic hands-on knowledge of statistics but am a beginner in
predictive modeling.

I am working on an assignment for a company that each year has a long list of
prospects that it contacts to try to convert them into paying customers. Some
of these prospects become clients. We have all the needed historical data
about these prospects, including which ones became clients.

I have been asked by this firm to use logistic regression to estimate whether
its prospects are likely to convert to customers.

We are discussing which data fields to use as predictors in the regression
model. Their sales planning department says that it has a good idea of which
data fields will be good predictors. However, I would like to challenge this
statement.

Is there a statistical test that would indicate which variables are "likely"
to be good predictors? Something that would indicate whether there is a good
"interaction" (is that the correct term?) between the predictor
variables and the predicted variable?

I know it's a wide topic, but if someone could give me some directions, it
would help me out!

Thanks,

Marc.

Re: Logistic regression - choosing predictor variables

Beadle, ViAnn
Typically, techniques such as CART or CHAID are used for this kind of analysis. The Tree option for SPSS will do it and give you an "importance" rating for items in the model.
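
As an illustration only (not part of the original reply, and not what the SPSS Tree option computes internally), here is a minimal Python sketch of the idea behind a tree-based "importance" score: how much a single split on each predictor reduces Gini impurity in the outcome. It uses a simplified multiway split on each category; real CART uses binary splits and CHAID uses chi-square tests. The data and the names `converted`, `region`, and `channel` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(values, labels):
    """Impurity decrease from splitting the cases on each category of one predictor."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    weighted_child_impurity = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted_child_impurity

# Hypothetical toy data: 1 = prospect converted, 0 = did not convert.
converted = [1, 1, 1, 0, 0, 0, 1, 0]
predictors = {
    "region": ["N", "N", "N", "S", "S", "S", "N", "S"],
    "channel": ["web", "phone", "web", "phone", "web", "phone", "phone", "web"],
}

# Rank predictors by how much a single split on them purifies the outcome.
gains = {name: gini_gain(values, converted) for name, values in predictors.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
for name in ranking:
    print(f"{name}: impurity decrease = {gains[name]:.3f}")
```

In this toy data `region` separates converters from non-converters completely, while `channel` carries no information, so `region` ranks first. A full tree would repeat the split recursively within each branch.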

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marc
Sent: Thursday, June 29, 2006 1:27 PM
To: [hidden email]
Subject: Logistic regression - choosing predictor variables


Re: Logistic regression - choosing predictor variables

Anthony Babinec
In reply to this post by Marc Feuerstein
Re ViAnn's suggestion to use CHAID or CART: If you
have many potential predictors, in some sense the
"important" predictors are those that enter the tree,
while the "unimportant" ones do not. For those that
enter the tree, there are some established ways
of reckoning individual predictor importance.

Within the logistic regression framework, there are
ways of assessing predictor importance. For this and
the related topic of variable selection, see Hosmer
and Lemeshow's "Applied Logistic Regression, 2nd edition."
In particular, Section 4.2 presents a good approach to
model-building in logistic regression. While the examples
are biomedical, Hosmer and Lemeshow can be profitably
used by those outside that field.
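
As an illustration only (not part of the original reply), one common first step in model-building of this kind is to fit a separate one-predictor logistic regression for each candidate and test it against the intercept-only model with a likelihood-ratio test. The Python sketch below does that with a hand-written Newton-Raphson fit. The data, the variable names `prior_contact` and `newsletter`, and the liberal 0.25 screening cutoff are assumptions made for the example; check the book for the exact procedure it recommends.

```python
import math

def fit_univariable_logit(x, y, max_iter=25, tol=1e-10):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; return (b0, b1, log-likelihood).

    Assumes the data are not perfectly separated, otherwise the fit diverges.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient) for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        loglik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return b0, b1, loglik

def lr_test_p_value(x, y):
    """Likelihood-ratio test of the one-predictor model against intercept only."""
    n, n1 = len(y), sum(y)
    p_bar = n1 / n
    loglik_null = n1 * math.log(p_bar) + (n - n1) * math.log(1.0 - p_bar)
    _, _, loglik_fit = fit_univariable_logit(x, y)
    g = max(0.0, 2.0 * (loglik_fit - loglik_null))
    return math.erfc(math.sqrt(g / 2.0))  # chi-square upper tail, 1 df

# Hypothetical toy data: 1 = prospect converted, 0 = did not convert.
converted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0] + [1] * 7 + [0] * 3
candidates = {
    "prior_contact": [0] * 10 + [1] * 10,
    "newsletter": [0, 1] * 10,
}

CUTOFF = 0.25  # assumed liberal screening level, not a significance claim
p_values = {name: lr_test_p_value(x, converted) for name, x in candidates.items()}
selected = [name for name, p in p_values.items() if p < CUTOFF]
print(p_values, selected)
```

Predictors that pass this screen are only candidates; they would still need to be assessed together in a multivariable model.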


Re: Logistic regression - choosing predictor variables

Jamie Burnett
In reply to this post by Marc Feuerstein
There are problems with CHAID: because it looks at the interaction between
the predictor variables, it is constrained in which predictors it ends up
with. There could be many other significant predictors besides those that
CHAID pulls out. Simply looking at the cross breaks of all your possible
predictors against the dependent variable (prospect becomes a client vs.
prospect doesn't become a client) will give you an idea of which
predictors are correlated with the dependent. You could then enter these into
the stepwise procedure available in logistic regression, which will pick out
the strongest predictors.
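
As an illustration only (not part of the original reply), the cross-break screening step can be sketched in Python as a Pearson chi-square statistic and Cramér's V for each predictor-by-outcome table. The data and the names `region` and `company_size` are hypothetical, and the stepwise step itself is not shown.

```python
import math
from collections import Counter

def crosstab_chi2(values, outcome):
    """Return (chi2, dof, cramers_v) for a predictor-by-outcome cross-tabulation."""
    n = len(outcome)
    cells = Counter(zip(values, outcome))   # observed cell counts
    row_totals = Counter(values)
    col_totals = Counter(outcome)
    chi2 = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals)) - 1
    cramers_v = math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return chi2, dof, cramers_v

# Hypothetical toy data: 1 = prospect became a client, 0 = did not.
became_client = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
predictors = {
    "region": ["N"] * 10 + ["S"] * 10,
    "company_size": ["small", "large"] * 10,
}

results = {name: crosstab_chi2(values, became_client)
           for name, values in predictors.items()}
for name, (chi2, dof, v) in results.items():
    print(f"{name}: chi2 = {chi2:.2f}, dof = {dof}, Cramer's V = {v:.2f}")
```

Here `region` shows a strong association with the outcome and `company_size` shows none, so only `region` would be carried forward as a candidate.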

Thanks

Jamie Burnett

Senior Statistician

Ipsos MORI

T    +44 20 7347 3338
F    +44 20 7347 3803
E    [hidden email]
W   www.ipsos-mori.com

79-81 Borough Road, London, SE1 1FY



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Anthony Babinec
Sent: 29 June 2006 20:19
To: [hidden email]
Subject: Re: Logistic regression - choosing predictor variables



