Correlation and logistic regression

Peter Spangler
I am trying to understand a binary logistic regression output where my
scale variable is a negative predictor.
The DV is coded yes/no for top-decile membership on the variable
annual_spend. The IV is initial_purchase. Annual_spend and
initial_purchase have a Spearman correlation of .66; however,
initial_purchase has a negative coefficient (b = -.05) in the logistic
regression output. I understand that a correlation is a follow-up test
to a logistic regression. What could be occurring such that positively
correlated variables show a negative relationship when
predicting the top decile of one of them?

My data:

DV  IV
 0  10
 1  25
 0   5
 1  18
 1  40

Sent from my iPhone

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Correlation and logistic regression

Ryan
Responses below

On Fri, Jan 10, 2014 at 11:00 AM, Peter Spangler <[hidden email]> wrote:
> I am trying to understand a binary logistic regression output where my
> scale variable is a negative predictor.

I assume you are referring to a continuous predictor.

> The DV is coded for yes/no to top decile membership for variable
> annual_spend.

I am generally not in favor of cutting up data without very good reason.

> The IV is initial_purchase. Annual_spend and
> initial_purchase have a spearman correlation of .66, however,
> initial_purchase is a negative b = -.05 in the logistic regression
> output.

I don't see that with the sample data you provided below.

> I understand that a correlation is a follow up test to a
> logistic regression.

That is not routine for me.

> What could be occurring that positively
> correlated variables could show a negative relationship when
> predicting the top decile of one of them?

How you cut the data can certainly affect the association between two variables.

Why don't you provide the actual data in SPSS syntax form for us to examine?...

DATA LIST list /x1 x2.
BEGIN DATA
0 10
1 25
0  5
1 18
1 40
END DATA.
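
The five sample rows are small enough to check by hand. The sketch below is plain Python rather than SPSS (an editorial choice, not something from the thread), and the helper names `ranks`, `pearson`, and `spearman` are made up for illustration. It computes the Spearman correlation between the DV and the IV and checks whether the IV separates the two DV groups completely.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# The five rows posted in the thread.
dv = [0, 1, 0, 1, 1]
iv = [10, 25, 5, 18, 40]

rho = spearman(dv, iv)

# Complete separation: every DV = 1 case has a larger IV than every DV = 0 case.
max_iv_when_0 = max(v for d, v in zip(dv, iv) if d == 0)
min_iv_when_1 = min(v for d, v in zip(dv, iv) if d == 1)
separated = max_iv_when_0 < min_iv_when_1

print(f"Spearman rho = {rho:.3f}, completely separated = {separated}")
```

On these rows the rank correlation is positive and the two groups do not overlap on the IV. Under complete separation in this direction the maximum-likelihood logistic coefficient is not finite, and the fit pushes it upward rather than toward a negative value, so the sample would not be expected to reproduce b = -.05.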

Ryan



Re: Correlation and logistic regression

Art Kendall
In reply to this post by Peter Spangler
That approach throws away a lot of cases.  It also is an extreme coarsening of the data.

Do you have at least several hundred cases?

I suggest that you first use
RANK with /NTILES(10),
then scatterplot the raw and coarsened DVs against the IV.
Try fitting linear and loess curves in the graph editor in the output window. What does the difference between the two fits suggest to you?

Try a set of 10 scatterplots of the raw DV and fit them with linear regressions.

If it looks fruitful, coarsen the DV to quintiles via RANK and try quantile regression.
https://www.ibm.com/developerworks/community/files/app?lang=en#/file/bdd6814d-0386-4626-8efb-cab328c65066
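
To make the coarsening concrete, here is a minimal Python sketch of rank-based decile groups and the resulting top-decile flag. It is only a stand-in for RANK: the function name `ntile_groups` and the spend values are invented for illustration, and the grouping rule (equal-sized groups by sorted position) is an assumption that may not match how SPSS assigns ntiles or handles ties.

```python
def ntile_groups(values, n_groups):
    """Assign each value to one of n_groups rank-based groups (1 = lowest).

    Groups are formed from sorted position, so ties are split by original
    order. This is an illustrative rule, not necessarily what SPSS RANK does.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for position, index in enumerate(order):
        groups[index] = position * n_groups // len(values) + 1
    return groups


# Made-up annual spend for 20 hypothetical customers.
annual_spend = [120, 45, 300, 80, 950, 60, 210, 500, 30, 150,
                75, 410, 95, 640, 55, 180, 260, 35, 820, 110]

decile = ntile_groups(annual_spend, 10)

# The yes/no DV from the original question: 1 only for the top decile.
top_decile = [1 if d == 10 else 0 for d in decile]

for spend, d, flag in sorted(zip(annual_spend, decile, top_decile)):
    print(f"spend={spend:4d}  decile={d:2d}  top_decile={flag}")
```

In this toy data a customer spending 640 and one spending 30 both end up coded 0, which is the loss of information that "extreme coarsening" refers to.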

Art Kendall
Social Research Consultants

Re: Correlation and logistic regression

Rich Ulrich
In reply to this post by Peter Spangler
You are predicting from two highly correlated variables,
where the criterion is "the top decile of one of them."

Well, any time you are predicting from two highly
correlated variables, you run a good risk that one of
them will behave as a "suppressor variable", which you
can read about.

You might have the simple version here: the two predictors
that would both be positive are "initial"
and "subsequent" spending, where "subsequent" is the
difference between Annual and Initial.  Your equation is
generating some estimate of the importance of Subsequent
by using a negative coefficient on Initial to imply the difference.

I do agree that when looking at multiple predictors, it is
always advisable to look at the correlations among predictors,
in addition to looking at the univariate predictions, before
drawing conclusions about the joint prediction. The poster who
suggested otherwise sounds naive to me.
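
The "simple version" can be checked with a few lines of arithmetic. The sketch below is an illustration in Python, not the poster's analysis: the spending numbers are made up, the outcome is a noise-free continuous score rather than a top-decile flag, and the fit is ordinary least squares rather than logistic regression. The helper names are hypothetical. If the outcome is 1*initial + 3*subsequent and annual = initial + subsequent, then rewriting it in terms of annual and initial gives 3*annual - 2*initial, so Initial picks up a negative coefficient even though it relates positively to the outcome.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_two_predictors(x1, x2, y):
    """Least-squares slopes for y ~ x1 + x2, via the 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Made-up spending figures for six hypothetical customers.
initial = [10, 25, 5, 18, 40, 12]
subsequent = [30, 20, 60, 45, 80, 15]
annual = [i + s for i, s in zip(initial, subsequent)]

# Both components contribute positively, but Subsequent counts three times as much.
outcome = [1 * i + 3 * s for i, s in zip(initial, subsequent)]

r_initial_outcome = pearson(initial, outcome)
b_annual, b_initial = ols_two_predictors(annual, initial, outcome)

print(f"raw correlation of initial with outcome: {r_initial_outcome:.3f}")
print(f"partial slopes: annual = {b_annual:.3f}, initial = {b_initial:.3f}")
```

The raw correlation of Initial with the outcome is positive, while its partial slope alongside Annual is negative; the negative sign only says that, for a fixed Annual, more Initial means less Subsequent.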

--
Rich Ulrich


Re: Correlation and logistic regression

Peter Spangler
In reply to this post by Art Kendall
Thanks, Art. I will begin with the scatterplots and see about fitting the groups at quintiles.

I definitely agree; this coarsening of the data was done under the direction of a branding consultant. I typically begin exploring potential groups with clustering, but that was not the case here.

Yes, I am working with a dataset of 500 cases. These 500 cases are a sample from a primary dataset of 13.5 million cases.




Re: Correlation and logistic regression

Peter Spangler
In reply to this post by Rich Ulrich
The suppressor effect is something I just discovered last night, but I'm not sure I understand it. Since I am predicting an effect of the difference between "initial" and "subsequent", is the beta negative because the difference is negative? Though the coefficient for the predictor "initial" is negative, it does correctly predict 33% of the Top Decile group.



Re: Correlation and logistic regression

Ryan
In reply to this post by Art Kendall
Art's response leads me to believe I misunderstood the problem. Anyway, sound advice has been given.

Ryan

Sent from my iPhone


Re: Correlation and logistic regression

Art Kendall
In reply to this post by Peter Spangler
Please use the list so that the conversation can benefit other list members and people who search the archives.

> Thanks, Art, I will begin with the scatterplots and see about fitting the groups at quintiles
>
> I definitely agree, this coarsening of the data was under the direction of a branding consultant. I typically begin exploring potential groups with clustering however this was not the case.
>
> Yes, I am working with a dataset of 500 cases. These 500 cases are a sample of a primary dataset of 13.5 million cases.

Especially when doing exploratory work, I think you would want to start with a much larger sample.
A sample of, say, 500 cases is okay to use in drafting your syntax.
After the syntax is ready to go, you might want to draw a few more samples and see if the scatterplots look very different.

If you end up with quintiles and if you have the few variables on your local machine, it should only take a few minutes to run MEANS and see if the sample statistics look very different from those on the whole data set.

If you find you want to run quantile regression, R is more limited as to the number of cases it can deal with, but IIRC you should be able to use much larger samples to develop a model.
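
The idea of drawing several samples and comparing them with the full file can be sketched as follows. This is a hedged Python illustration, not SPSS syntax: the "population" is synthetic (100,000 made-up, right-skewed spend values standing in for the real 13.5 million cases), and the seed, sample count, and summary statistics are arbitrary choices.

```python
import random
import statistics

rng = random.Random(2014)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the full customer file: skewed, made-up spend values.
population = [rng.lognormvariate(4.0, 1.0) for _ in range(100_000)]
pop_mean = statistics.mean(population)
pop_median = statistics.median(population)

# Draw several independent samples of 500 cases and summarize each one.
summaries = []
for draw in range(5):
    sample = rng.sample(population, 500)
    summaries.append({
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
    })

print(f"population: mean={pop_mean:8.2f}  median={pop_median:8.2f}")
for number, s in enumerate(summaries, start=1):
    print(f"sample {number}:   mean={s['mean']:8.2f}  median={s['median']:8.2f}")
```

If the sample summaries scatter widely around the population values, conclusions drawn from any single sample of 500 deserve caution, which is especially relevant for a top-decile flag that leaves only about 50 "yes" cases per sample.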




Art Kendall
Social Research Consultants

Re: Correlation and logistic regression

Rich Ulrich
In reply to this post by Peter Spangler
There are better examples than your data, for
*learning* to think about suppressor variables in
regression.

One key to remember is that the coefficient in the
regression is a *partial* regression coefficient, and
it does not have to be the same size or sign as the
raw correlation (or regression).

Example:  Low blood pressure in either leg might
indicate a blood clot; but not very reliably.  But
the magnitude of the difference between left and right
leg -- the difference score, if you will -- is a potent
predictor.

Famous example from the past:  Reading speed as
measured in an achievement test has a negative coefficient
in measuring "reading comprehension", in order to get
a score that is independent of and separate from the
reading speed.

In your data, I take it that the Annual (spending) is a
perfect predictor of Top_decile (annual spending) because
it is a transformation of it.  The question, or problem, is,
What do you get for a residual when you predict, using
that?  How well is Initial correlated with the residual?
- As your "partial regression coefficient" shows, it has
a (slight?) negative correlation.
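
The residual question can be worked through on toy numbers. The Python sketch below is an illustration under stated assumptions: the spending values are invented, the outcome is a continuous score (1*initial + 3*subsequent) rather than the Top_decile flag, and the first-stage prediction is a simple linear regression on Annual. The helper names are hypothetical. It predicts the outcome from Annual alone, then correlates Initial with what is left over.

```python
def mean(values):
    return sum(values) / len(values)


def simple_fit(x, y):
    """Intercept and slope of the least-squares line y ~ x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up spending figures for six hypothetical customers.
initial = [10, 25, 5, 18, 40, 12]
subsequent = [30, 20, 60, 45, 80, 15]
annual = [i + s for i, s in zip(initial, subsequent)]
outcome = [i + 3 * s for i, s in zip(initial, subsequent)]

# Step 1: predict the outcome from Annual alone and keep the residual.
intercept, slope = simple_fit(annual, outcome)
residual = [o - (intercept + slope * a) for o, a in zip(outcome, annual)]

# Step 2: compare Initial's raw correlation with its correlation to the residual.
r_raw = correlation(initial, outcome)
r_resid = correlation(initial, residual)

print(f"initial vs outcome:  r = {r_raw:.3f}")
print(f"initial vs residual: r = {r_resid:.3f}")
```

Here Initial correlates positively with the outcome but negatively with the residual, and the sign of that second correlation is the sign the partial regression coefficient takes.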

Try other discussions online if this is still confusing.

--
Rich Ulrich

