I am trying to understand a binary logistic regression output where my scale variable is a negative predictor.

The DV is coded yes/no for top-decile membership on the variable annual_spend. The IV is initial_purchase. Annual_spend and initial_purchase have a Spearman correlation of .66; however, initial_purchase has a negative coefficient, b = -.05, in the logistic regression output. I understand that a correlation is a follow-up test to a logistic regression. What could be occurring such that positively correlated variables show a negative relationship when predicting the top decile of one of them?

My data:

DV IV
0 10
1 25
0 5
1 18
1 40

Sent from my iPhone

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
Responses below

On Fri, Jan 10, 2014 at 11:00 AM, Peter Spangler <[hidden email]> wrote:

> I am trying to understand a binary logistic regression output where my ...

I assume you are referring to a continuous predictor.

> The DV is coded for yes/no to top decile membership for variable ...

I am generally not in favor of cutting up data without very good reason.

> The IV is initial_purchase. Annual_spend and ...

I don't see that with the sample data you provided below.

> I understand that a correlation is a follow up test to a ...

That is not routine for me.

> What could be occurring that positively ...

How you cut the data could certainly affect the association between two variables. Why don't you provide the actual data in SPSS syntax form for us to examine?...

DATA LIST list /x1 x2.
BEGIN DATA
0 10
1 25
0 5
1 18
1 40
END DATA.

Ryan
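Ryan's remark that the negative relationship does not show up in the five posted rows can be checked by hand. What follows is only a minimal Python sketch using the standard library; the names dv, iv and pearson are illustrative and do not come from the thread. It computes the point-biserial correlation between the two posted columns and checks whether the two groups overlap at all.

```python
from math import sqrt

# The five rows posted in the thread: DV (top-decile flag) and IV (initial_purchase).
dv = [0, 1, 0, 1, 1]
iv = [10, 25, 5, 18, 40]


def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson(iv, dv)

# Complete separation: every DV=0 case has a smaller IV than every DV=1 case.
max_iv_when_0 = max(v for v, d in zip(iv, dv) if d == 0)
min_iv_when_1 = min(v for v, d in zip(iv, dv) if d == 1)
separated = max_iv_when_0 < min_iv_when_1

print(f"point-biserial r = {r:.3f}")
print(f"completely separated: {separated}")
```

On these five rows the association is positive and the groups do not overlap, so a logistic fit on them alone would not be informative; a negative b would have to come from the full data set.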
In reply to this post by Peter Spangler
That approach throws away a lot of cases. It is also an extreme coarsening of the data.

Do you have at least several hundred cases? I suggest that you first use RANK with /ntiles=10, then scatterplot the raw and coarsened DVs vs. the IV. Try fitting linear and loess curves in the graph editor in the output window. What does the difference between the two fits suggest to you?

Try a set of 10 scatterplots of the raw DV and fit them with linear regressions.

If it looks fruitful, coarsen the DV to quintiles via RANK and try quantile regression.
https://www.ibm.com/developerworks/community/files/app?lang=en#/file/bdd6814d-0386-4626-8efb-cab328c65066

Art Kendall
Social Research Consultants

On 1/10/2014 11:09 AM, Peter Spangler [via SPSSX Discussion] wrote:
> I am trying to understand a binary logistic regression output where my ...
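Art's point about coarsening can be illustrated outside SPSS. The following is a rough Python sketch, not the poster's analysis: the data are simulated, the variable names are made up, and the ntile helper uses a simple rank-based rule chosen for this example (it is not claimed to reproduce what RANK does, and it ignores ties). It compares how strongly the IV correlates with the raw DV, with its deciles, and with a yes/no top-decile flag.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical data, NOT the poster's: 500 cases where annual spend
# rises with initial purchase, plus noise.
n = 500
initial = [random.uniform(5, 50) for _ in range(n)]
annual = [2.0 * x + random.gauss(0, 25) for x in initial]


def pearson(x, y):
    """Plain Pearson correlation."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ntile(values, k):
    """Assign groups 1..k from ranks: trunc(rank * k / (n + 1)) + 1. Ties not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        groups[i] = rank * k // (len(values) + 1) + 1
    return groups


decile = ntile(annual, 10)
top_decile = [1 if d == 10 else 0 for d in decile]

r_raw = pearson(initial, annual)      # IV vs. raw DV
r_decile = pearson(initial, decile)   # IV vs. DV cut into 10 groups
r_top = pearson(initial, top_decile)  # IV vs. yes/no top-decile flag

print(f"cases in top decile: {sum(top_decile)} of {n}")
print(f"r with raw DV: {r_raw:.2f}")
print(f"r with deciles: {r_decile:.2f}")
print(f"r with top-decile flag: {r_top:.2f}")
```

In a setup like this the correlation with the yes/no flag should come out noticeably weaker than the correlation with the raw DV, which is one way to see what is lost when only top-decile membership is kept.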
In reply to this post by Peter Spangler
You are predicting from two highly correlated variables, where the criterion is "the top decile of one of them."

Well, any time you are predicting from two highly correlated variables, you run a good risk that one of them will behave as a "suppressor variable" - which you can read about.

You might have the simple version here: Two predictors that would both be positive are "initial" and "subsequent" spending, where "subsequent" is the difference between Annual and Initial. Your equation is generating some estimate of the importance of Subsequent by using a negative coefficient to imply the difference.

I do agree that when looking at multiple predictors, it is always advisable to consider the correlations among predictors, in addition to looking at the univariate predictions, before drawing conclusions about the joint prediction. The poster who suggested otherwise sounds naive to me.

--
Rich Ulrich

----------------------------------------
> Date: Fri, 10 Jan 2014 08:00:27 -0800
> From: [hidden email]
> Subject: Correlation and logistic regression
> To: [hidden email]
>
> I am trying to understand a binary logistic regression output where my
> scale variable is a negative predictor.
> The DV is coded for yes/no to top decile membership for variable
> annual_spend. The IV is initial_purchase. Annual_spend and
> initial_purchase have a spearman correlation of .66, however,
> initial_purchase is a negative b = -.05 in the logistic regression
> output. I understand that a correlation is a follow up test to a
> logistic regression. What could be occurring that positively
> correlated variables could show a negative relationship when
> predicting the top decile of one of them?
>
> My data:
>
> DV IV
> 0 10
> 1 25
> 0 5
> 1 18
> 1 40
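Rich's "simple version" can be made concrete with a small simulation. This is only a sketch under stated assumptions: the data are simulated rather than the poster's, the names are illustrative, and ordinary least squares on a continuous outcome is used in place of logistic regression so that the example needs nothing beyond the Python standard library. It predicts Subsequent from Annual and Initial together, where Annual = Initial + Subsequent.

```python
import random
from math import sqrt

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical spending data, NOT the poster's.
n = 500
initial = [random.uniform(5, 50) for _ in range(n)]
subsequent = [1.5 * x + random.gauss(0, 20) for x in initial]
annual = [a + b for a, b in zip(initial, subsequent)]


def pearson(x, y):
    """Plain Pearson correlation."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_two_predictors(y, x1, x2):
    """Least-squares slopes for y ~ x1 + x2, solved from the 2x2 normal equations."""
    m = len(y)
    my, m1, m2 = sum(y) / m, sum(x1) / m, sum(x2) / m
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# On its own, Initial is positively related to Subsequent ...
r_initial = pearson(initial, subsequent)

# ... but next to Annual its partial coefficient is negative,
# because Subsequent = Annual - Initial.
b_annual, b_initial = fit_two_predictors(subsequent, annual, initial)

print(f"raw correlation of initial with subsequent: {r_initial:.2f}")
print(f"partial slopes: annual = {b_annual:.3f}, initial = {b_initial:.3f}")
```

The raw correlation is positive while the slope for Initial is negative, which is the sign reversal being described. Whether the same mechanism is at work in the poster's logistic model depends on which predictors were actually entered.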
In reply to this post by Art Kendall
Thanks, Art. I will begin with the scatterplots and see about fitting the groups at quintiles.

Yes, I am working with a dataset of 500 cases. These 500 cases are a sample from a primary dataset of 13.5 million cases.

On Sat, Jan 11, 2014 at 5:48 AM, Art Kendall <[hidden email]> wrote:
In reply to this post by Rich Ulrich
The suppressor effect is something I just discovered last night, but I'm not sure I understand it: since I am predicting an effect of the difference between "initial" and "subsequent", is the beta a negative indicator because the difference is negative? Though the predictor "initial" is negative, it does correctly predict 33% of the Top Decile group.

On Sat, Jan 11, 2014 at 9:23 AM, Rich Ulrich <[hidden email]> wrote:
> You are predicting from two highly correlated variables, ...
In reply to this post by Art Kendall
Art's response leads me to believe I misunderstood the problem. Anyway, sound advice has been given.

Ryan

Sent from my iPhone
In reply to this post by Peter Spangler
Please use the list so that the conversation can benefit other list members and people who search the archives.

Especially when doing exploratory work, I think you would want to start with a much larger sample. A sample of, say, 500 cases is okay to use in drafting your syntax. After the syntax is ready to go, you might want to draw a few more samples and see if the scatterplots look very different.

If you end up with quintiles, and if you have the few variables on your local machine, it should only take a few minutes to run MEANS and see if the sample statistics look very different from those on the whole data set.

If you find you want to run quantile regression, R is more limited as to the number of cases it can deal with, but IIRC you should be able to use much larger samples to develop a model.

Art Kendall
Social Research Consultants
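The idea of drawing a few more samples and comparing them with the whole data set can be sketched as follows. This is an illustration only: the population is simulated (100,000 values standing in for the 13.5 million cases mentioned in the thread), the lognormal shape is an assumption, and a plain mean stands in for whatever statistics MEANS would report.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the full data set, NOT the poster's data.
population = [random.lognormvariate(3.0, 0.8) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# Draw five independent samples of 500 cases each, without replacement.
samples = [random.sample(population, 500) for _ in range(5)]
sample_means = [statistics.mean(s) for s in samples]

print(f"population mean: {pop_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.2f}, difference = {m - pop_mean:+.2f}")
```

If the sample means scatter widely around the population value, a single sample of 500 is probably too small to rely on for conclusions about the top decile.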
In reply to this post by Peter Spangler
There are better examples than your data for *learning* to think about suppressor variables in regression.

One key to remember is that the coefficient in the regression is a *partial* regression coefficient, and it does not have to be the same size or sign as the raw correlation (or regression).

Example: Low blood pressure in either leg might indicate a blood clot, but not very reliably. But the magnitude of the difference between left and right leg -- the difference score, if you will -- is a potent predictor.

Famous example from the past: Reading speed as measured in an achievement test has a negative coefficient in measuring "reading comprehension", in order to get a score that is independent of and separate from the reading speed.

In your data, I take it that Annual (spending) is a perfect predictor of Top_decile (annual spending), because Top_decile is a transformation of it. The question, or problem, is: What do you get for a residual when you predict using that? How well is Initial correlated with the residual? As your "partial regression coefficient" shows, it has a (slight?) negative correlation.

Try other discussions online if this is still confusing.

--
Rich Ulrich

________________________________
> Date: Sat, 11 Jan 2014 10:07:27 -0800
> From: [hidden email]
> Subject: Re: Correlation and logistic regression
> To: [hidden email]
>
> The supressor effect is something I just discovered last night but I'm
> not sure I understand it: Since I am predicting a effect of the
> difference between "initial" and "subsequent" the beta is a negative
> indicator because the difference is negative? Though the predictor
> "initial" is negative, it does correctly predict 33% of the Top Decile
> group.
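The remark that a partial coefficient need not share the sign of the raw correlation can be written out for the simplest case. The formula below is the standard result for a linear regression with two standardized predictors; it is offered as an analogy, since logistic regression has no such closed form. Here y is the outcome, 1 and 2 index the predictors, and the numbers in the worked line are hypothetical, except that .66 is borrowed from the correlation reported in the thread.

```latex
% Standardized partial coefficients with two predictors (linear regression):
\beta_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^{2}},
\qquad
\beta_2 = \frac{r_{y2} - r_{y1}\, r_{12}}{1 - r_{12}^{2}}

% For |r_{12}| < 1, \beta_1 is negative whenever r_{y1} < r_{y2}\, r_{12},
% even though r_{y1} itself is positive.

% Hypothetical numbers: r_{y1} = .30,\ r_{y2} = .70,\ r_{12} = .66
\beta_1 = \frac{.30 - (.70)(.66)}{1 - (.66)^{2}}
        = \frac{-.162}{.5644}
        \approx -.29
```

So a predictor with a positive raw correlation of .30 with the outcome ends up with a negative partial coefficient once a stronger, correlated predictor is in the equation.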