Hi,
I have a dependent variable with a scoring range of 0-40. 50% of the subjects scored 0-5, with most of them actually scoring 0. I decided to dichotomize the outcome, with the cut-off at a score of 5 or above. Because the distribution appears to be non-normal, I thought I should use logistic regression rather than the least squares method. What am I losing by using logistic regression?
Thanks
Moshe
Moshe Marko, PT, DPT, MHS, OCS, CSCS
Assistant Professor
Department of Physical Therapy Education
College of Health Professions
SUNY Upstate Medical University
Room 2232 Silverman Hall
750 Adams Street
Syracuse, NY 13210-1834
315 464 6577
FAX 315 464 6887
[hidden email]
Dichotomizing necessarily involves losing information. In your case, what you appear to have is a sort of Poisson distribution: the most frequent value is zero, followed by rapidly decreasing counts in the range 1-5, and even fewer at higher values. Thus you may want to use Poisson regression. On the other hand, if you must dichotomize, why not dichotomize at “zero” versus “1 or more”? That seems more reasonable to me, without knowing the actual content of your research. The value 5 does not seem to have any intrinsic property that makes it the critical value, especially because most of those below 5 are actually zero.

Hector

From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Moshe Marko
[snip, previous]
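To make the information loss concrete, here is a minimal sketch in Python. The scores are invented for illustration (the thread never posts the actual data), and `dichotomize` is a hypothetical helper, not anything from SPSS. It compares the cut-off of 5 from the question with the "zero" versus "1 or more" split suggested above.

```python
from collections import Counter

# Hypothetical scores on the 0-40 scale, invented for illustration only.
scores = [0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 9, 12, 18, 25, 40]

def dichotomize(values, cutoff):
    """Return 1 for scores at or above the cutoff, else 0."""
    return [1 if v >= cutoff else 0 for v in values]

at_five = dichotomize(scores, 5)  # the cut-off proposed in the question
at_one = dichotomize(scores, 1)   # "zero" versus "1 or more"

print("distinct raw values:", len(set(scores)))
print("distinct values after dichotomizing:", len(set(at_five)))
print("group sizes, cutoff 5:", dict(Counter(at_five)))
print("group sizes, cutoff 1:", dict(Counter(at_one)))

# Raw scores that the cutoff at 5 treats as identical to a score of zero.
low_group = sorted({v for v, flag in zip(scores, at_five) if flag == 0})
print("raw scores merged into the low group at cutoff 5:", low_group)
```

With a cut-off at 5, every score from 0 to 4 lands in one group and every score from 5 to 40 in the other, so the model can no longer tell a 5 from a 40.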
I agree with Hector that zero is very often important, and that it makes sense to at least consider taking it alone, "none" versus "some". I also agree that it is wasteful to dichotomize.

However, "mostly zero" with scores running up to 40 is not a very likely Poisson. That reminds me that sometimes there is a reasonable distribution for the rest, once zero is excluded. Does the density decrease as scores increase, or is there some other shape to what is left?

Given a variable that is merely highly skewed, my tendency is to look for a reasonable transformation that yields something close to equal intervals in the latent quality being assessed. Is zero reasonable as a step below 1, or is there something special about zero? It could be better to use a second variable to describe the non-linearity.

In this case, the simple procedure might be this: do one analysis for none/some, and a second analysis that *excludes* the cases with zero and uses either the 1-40 score or a transformation of it.

--
Rich Ulrich

Date: Fri, 23 Sep 2011 13:46:50 -0300
From: [hidden email]
Subject: Re: What am I losing using a logistic regression
To: [hidden email]
[snip, previous]
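Rich's two points can be checked with a few lines of arithmetic. The sketch below is in Python and rests on stated assumptions: the 40% share of zeros and the score list are invented, since the thread gives no exact figures, and the log transformation is only one possible choice for "a transformation of it". For a Poisson variable, P(0) = exp(-lambda), so the share of zeros fixes the mean, and the chance of a large score follows from that.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_tail(k_min, lam):
    """Probability of k_min or more events, P(X >= k_min)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))

# Assumed share of zeros; the thread gives no exact figure.
p_zero = 0.40
lam = -math.log(p_zero)          # from P(0) = exp(-lam)
tail_10 = poisson_tail(10, lam)
print(f"Poisson mean implied by {p_zero:.0%} zeros: {lam:.3f}")
print(f"P(score >= 10) under that Poisson: {tail_10:.2e}")

# Two-part setup: one outcome for none/some, one for the non-zero scores.
# Hypothetical scores, invented for illustration only.
scores = [0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 9, 12, 18, 25, 40]
any_score = [1 if s > 0 else 0 for s in scores]        # outcome for analysis 1
positive_scores = [s for s in scores if s > 0]         # cases for analysis 2
log_positive = [math.log(s) for s in positive_scores]  # one candidate transformation
print("cases in the none/some analysis:", len(any_score))
print("cases in the positive-score analysis:", len(positive_scores))
```

Under these assumptions a Poisson with that many zeros would almost never produce a score of 10 or more, which is why a single Poisson is a poor description of data that reach 40. The two lists at the end are the outcomes for the two separate analyses described above.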
How about a negative binomial distribution, or perhaps a zero-inflated negative binomial if the number of zero responses is too large?

Dr. Paul R. Swank,
Children's Learning Institute
Professor, Department of Pediatrics, Medical School
Adjunct Professor, School of Public Health
University of Texas Health Science Center-Houston

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rich Ulrich
[snip, previous]
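A rough way to see whether the negative binomial, or its zero-inflated variant, is worth considering is to compare the variance with the mean and the observed share of zeros with what each distribution would predict. The Python sketch below uses invented scores and a method-of-moments estimate of the negative binomial size parameter `k` (from var = mean + mean^2 / k). It is a screening heuristic under those assumptions, not a fitted model.

```python
import math
import statistics

# Hypothetical scores on the 0-40 scale, invented for illustration only.
scores = [0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 9, 12, 18, 25, 40]

mean = statistics.mean(scores)
variance = statistics.variance(scores)  # sample variance (n - 1 denominator)

# A Poisson has variance equal to its mean; a ratio well above 1
# indicates overdispersion, which the negative binomial allows for.
dispersion_ratio = variance / mean

# Method-of-moments size parameter from var = mean + mean**2 / k.
# Only defined when the data are overdispersed.
k = mean ** 2 / (variance - mean) if variance > mean else None

zero_share = scores.count(0) / len(scores)
poisson_zero = math.exp(-mean)                          # P(0) under Poisson
nb_zero = (k / (k + mean)) ** k if k is not None else None  # P(0) under NB

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
print(f"variance / mean = {dispersion_ratio:.2f}")
print(f"observed share of zeros      = {zero_share:.3f}")
print(f"zeros expected under Poisson = {poisson_zero:.4f}")
if nb_zero is not None:
    print(f"zeros expected under NB      = {nb_zero:.3f}")
```

If the observed share of zeros is clearly above what the negative binomial implies, that is a hint that a zero-inflated version may fit better. A formal comparison would need the models actually fitted to the real data.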
In reply to this post by Moshe Marko
Moshe,
Please provide a more precise description of the distribution of your data; perhaps you should just provide a frequency distribution table. Also, please tell us what this variable represents.

Ryan

On Fri, Sep 23, 2011 at 11:56 AM, Moshe Marko <[hidden email]> wrote:
[snip, previous]

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
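The frequency distribution table Ryan asks for can be produced in a few lines. The sketch below is in Python with invented scores, and `frequency_table` is a hypothetical helper. It lists each observed score with its count, percent, and cumulative percent.

```python
from collections import Counter

# Hypothetical scores on the 0-40 scale, invented for illustration only.
scores = [0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 9, 12, 18, 25, 40]

def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / n,
                     100 * cumulative / n))
    return rows

rows = frequency_table(scores)
print(f"{'score':>5} {'count':>5} {'pct':>7} {'cum pct':>7}")
for value, count, pct, cum_pct in rows:
    print(f"{value:>5} {count:>5} {pct:>7.1f} {cum_pct:>7.1f}")
```

A table like this shows directly how many cases are at zero, how quickly the counts fall off, and whether anything piles up at the top of the scale.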
In reply to this post by Moshe Marko
Do not simply dichotomize your data. You have several options, which depend partly on the distribution (some have already been mentioned). What is the range? What is the shape across the entire range? Are the minimum and maximum values absolute limits, in the sense that they could never be crossed, even in a new sample? In the same vein, please provide a detailed explanation of what these scores actually represent. Bottom line: more information would be helpful.

Ryan

[snip, previous]