My dependent variable is, by origin, the absolute residual* left after some regression; it has a half-normal distribution. I have never regressed a DV with such a distribution. I am considering using a generalized linear model (GENLIN) to regress it on some predictors (entirely different from those that produced the residuals). What distribution type should I use for the DV? For continuous data, GENLIN offers the Gamma, Inverse Gaussian and Tweedie distributions. Which should I choose to model the half-normal? Or should I apply a special transformation first? And which link function would be most appropriate? What can you advise? Thanks.

*More precisely, I'm analysing positive residuals and negative residuals separately.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to KO's post of 11/13/2011, 4:30 AM
I do not have a direct answer, but I have a gut reaction.
You talk about analyzing subgroups separately. Why not include a dichotomous predictor, and use it and the interaction terms involving it, in your new analysis?
Art Kendall
Social Research Consultants |
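A minimal sketch of the coding suggested above, in Python. The names `sign_pos` and `design_row` and the toy cases are hypothetical, not taken from the study. A dichotomous indicator for the sign of the residual enters the model together with its product with a binary diagnosis, so that one pooled model can carry a separate diagnosis effect for positive and for negative deviations.

```python
# Build design-matrix rows for a pooled model with a sign indicator
# and its interaction with a binary diagnosis (all data are made up).

def design_row(residual, diagnosis):
    """Return [intercept, sign_pos, diagnosis, sign_pos * diagnosis]."""
    sign_pos = 1 if residual > 0 else 0  # 1 = positive deviation, 0 = negative
    return [1, sign_pos, diagnosis, sign_pos * diagnosis]

# Toy cases: (signed residual, diagnosis present = 1 / absent = 0)
cases = [(1.8, 1), (0.4, 0), (-0.6, 0), (-0.9, 1), (2.5, 1)]

X = [design_row(r, d) for r, d in cases]  # predictors
y = [abs(r) for r, _ in cases]            # DV: absolute residual

for row, target in zip(X, y):
    print(row, target)
```

Under this coding, the coefficient of `diagnosis` describes the diagnosis effect among negative deviations, and the sum of that coefficient and the interaction coefficient describes it among positive deviations.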
In reply to this post by Kirill Orlov
It seems to me that if there is going to be much to be found by reducing large residuals, you must have some large outliers to start with. What else motivates this proposed analysis?
- If there are outliers, "half-normal" will be an understatement concerning the length of the tail.
- I would probably start with a simple, direct examination of outliers.
But you don't mention either the N or the R^2 achieved by the original analysis, so it is hard to imagine what you are working with.
--
Rich Ulrich
> Date: Sun, 13 Nov 2011 04:30:01 -0500
> Subject: Half-normal distributed DV in generalized linear model? |
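One possible form of a "simple, direct examination of outliers" is sketched below in Python. It is an assumption about what such a check might look like, not the method proposed in the post: it uses median/MAD-based robust z-scores, and both the cutoff of 3.5 and the residual values are illustrative choices.

```python
import statistics

def modified_z_scores(values):
    """Robust z-scores based on the median and the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # when the data are normal.
    return [0.6745 * (v - med) / mad for v in values]

# Made-up signed residuals with one extreme case
residuals = [0.2, -0.5, 0.1, 0.7, -0.3, 0.4, -0.1, 6.0, -0.6, 0.3]

CUTOFF = 3.5  # illustrative threshold, an assumption rather than a rule
scores = modified_z_scores(residuals)
flagged = [(r, round(z, 2)) for r, z in zip(residuals, scores) if abs(z) > CUTOFF]
print(flagged)
```

Listing the flagged cases next to the N of the original analysis gives a quick sense of whether the tail is longer than a half-normal would suggest.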
In reply to this post by Kirill Orlov
You stated:
"...My dependent variable is, by origin, absolute residual* left after some regression..."
First, please define "some regression." Second, please let us know what your research question(s) is/are with respect to the first regression and the regression on the residuals. Third, please provide more information about your research study in general.
Ryan
On Sun, Nov 13, 2011 at 4:30 AM, KO <[hidden email]> wrote: My dependent variable is, by origin, absolute residual* left after some |
In reply to this post by Kirill Orlov
I will satisfy your curiosity about my study, since you ask.

Children's body weight was regressed on their height in a nonlinear regression minimizing absolute deviations (that is, approximating the conditional median). The body weight was first power-transformed to overcome heteroscedasticity, so the residuals are quite homoscedastic and can be compared directly between children of different heights.

Now I want to explore how the residual (i.e. the deviation from normal = median weight) "depends" on a set of chronic disease diagnoses, these IVs being binary (diagnosis present/absent). Of course, positive deviations and negative deviations of body weight are to be regressed separately (consider, for example, an "endocrine disease" diagnosis: it correlates positively with positive deviations, due to cases of obesity, and it hardly correlates with negative deviations).

So my task is a straightforward multiple regression. However, the DV in my case is radically skewed, because it is one half of an almost normally distributed variable: it is half-normal, with only a right tail and a cut in place of the left tail. Nonparametric ranking techniques will not do: I don't want a uniform distribution; I want to keep the original distribution. I am inclined to use GENLIN with a Gamma distribution for my half-normal DV, but I'm not sure. |
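To make the shape of such a DV concrete, the following Python sketch computes the textbook moments of a half-normal variable and checks them against the positive half of simulated normal residuals. The simulated data, the seed and the scale `sigma = 1` are assumptions for illustration only. The Gamma comparison is a simple moment match made here, not a description of what GENLIN does internally.

```python
import math
import random

def half_normal_mean(sigma):
    """Mean of |X| for X ~ Normal(0, sigma^2)."""
    return sigma * math.sqrt(2 / math.pi)

def half_normal_var(sigma):
    """Variance of |X| for X ~ Normal(0, sigma^2)."""
    return sigma ** 2 * (1 - 2 / math.pi)

# Skewness of the half-normal does not depend on sigma.
HALF_NORMAL_SKEW = math.sqrt(2) * (4 - math.pi) / (math.pi - 2) ** 1.5

# A Gamma distribution with the same mean and variance has
# shape k = mean^2 / variance and skewness 2 / sqrt(k).
gamma_shape = half_normal_mean(1.0) ** 2 / half_normal_var(1.0)
gamma_skew = 2 / math.sqrt(gamma_shape)

# Simulated check: keep only the positive half of normal "residuals".
rng = random.Random(2011)
sigma = 1.0
positive = [r for r in (rng.gauss(0.0, sigma) for _ in range(200_000)) if r > 0]

n = len(positive)
mean = sum(positive) / n
var = sum((x - mean) ** 2 for x in positive) / n
skew = sum((x - mean) ** 3 for x in positive) / n / var ** 1.5

print("half-normal: mean", half_normal_mean(sigma), "skewness", HALF_NORMAL_SKEW)
print("simulated:   mean", mean, "skewness", skew)
print("matched Gamma: shape", gamma_shape, "skewness", gamma_skew)
```

The moment-matched Gamma is more strongly skewed than the half-normal, and because its shape parameter exceeds 1 its density is zero at zero, whereas the half-normal density peaks there. Whether that mismatch matters in practice would have to be judged from the fitted model's residuals.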
Thanks for the detail. It sounds like you have taken care with your first regression, and your residuals (your new DV) should be well-behaved -- in the sense that those scores should have the equal-interval property. That is the main concern for doing OLS regression.

There is no requirement for OLS regression that the DV should be normal; I remind you that the only technical requirements are on the residuals. The more-or-less ideal case, for easily meeting requirements, is that both the DV and the IVs are normal. It sounds to me as if your new IVs are mainly dichotomies ("various ... diagnoses") which may be rare, so that your prediction equation is not apt to result in a normal-shaped set of scores, either.

The purpose of those link functions, as I see it, is to account for some underlying, unequal intervals expected when those scores were created by particular generating processes, thus resulting in a particular, natural shape of a distribution... which has unequal intervals and errors. You do not have that circumstance. You have an odd-shaped DV, but you still have equal intervals. The requirement for proper interpretation of the resulting tests in regression is that the new residuals should be something like normal, and it looks to me like you are most likely to get that without any unusual link function.

- I consider my comments above to be a generally well-informed opinion, but not an irrefutable or unalterable one.
- I'm not an expert on link functions, so I would not mind hearing some confirmation if someone thinks I'm right.
--
Rich Ulrich
> Date: Sun, 13 Nov 2011 22:41:24 -0800
> Subject: Re: Half-normal distributed DV in generalized linear model?
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Half-normal-distributed-DV-in-generalized-linear-model-tp4988259p4989902.html |
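As a rough illustration of the identity-link (OLS) option discussed above, the Python sketch below regresses a simulated half-normal DV on a single rare binary diagnosis. Everything in it is hypothetical: the 10% prevalence, the larger spread among diagnosed children, and the helper `ols_simple` are assumptions made for the example. It shows that with one dichotomous IV the OLS intercept is the mean of the undiagnosed group and the slope is the difference between the two group means. It also computes the skewness of the new residuals, since that is the quantity the normality requirement concerns.

```python
import random

def ols_simple(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(42)

# Hypothetical data: a rare diagnosis and a half-normal DV whose scale
# is larger when the diagnosis is present.
diagnosis = [1 if rng.random() < 0.10 else 0 for _ in range(2000)]
dv = [abs(rng.gauss(0.0, 1.5 if d else 1.0)) for d in diagnosis]

intercept, slope = ols_simple(diagnosis, dv)

group0 = [v for v, d in zip(dv, diagnosis) if d == 0]
group1 = [v for v, d in zip(dv, diagnosis) if d == 1]
mean0 = sum(group0) / len(group0)
mean1 = sum(group1) / len(group1)

# Residuals of the new regression and their skewness
residuals = [v - (intercept + slope * d) for v, d in zip(dv, diagnosis)]
n = len(residuals)
res_var = sum(r ** 2 for r in residuals) / n
res_skew = sum(r ** 3 for r in residuals) / n / res_var ** 1.5

print("intercept", intercept, "mean of undiagnosed group", mean0)
print("slope", slope, "difference of group means", mean1 - mean0)
print("skewness of new residuals", res_skew)
```

With a single binary predictor the model is saturated, so a Gamma model with a log link would reproduce the same two group means; the distribution and link choices matter for the standard errors and once several diagnoses are combined additively.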