If a random variable has a U-shape, what regression model could be computed in order to get the optimal expected value E(X) ?
Dr. Frank Gaeth
|
On 23/04/2012 17:59, drfg2008 wrote:
> If a random variable has a U-shape, what regression model could be computed
> in order to get the optimal expected value E(X) ?

Quadratic: y = a + b·x + c·x^2.

U- and J-shaped relationships can be fitted using polynomial regression.

You may either center x & x^2 or not (there are people in favor and against... I'm not going to choose a side, not today - too tired for that).

HTH,
Marta GG
|
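A minimal sketch of the quadratic fit mentioned above, assuming a relationship between two variables x and y (which the original question may not have intended). The data, the helper names `solve` and `fit_quadratic`, and the use of plain normal equations are illustrative assumptions, not anything taken from SPSS.

```python
# Fit y = a + b*x + c*x^2 by ordinary least squares, with and without centering x.
# Pure-stdlib sketch; the data below are made up for illustration.

def solve(A, b):
    """Solve the linear system A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_quadratic(x, y):
    """Return (a, b, c) minimising the squared error of a + b*x + c*x^2."""
    X = [[1.0, xi, xi * xi] for xi in x]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    return tuple(solve(XtX, Xty))

x = [0, 1, 2, 3, 4, 5, 6]
y = [(xi - 3) ** 2 + 2 for xi in x]   # an exact U shape: y = 11 - 6x + x^2

a, b, c = fit_quadratic(x, y)
vertex = -b / (2 * c)                 # x value where the fitted U bottoms out

# Centering x before squaring changes a and b but not c or the fitted values.
mean_x = sum(x) / len(x)
xc = [xi - mean_x for xi in x]
ac, bc, cc = fit_quadratic(xc, y)
```

Centering leaves the curvature term and the predictions unchanged; it only changes what the intercept and the linear coefficient refer to, which is presumably why opinions on it differ.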
Administrator
|
In reply to this post by drfg2008
Questions to you Frank:
From your grade school math notes!!! What sort of function yields a U shape? How would one transform such to linear?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Marta Garcia-Granero
I don't believe the OP indicated a U-shaped relationship between two variables. At any rate, before considering candidate models, we need to know more about the variable. What does it represent? What values can it take on? Is it censored or truncated at the lower and/or upper end?
Please elaborate.

Ryan
|
In reply to this post by drfg2008
If a variable, all by itself and not in a relationship, is said to "have a U-shape", I assume that you must mean that the density is concentrated at the extremes. For instance, the distribution of proportions is often described that way, and it leads to models based on logit or probit or some other "folded" distribution -- where the proper model does depend on what generates the data.

The model determines what is technically going to "linearize" the effects of something; and, coincidentally, it creates a model where variances are homogeneous across the range. Thus, it *can* be wiser (and legitimate) to look at the average of logits of P rather than the average of P.

But your question is confusing. "Regression" *really* doesn't enter in, when it comes to Expected Value. And E(logit(X)) is not the same as E(X). I guess that I think of computing it because it is "optimum" in some sense.

Unless I'm missing something, you probably need to clarify what you are asking about, if this doesn't tell you what you want. Or if someone else's Internet ESP hasn't hit on it.

--
Rich Ulrich
|
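The point that E(logit(X)) is not the same as E(X) can be checked numerically. The sketch below uses a made-up U-shaped sample of proportions (an assumption for illustration only) and compares the plain mean with the mean of the logits transformed back to the probability scale.

```python
import math

def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Inverse of logit: maps a log-odds value back to (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical U-shaped sample: most values sit near 0 or near 1.
p = [0.02, 0.03, 0.05, 0.05, 0.10, 0.50, 0.95, 0.97, 0.98]

mean_p = sum(p) / len(p)                        # E(X) on the raw scale
mean_logit = sum(logit(v) for v in p) / len(p)  # E(logit(X))
back_transformed = inv_logit(mean_logit)        # not equal to mean_p

print(f"mean of P: {mean_p:.3f}")
print(f"inverse logit of mean logit: {back_transformed:.3f}")
```

The two summaries differ because the logit is nonlinear, so which one is "optimal" depends on which scale the later analysis needs.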
Yes, the description is confusing and somewhat misleading, sorry for that.

What I meant was that the distribution function of a discrete variable has a 'U-shape'.

If you have a huge number of dummy variables (0-1) as independent variables: what is the best or the appropriate estimation method to estimate E(x) for each case (some 2,500 cases)?
(If it were a Poisson distribution, for example, I would use a Poisson regression. That's why I used the word 'regression'.)

(Hope this is not even more confusing.)

Frank
Dr. Frank Gaeth
|
Okay, I still am not completely sure about your terminology.
You are using "E(x) for each case", I think, as a way to say that you want a predicted value for each case, using a regression of some kind. (I tend to fixate on E(x) as some reference to a rather simple mean of x, rather than a predicted value from a regression.)

I think I've said this before about Poisson-appearing data, and I will say it again -- the shape of the distribution in the data is not what is critical about determining or deciding on an "appropriate" transformation or model. For OLS models, the important, related factor is that the residuals be rather normally distributed.

The best clue for the underlying distribution, in my own experience, has been the information about how the numbers are generated. What we learn that is important for that, especially if it does not provide a definitive answer, is the information (or expert intuition) about what constitutes an "equal interval" for differences; and that usually is an indication of the expected "error variance" for different scores. Or vice versa, the error variance clues us to what should be considered equal intervals.

Are your data fixed in a range from 0 to 1, like probabilities? Or can they be standardized, logically, so that (0,1) is the range? Does this make sense, to think of them as something like probabilities? If so, the logistic regression is the obvious candidate. Because logistic regressions are so popular and familiar, and because OLS linear regression would have the same model-inappropriateness as it does for range-limited data, LR is your obvious candidate. But does it make sense?

--
Rich Ulrich
|
> For OLS models, the important, related factor is that the residuals be rather normally distributed.

That seems to be the case. The normal Q-Q plot of the unstandardized residuals shows not much of a deviation from the line; however, the KSO test shows a significant result, p < 0.01 (most extreme absolute difference: 0.114; KSO-Z: 5.37; N = 2,227). So I guess a linear regression will be OK.

Thanks (yes, the predicted value for each case was meant).
Dr. Frank Gaeth
|
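The figures reported above (largest absolute difference 0.114, Z of about 5.37, N = 2227) are consistent with Z = sqrt(N) * D, so the same difference is judged very differently at different sample sizes. The sketch below works through that arithmetic. The helper names are hypothetical, and the p-value uses the asymptotic Kolmogorov series, which assumes a fully specified reference distribution; with a mean and variance estimated from the residuals it is only a rough guide.

```python
import math

def ks_z(d, n):
    """Kolmogorov-Smirnov Z: largest absolute difference scaled by sqrt(n)."""
    return math.sqrt(n) * d

def kolmogorov_p(z, terms=100):
    """Asymptotic two-sided p-value, 2 * sum((-1)^(k-1) * exp(-2 k^2 z^2))."""
    if z < 0.2:
        return 1.0  # the series converges poorly here; p is essentially 1
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * z * z)
            for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2 * s))

d = 0.114                    # largest absolute difference from the post
z_reported = ks_z(d, 2227)   # sample size from the post
z_small = ks_z(d, 100)       # same difference, hypothetical smaller sample

p_reported = kolmogorov_p(z_reported)
p_small = kolmogorov_p(z_small)

print(f"N=2227: Z={z_reported:.2f}, p={p_reported:.3g}")
print(f"N=100:  Z={z_small:.2f}, p={p_small:.3g}")
```

A significant test at this sample size and a reasonable-looking Q-Q plot are therefore not contradictory; they answer different questions, and neither settles whether a linear model suits a four-level outcome.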
In reply to this post by Rich Ulrich
While there is no disputing Rich's message below, I still think there are missing details which could play a vital role in determining the appropriate model. If you have count data (e.g., number of days missed school in the past 30 days) which appear to have a U shaped distribution, then OLS regression may not be optimal. In fact, a standard binomial logistic regression (# of events / total # of trials) may not be optimal either. Since the OP has not provided us with exactly what this variable represents, whether the range is restricted in some way and why (e.g., censoring, truncation) etc., it simply is not possible [for me] to recommend a set of candidate models to consider.
Ryan
|
The variable has four attributes (1-4) and is equidistant ordinal. And U-shaped. The predictor variables are dichotomous. However, it is not about belonging to one of four categories; rather, a predicted (continuous) mean value should be computed for each case, if possible, even if generating a cardinal value from ordinal data is contradictory. This value is needed for use in a further process.
Dr. Frank Gaeth
|
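One way to read "a predicted mean value for each case" with dichotomous predictors: an OLS model that includes all interactions among the dummies reproduces the observed mean of the outcome within each combination of predictor values. The sketch below computes those cell means directly. The data and the name `cell_means` are made up for illustration, and treating the codes 1-4 as interval-scaled is an assumption.

```python
from collections import defaultdict

# Hypothetical cases: (d1, d2) are 0/1 predictors, y is the 1-4 outcome.
cases = [
    ((0, 0), 1), ((0, 0), 1), ((0, 0), 4),
    ((0, 1), 4), ((0, 1), 4), ((0, 1), 1),
    ((1, 0), 2), ((1, 0), 1),
    ((1, 1), 4), ((1, 1), 3), ((1, 1), 4),
]

def cell_means(data):
    """Mean outcome for each observed combination of dummy values."""
    sums = defaultdict(lambda: [0.0, 0])
    for pattern, y in data:
        sums[pattern][0] += y
        sums[pattern][1] += 1
    return {pattern: total / n for pattern, (total, n) in sums.items()}

means = cell_means(cases)

# Predicted (continuous) value for every case: the mean of its cell.
predicted = [means[pattern] for pattern, _ in cases]
```

Cell means always stay within 1 to 4, but with a huge number of dummies most cells will be empty or tiny. A main-effects-only linear model avoids that sparsity, at the price that its predictions can fall outside the 1-4 range.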
Administrator
|
I don't believe you have really addressed Ryan's questions ;-)
Hmm. For example: are negative predicted values permissible?

"Equidistant ordinal"???? WTF are you smoking? We used to call that interval or ratio (true zero exists), but feel free to make up your own nomenclature. It is not the least bit confusing or distracting.

**How do you know the intervals are equidistant?**
**What does this variable represent?**

Why are your cards always so close to the chest? It always takes about 10 rounds to get any useful information from you, and then after you have answers from numerous people you have the tendency to go off without any closure.

***** BONUS POINTS ***** You never bother to help anyone else when they have questions!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by drfg2008
There are many dialects in statistics.
However, in most of them, the distinction between ordinal and interval level of measurement is that for ordinal variables the intervals between values of a variable are merely ordered and clearly unequal, e.g., magnitudes of earthquakes. Interval level variables have intervals that are not too discrepant from being equal. Some wags would say that the only situation of perfectly equal intervals is the dichotomy, since there is only one interval and anything is equal to itself.

It helps list members to have all of the pertinent information before diagnosing a situation.

A variable has levels. Most stat dialects use the word attribute to mean a variable, not its levels.

If my summary below is correct, and if there are not important considerations that are missing, perhaps this is what you want.

You have an interval level variable with 4 levels. (The construct this operationalization represents is undefined.) You would like to know if it makes a difference to assume that the level of measurement is ordinal or interval. You have dichotomous predictors. (Their nature is undefined. Are they crossed, repeated measures, semantic sets, intrinsic dichotomies, cuts on continuous variables, etc.?) _You would like to find predicted values for each case._

Have you tried CATREG? It has two features that might help. It can be run assuming different levels of measurement. It has a /save option so you can get new variables for each case.

On 4/26/2012 1:53 AM, drfg2008 wrote:
> [...] This value is needed for use in a further process. So I learned a linear regression is acceptable as long as the residuals are normally distributed.
Art Kendall
Social Research Consultants |