SPSSX Discussion

U-shape regression

drfg2008

U-shape regression

If a random variable has a U-shape, what regression model could be computed in order to get the optimal expected value E(X) ?

Dr. Frank Gaeth

Marta Garcia-Granero

Re: U-shape regression

On 23/04/2012 17:59, drfg2008 wrote:
> If a random variable has a U-shape, what regression model could be computed
> in order to get the optimal expected value E(X) ?

Quadratic: y=a+b·x+c·x^2.

U- and J-shaped relationships can be fitted using polynomial regression.

You may either center x & x^2 or not (there are people in favor and
against... I'm not going to choose a side, not today - too tired for that)

HTH,
Marta GG
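
Editor's sketch of the quadratic suggestion above: the model y = a + b*x + c*x^2 is linear in its coefficients, so ordinary least squares applies directly once x^2 is added as a second predictor. The toy data, the function names (`solve`, `fit_quadratic`) and the noise-free setup are assumptions made only for illustration; they do not come from the thread.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations; returns (a, b, c)."""
    design = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return tuple(solve(xtx, xty))


# Hypothetical, noise-free U-shaped toy data generated from known coefficients.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [2 + 0.5 * x + 1.5 * x * x for x in xs]

a, b, c = fit_quadratic(xs, ys)
vertex_x = -b / (2 * c)  # x-position of the bottom of the U (meaningful when c > 0)
print(round(a, 6), round(b, 6), round(c, 6), round(vertex_x, 6))
```

A positive c indicates a U that opens upward. In SPSS the same idea would amount to computing a squared term and entering it alongside x in a linear regression.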

>
> -----
> Dr. Frank Gaeth
> FU-Berlin
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/U-shape-regression-tp5659756p5659756.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>

David Marso

Re: U-shape regression

Administrator

In reply to this post by drfg2008

Questions to you Frank:
From your grade school math notes!!!
What sort of function yields a U shape?
How would one transform such to linear?

drfg2008 wrote

If a random variable has a U-shape, what regression model could be computed in order to get the optimal expected value E(X) ?

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Ryan

Re: U-shape regression

In reply to this post by Marta Garcia-Granero

I don't believe the OP indicated a U-shaped relationship between two variables. At any rate, before considering candidate models, we need to know more about the variable. What does it represent? What values can it take on? Is it censored or truncated at the lower and/or upper end?

Please elaborate.

Ryan

Rich Ulrich

Re: U-shape regression

In reply to this post by drfg2008

If a variable, all by itself and not in a relationship, is
said to "have a U-shape", I assume that you must mean
that the density is concentrated at the extremes.

For instance, the distribution of proportions is often
described that way, and it leads to models based on
logit or probit or some other "folded" distribution -- where
the proper model does depend on what generates the data.
The model determines what is technically going to "linearize"
the effects of something and, coincidentally, yields a
model where variances are homogeneous across the range.

Thus, it *can* be wiser (and legitimate) to look at the
average of logits of P rather than the average of P.
But your question is confusing. "Regression" *really* doesn't
enter in, when it comes to Expected Value. And E(logit(X))
is not the same as E(X) . I guess that I think of computing
it because it is "optimum" in some sense.

Unless I'm missing something, you probably need to clarify
what you are asking about, if this doesn't tell you what you want.
Or if someone else's Internet ESP hasn't hit on it.

--
Rich Ulrich
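
Editor's sketch of the point that E(logit(X)) is not the same as E(X): averaging on the logit scale and transforming back generally gives a different number from the plain mean of the proportions. The proportions below are hypothetical values chosen to pile up near both extremes; `logit` and `inv_logit` are helper names introduced here.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Inverse of the logit transform."""
    return 1 / (1 + math.exp(-z))


# Hypothetical proportions concentrated at the extremes (a "U-shaped" sample).
props = [0.02, 0.03, 0.05, 0.10, 0.90, 0.95, 0.97]

mean_p = sum(props) / len(props)
mean_logit = sum(logit(p) for p in props) / len(props)
back_transformed = inv_logit(mean_logit)

print(f"mean of P: {mean_p:.3f}")
print(f"inverse logit of the mean logit: {back_transformed:.3f}")
```

The two summaries answer different questions, so which one counts as "optimal" depends on the model assumed to generate the data.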

drfg2008

Re: U-shape regression

yes, the description is confusing and somewhat misleading, sorry for that.

What I meant was that the distribution function of a discrete variable has a 'U-shape'.

If you have a huge number of dummy variables (0-1) as independent variables, what is the best or most appropriate estimation method to estimate E(x) for each case (some 2,500 cases)?
(If it were a Poisson distribution, for example, I would use a Poisson regression. That's why I used the word 'regression'.)

(Hope this is not even more confusing)

Frank

Dr. Frank Gaeth

Rich Ulrich

Re: U-shape regression

Okay, I still am not completely sure about your terminology.

You are using "E(x) for each case", I think, as a way to say
that you want a predicted value for each case, using a
regression of some kind. (I tend to fixate on E(x) as some
reference to a rather simple mean of x, rather than a
predicted value from a regression.)

I think I've said this before about Poisson-appearing data,
and I will say it again -- The shape of the distribution in the
data is not what is critical about determining or deciding on
an "appropriate" transformation or model. For OLS models,
the important, related factor is that the residuals be rather
normally distributed.

The best clue for the underlying distribution, in my own
experience, has been the information about how the numbers
are generated. What we learn that is important for that,
especially if it does not provide a definitive answer, is the
information (or expert intuition) about what constitutes
an "equal interval" for differences; and that usually is an
indication of the expected "error variance" for different
scores. Or vice-versa, the error variance clues us to what
should be considered equal intervals.

Are your data fixed in a range from 0 to 1, like probabilities?
Or can they be standardized, logically, so that (0,1) is the
range? Does this make sense, to think of them as something
like probabilities? - If so, the logistic regression is the obvious
candidate. Because logistic regressions are so popular and
familiar, and because OLS linear regression would have the
same model-inappropriateness as it does for range-limited
data, LR is your obvious candidate. But does it make sense?

--
Rich Ulrich
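
Editor's sketch of the remark that the shape of the raw distribution is not what matters for OLS: an outcome can have almost all of its mass at the two extremes while the residuals around the fitted values look normal. The simulated data (one 0/1 predictor, a shift of 10 units, noise with sd = 1) are assumptions made purely for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: one 0/1 predictor shifts the outcome by 10 units,
# with normal noise (sd = 1) around each group mean.
x = [i % 2 for i in range(2000)]
y = [10 * xi + random.gauss(0, 1) for xi in x]

# Marginal distribution of y: most values sit near 0 or near 10, few in between.
low = sum(v < 3 for v in y)
middle = sum(3 <= v <= 7 for v in y)
high = sum(v > 7 for v in y)

# With a single dummy predictor, the OLS fitted value is the group mean,
# so each residual is the deviation from the case's own group mean.
group_mean = {g: statistics.fmean(v for v, xi in zip(y, x) if xi == g) for g in (0, 1)}
residuals = [v - group_mean[xi] for v, xi in zip(y, x)]

print("counts low/middle/high:", low, middle, high)
print("residual mean:", round(statistics.fmean(residuals), 3))
print("residual sd:", round(statistics.stdev(residuals), 3))
```

Here the histogram of y is empty in the middle, yet the residuals are simply the simulated normal noise, which is the quantity the OLS assumption refers to.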

drfg2008

Re: U-shape regression

> For OLS models, the important, related factor is that the residuals be rather normally distributed.

That seems to be the case. The normal Q-Q plot of the unstandardized residuals shows little deviation from the line; however, the Kolmogorov-Smirnov (KSO) test gives a significant result, p < 0.01 (most extreme absolute difference: 0.114; KSO-Z: 5.37; N = 2,227).

So I guess, a linear regression will be ok.

Thanks

(yes, the predicted value for each case was meant)

Dr. Frank Gaeth
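
Editor's note on the figures in this post: the Kolmogorov-Smirnov Z statistic is the most extreme absolute difference D multiplied by the square root of N. Reading the reported values as D = 0.114 and N = 2227 (an interpretation of the decimal-comma notation, not something stated separately in the thread), the arithmetic can be checked as follows.

```python
import math

d_max = 0.114  # most extreme absolute difference, as reported in the post
n = 2227       # number of cases, as reported in the post

ks_z = d_max * math.sqrt(n)  # Z = D * sqrt(N)
print(round(ks_z, 2))
```

The result agrees with the reported 5.37 up to rounding of D. Because Z grows with the square root of N, a fixed deviation D becomes significant more easily in large samples, which may be why the Q-Q plot and the test appear to disagree here.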

Ryan

Re: U-shape regression

In reply to this post by Rich Ulrich

While there is no disputing Rich's message below, I still think there are missing details which could play a vital role in determining the appropriate model. If you have count data (e.g., number of days missed school in the past 30 days) which appear to have a U shaped distribution, then OLS regression may not be optimal. In fact, a standard binomial logistic regression (# of events / total # of trials) may not be optimal either. Since the OP has not provided us with exactly what this variable represents, whether the range is restricted in some way and why (e.g., censoring, truncation) etc., it simply is not possible [for me] to recommend a set of candidate models to consider.

Ryan

drfg2008

Re: U-shape regression

The variable has four attributes (1-4) and is equidistant ordinal. And U-shaped. The predictor variables are dichotomous. However, the goal is not to assign each case to one of the four categories; rather, if possible, a predicted (continuous) mean value should be computed for each case, even if that is a contradiction (deriving a cardinal value from ordinal data). This value is needed for use in a further process.

Dr. Frank Gaeth
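
Editor's sketch of what a "predicted mean value" for a four-level outcome could look like if the codes 1-4 are treated as equally spaced: for each pattern of dummy predictors, E(Y | pattern) is the sum of k times the share of cases at level k. The cases below are invented for illustration. With many dummies and about 2,500 cases most patterns would be sparse, so a real analysis would need a model that pools information across patterns; this only shows the quantity being asked for.

```python
from collections import Counter, defaultdict

# Hypothetical cases: (dummy predictor pattern, outcome level 1-4).
cases = [
    ((0, 0), 1), ((0, 0), 1), ((0, 0), 4), ((0, 0), 2),
    ((0, 1), 4), ((0, 1), 4), ((0, 1), 1), ((0, 1), 3),
    ((1, 1), 1), ((1, 1), 4), ((1, 1), 4), ((1, 1), 4),
]

levels_by_pattern = defaultdict(list)
for pattern, level in cases:
    levels_by_pattern[pattern].append(level)

predicted_mean = {}
for pattern, levels in levels_by_pattern.items():
    counts = Counter(levels)
    n = len(levels)
    # E(Y | pattern) = sum over k of k * P(Y = k | pattern)
    predicted_mean[pattern] = sum(k * counts[k] / n for k in range(1, 5))

for pattern, mean in sorted(predicted_mean.items()):
    print(pattern, round(mean, 2))
```

Every case with the same predictor pattern receives the same predicted mean, which always lies between 1 and 4.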

David Marso

Re: U-shape regression

Administrator

I don't believe you have really addressed Ryan's questions ;-)
Hmm, for example, are negative predicted values permissible?
--------------------------------------------------------------------------
"equidistant ordinal"???? WTF are you smoking?
We used to call that interval or ratio (true zero exists) but feel free to make up your own nomenclature. It is not the least bit confusing or distracting (** How do you know the intervals are equidistant**) ???
**What does this variable represent**
??why are your cards always so close to the chest??
It always takes about 10 rounds to get any useful information from you and then after you have answers from numerous people you have the tendency to go off without any closure.
***** BONUS POINTS *****
***** You never bother to help anyone else when they have questions!!!!!*****...

drfg2008 wrote

Since the OP has not provided us with exactly what this variable represents, whether the range is restricted in some way and why (e.g., censoring, truncation) etc., it simply is not possible [for me] to recommend a set of candidate models to consider.

The variable has four attributes (1-4) and is equidistant ordinal. And U-shaped. The predictor variables are dichotomous. However, it is not about belonging to one of four categories, but (if possible) a predicted mean value (continuous) should be computed for each case. Even if that contradicts (to generate a cardinal from ordinal data value). This value is needed for use in a further process.

So I learned a linear regression is acceptable as long as the residuals are normally distributed.

Art Kendall

Re: U-shape regression

In reply to this post by drfg2008

There are many dialects in statistics.
However, in most of them, the distinction between ordinal and interval levels of measurement is that for ordinal variables the intervals between values are merely ordered and clearly unequal, e.g., magnitudes of earthquakes. Interval-level variables have intervals between values that are not too discrepant from being equal. Some wags would say that the only situation of perfectly equal intervals is the dichotomy, since there is only one interval and anything is equal to itself.

It helps list members to have all of the pertinent information before diagnosing a situation.

A variable has levels. Most stat dialects use the word attribute to mean a variable, not its levels.

If my summary below is correct, and if there are not important considerations that are missing, perhaps this is what you want.

You have an interval level variable with 4 levels. (The construct this operationalization represents is undefined.)
You would like to know if it makes a difference to assume that the level of measurement is ordinal or interval.
You have dichotomous predictors. (Their nature is undefined. Are they crossed, repeated measures, semantic sets, intrinsic dichotomies, cuts on continuous variables, etc.?)
_You would like to find predicted values for each case._

Have you tried CATREG? It has two features that might help.
It can be run assuming different levels of measurement.
It has a /save option so you can get new variables for each case.

Art Kendall Social Research Consultants