SPSSX Discussion - Re: OT what kind of regressions, etc w log-normal variables

Posted by Art Kendall on Oct 31, 2013; 1:17pm
URL: http://spssx-discussion.165.s1.nabble.com/OT-what-kind-of-regressions-etc-w-log-normal-variables-tp5722805p5722824.html

Thank you. That rule of thumb from Tukey will be very helpful.
In the study under discussion, the population of measures is expected to be log-normal. Most people will be "normal" in the sense of usual.

If I recall correctly, it was the classic Cohen and Cohen book that called checking for a curvilinear relation using X and X**2 an interaction of a variable with itself. It is in the part where they talk about multiplying IVs by each other. The relation between x and y depends on the value of x. In psychology the classic example is anxiety and performance: low anxiety, low performance ("Who cares"); medium anxiety, high performance; high anxiety, low performance.
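
To make "the relation between x and y depends on the value of x" concrete, here is a minimal Python sketch. The anxiety and performance numbers are invented and noise-free, and the variable names are hypothetical; it only illustrates that with X and X**2 in the model the fitted slope is b1 + 2*b2*x, so it changes sign across the range of x.

```python
import numpy as np

# Hypothetical, noise-free inverted-U data: performance peaks at medium anxiety.
anxiety = np.linspace(0.0, 10.0, 21)
performance = 50.0 + 8.0 * anxiety - 0.8 * anxiety**2

# Design matrix with intercept, X, and X**2 (X "interacting with itself").
X = np.column_stack([np.ones_like(anxiety), anxiety, anxiety**2])
b0, b1, b2 = np.linalg.lstsq(X, performance, rcond=None)[0]

def slope_at(x):
    """Slope of the fitted curve at x: d(performance)/d(anxiety) = b1 + 2*b2*x."""
    return b1 + 2.0 * b2 * x

peak = -b1 / (2.0 * b2)  # anxiety level at which the fitted slope is zero
print(f"b1={b1:.2f}, b2={b2:.2f}, peak at anxiety={peak:.2f}")
print("slope at low, medium, high anxiety:", slope_at(1.0), slope_at(5.0), slope_at(9.0))
```

A negative coefficient on X**2 is what produces the rise-then-fall pattern; with real data one would also look at the plot rather than rely on the coefficient alone.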

Especially in an experiment, an interaction is what one would desire. Consider the simplest case. Many years ago it became more broadly known that ANOVA was a special case of regression, which could in turn be considered a special case of canonical correlation. A couple of decades ago the term general linear model became widely known.

Think about the simple 2 by 2R design (1 between-subjects factor and 1 within-subjects factor). The between factor is whether or not the subject is assigned to treatment. The within factor is time: pre and post. One hopes to find a difference in change, aka a difference of differences. When the data are analyzed by regression, one hopes that the second hierarchical step (this is not the nefarious stepwise approach) will be significant and meaningful:
/enter time group
/enter time*group
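
The two hierarchical steps can be sketched in Python with made-up scores. Everything here is an assumption for illustration: the data, the 0/1 coding, and the helper names. It shows only the point estimates, namely that with 0/1 coding the coefficient on the time-by-group product equals the difference of differences in the cell means. It uses plain OLS on long-format rows and ignores the correlation between a person's pre and post scores, so it is not a substitute for a proper repeated-measures test of significance.

```python
import numpy as np

# Hypothetical long-format data: one row per person per occasion.
# group: 0 = control, 1 = treatment; time: 0 = pre, 1 = post.
group = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
time = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1])
score = np.array([10., 12., 11., 11., 13., 12., 10., 11., 12., 15., 17., 16.])

def fit(columns):
    """OLS fit with an intercept; returns coefficients and R-squared."""
    X = np.column_stack([np.ones(len(score))] + columns)
    beta = np.linalg.lstsq(X, score, rcond=None)[0]
    resid = score - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((score - score.mean()) ** 2)
    return beta, r2

# Step 1: main effects only. Step 2: add the product term.
_, r2_step1 = fit([time, group])
beta2, r2_step2 = fit([time, group, time * group])

def cell_mean(g, t):
    """Mean score for one group at one occasion."""
    return score[(group == g) & (time == t)].mean()

diff_of_diffs = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print("R-squared change at step 2:", r2_step2 - r2_step1)
print("interaction coefficient:", beta2[3], "difference of differences:", diff_of_diffs)
```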

Art Kendall
Social Research Consultants

On 10/30/2013 11:28 PM, Rich Ulrich [via SPSSX Discussion] wrote:

Maybe you need to "lead" a little more?

I don't start out worrying about what is normal or log-normal.
However, I do keep in mind the crude rules of thumb offered
by John Tukey in his textbook ("Exploratory Data Analysis", I
think) concerning the range of a variable. "If the largest value
of a natural set of scores is 10 times the size of the smallest,
you should consider a transformation; if it is 20 times, you should
probably take the log." That's from memory, and he probably said
it better.

So, log-normal is important when it actually affects the scaling.
Taking the log won't do much when the range is relatively small,
even though the shape may be "log-normal."
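
A small Python sketch of that point, using invented scores and hypothetical helper names: when the largest value is only about twice the smallest, the logged scores are almost a linear function of the raw scores, so the transformation changes little; when the ratio is large, the two scalings diverge. The ratio rule itself is as recalled from memory above, not checked against the book.

```python
import numpy as np

def max_min_ratio(values):
    """Ratio of largest to smallest value; only meaningful for strictly positive data."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("ratio rule assumes strictly positive scores")
    return values.max() / values.min()

def log_vs_raw_correlation(values):
    """Pearson correlation between the raw scores and their logs."""
    values = np.asarray(values, dtype=float)
    return np.corrcoef(values, np.log(values))[0, 1]

narrow = np.linspace(50.0, 100.0, 200)  # hypothetical scores, max/min = 2
wide = np.linspace(5.0, 500.0, 200)     # hypothetical scores, max/min = 100

for name, x in [("narrow", narrow), ("wide", wide)]:
    print(name, "ratio:", max_min_ratio(x),
          "corr(x, log x):", round(log_vs_raw_correlation(x), 3))
```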

And his rule-of-thumb is pretty relevant for most measurements in
the social sciences whenever there is non-zero, non-negative data
with a natural zero which is not going to be observed. Reciprocals,
square-roots, etc., are other possible transformations that are natural
for the circumstances that generate various sorts of data. "How the
numbers are generated" is at least as important in justifying a particular
transformation as the resulting shape of the curve before or after
transforming.

Still, "normality" is less important than (a) observing a linear relationship
in a model and (b) observing equal variance in the errors across the
range of the model. In the social sciences, it is very common that
a univariate distribution that is observed to be log-normal is also
going to be modeled most ideally by taking its logs -- especially when
the scores cover a large range. That's a convenient coincidence, but
certainly is not magic or reliable.
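
As a rough illustration of points (a) and (b), here is a Python sketch on simulated data. The data-generating model (multiplicative error), the seed, and the helper name are all assumptions chosen for the example. It compares the spread of residuals in the upper and lower halves of the predictor, once on the raw scale and once after logging both variables.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data with multiplicative error: y = 2 * x * exp(noise).
x = np.linspace(1.0, 100.0, 2000)
y = 2.0 * x * np.exp(rng.normal(0.0, 0.3, size=x.size))

def residual_spread_ratio(pred, resp):
    """Fit resp = a + b*pred by OLS; return SD of residuals in the upper half
    of pred divided by SD in the lower half (about 1 means equal variance)."""
    slope, intercept = np.polyfit(pred, resp, 1)
    resid = resp - (intercept + slope * pred)
    upper = pred > np.median(pred)
    return resid[upper].std() / resid[~upper].std()

raw_ratio = residual_spread_ratio(x, y)
log_ratio = residual_spread_ratio(np.log(x), np.log(y))
print("raw scale spread ratio:", round(raw_ratio, 2))
print("log scale spread ratio:", round(log_ratio, 2))
```

When the errors really are multiplicative, the raw-scale residuals fan out as x grows while the log-scale residuals have roughly constant spread. With additive errors the same transformation would not help, which is why it is not magic.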

I've very seldom included a term for X^2 in a model, and I don't remember
ever thinking of it as "the interaction of a variable with itself."

About interactions in general -- I like the insight that someone else posted
in a stats-group a dozen years ago ... that an interaction is a sign that
you have the wrong model, either in scaling or in the choice or definition
of variables.

--
Rich Ulrich
Date: Wed, 30 Oct 2013 06:12:30 -0700
From: [hidden email]
Subject: OT what kind of regressions, etc w log-normal variables
To: [hidden email]

I am asking this because some of us have a disagreement. I am trying to ask without these being leading questions.

Suppose there is a set of raw variables, some log-normally distributed and some roughly normal.

Both the roughly normal and the log-normal variables could be IVs or DVs.
How would you model without any interaction terms?
How would you model interactions?
How would you model the interaction of x with itself? (i.e., what would ordinarily be done by including x + x**2.)
...

Art Kendall
Social Research Consultants