SPSSX Discussion - Re: OT what kind of regressions, etc w log-normal variables

Posted by Bruce Weaver on Nov 01, 2013; 12:59pm
URL: http://spssx-discussion.165.s1.nabble.com/OT-what-kind-of-regressions-etc-w-log-normal-variables-tp5722805p5722845.html

Further to Art's point, interaction is often referred to as "effect modification" in epidemiology and as "moderation" in psychology--e.g., psych researchers may talk about running "moderated multiple regression", which simply means that there is (at least one) product term in the regression model.

Personally, I don't understand why folks in psychology saw the need to introduce a new term for interaction, given how strongly statistics education in psychology focuses on ANOVA models, including factorial models (with interaction terms included). In that context, psychologists have always used and continue to use the term "interaction". But it seems "interaction" was not good enough for psychologists running regression models: They needed their own term for it. I guess Lee Cronbach's Two Disciplines address is still relevant today. ;-)

Art Kendall wrote

One way of phrasing
an "interaction" is that {changes or differences or
relations or distributions} of a dependent (explained, criterion,
outcome, left hand) variable are different for different levels of
an independent (explaining, predictor, right hand) variable.

I think it is unfortunate that students are not routinely taught
that there are different dialects of statistics and that there are
different ways to express the same result. E.g. a correlations
dialect might say "adult height is correlated with gender"
whereas a design of experiments dialect might say "there is a
significant difference between the mean heights of men and women".

I also think it is unfortunate that when newer methods such as
log-linear became common, proponents often were not aware of the
earlier usage. Two of the most extreme examples of being unaware
of alternate uses of words: one is the invention of a whole new
field of independent machine learning in the 70's to do what is
conceptually and often mathematically identical to what had been
called clustering since the 40's. The other is the use of
"ontology" for the types or clusters found in "independent
machine learning".

It is certainly a consideration whether to use a simple change
score or a regressed (partial, residual) change score. It is also a
consideration whether to express the regression approach to
ancova with or without including interactions of the "covariate"
with other variables on the right hand. I have even seen
vocabulary that treats anything that one wants to control for that
is not manipulated as a "covariate". For example, if the variable
one wants to control for is gender, why not make it a factor when
using anova vocabulary and include the interactions with race or
treatment?
Art Kendall
Social Research Consultants
On 11/1/2013 1:31 AM, Rich Ulrich [via SPSSX Discussion] wrote:

"Interaction" is also the term used for every correlation observed
in log-linear models of a multi-way table, where the "main effects"
describe that univariate frequencies are or are not equal.

For your example of an interaction, I will offer this comment.
Testing a change across time sometimes is done better by looking at
the main effect, between groups, for "change scores". That puts you
immediately in the position of asking the question, which is often
relevant, of whether a regressed-change score would be more
appropriate than the simple-change: perform the one-way ANCOVA
instead? And once you raise the topic of change scores, there is a
whole literature on the difficult cases.

The "interaction" example that I like is for dominant gene types.
The observed phenotype shows up as an interaction. That is -- AA,
Aa, and aA all show the dominant trait, whereas only aa does not.
However, the more meaningful model is one with a different outcome:
the protein or enzyme production encoded by the genes. That model
shows the difference as a main effect, where AA has twice as much
production as Aa or aA. "Dominant" or not is thus a function of
what you get with half-production. For a so-called dominant trait,
half as much protein shows up with the same effect as the fuller
amount. I think I was always somewhat puzzled by Aa, etc., until I
learned the main-effect model.
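
The contrast between the two outcomes can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
function names, the "one unit of production per A allele" scale, and
the threshold of 1 are all hypothetical choices made for the example,
not part of the original post.

```python
from itertools import product


def production(maternal, paternal):
    """Protein output, assumed additive: one unit per functional 'A' allele."""
    return (maternal == "A") + (paternal == "A")


def phenotype(maternal, paternal, threshold=1):
    """Dominant trait (1) shows once production reaches the assumed threshold."""
    return int(production(maternal, paternal) >= threshold)


def interaction_contrast(outcome):
    """2x2 difference of differences: (AA - Aa) - (aA - aa)."""
    return (outcome("A", "A") - outcome("A", "a")) - (
        outcome("a", "A") - outcome("a", "a")
    )


# List every genotype with both outcomes.
for m, p in product("Aa", repeat=2):
    print(m + p, production(m, p), phenotype(m, p))

# Zero contrast means main effects only; non-zero means an interaction.
print("production contrast:", interaction_contrast(production))
print("phenotype contrast:", interaction_contrast(phenotype))
```

With these assumptions the production outcome has a zero contrast
(main effects only), while the thresholded phenotype has a non-zero
one, which is the sense in which the choice of outcome scale creates
or removes the interaction.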

--
Rich Ulrich

Date: Thu, 31 Oct 2013 07:52:02 -0700
From: [hidden email]
Subject: Re: OT what kind of regressions, etc w log-normal
variables
To: [hidden email]

Thank you.
That rule of thumb from Tukey will be very helpful.
In the study under discussion, the population of measures
is expected to be log normal. Most people will be normal
in the sense of usual.

If I recall correctly it was the classic Cohen and Cohen
book that called checking for a curvilinear relation using
X and X**2 an interaction of a variable with itself. It is
in the part where they talk about multiplying IVs by each
other. The relation between x and y depends on the value of
x. In Psych the classic example is anxiety and performance:
low anxiety, low performance ("Who cares");
medium anxiety, high performance; high anxiety, low
performance.
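
A small Python sketch of why X + X**2 can be read as "x interacting
with itself": with a squared term, the slope of y on x is itself a
function of x. The coefficients and the 0-to-8 anxiety scale below are
made up purely for illustration.

```python
def performance(anxiety, b0=0.0, b1=8.0, b2=-1.0):
    """Quadratic model y = b0 + b1*x + b2*x**2 (hypothetical coefficients)."""
    return b0 + b1 * anxiety + b2 * anxiety ** 2


def slope(anxiety, b1=8.0, b2=-1.0):
    """Derivative of the model: the effect of x on y changes with x itself."""
    return b1 + 2 * b2 * anxiety


# Low, medium, and high anxiety on the assumed 0-8 scale.
for x in (0, 4, 8):
    print(x, performance(x), slope(x))
```

On this made-up scale the slope is positive at low anxiety, zero at
the middle, and negative at high anxiety, which is the inverted-U
pattern described above.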

Especially in an experiment, an interaction is what one would
desire. Consider the simplest case. Many years ago it
became more broadly known that anova was a special case
of regression, which could be considered a special case of
canonical correlation. A couple of decades ago the term
general linear model became widely known.

Think about the simple 2 by 2R (1 between subjects
factor and 1 within subjects factor). The between factor
is assignment to treatment or not. The within factor is
time: pre and post. One hopes to find a difference in
change, aka a difference of differences. When analyzed by
regression, one hopes that the second hierarchical
step (this is not the nefarious stepwise approach) will be
significant and meaningful:
/enter time group
/enter time*group
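
To make the "difference of differences" concrete, here is a minimal
Python sketch using only the standard library. The eight scores, the
0/1 dummy coding, and the helper names (`solve`, `ols`, `cell_mean`)
are assumptions for illustration. It treats the scores as independent
observations, so it reproduces only the coefficient on the product
term, not a proper significance test for the repeated-measures design.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up scores: (group, time, score); group 1 = treated, time 1 = post.
data = [
    (0, 0, 10), (0, 0, 12), (0, 1, 11), (0, 1, 13),
    (1, 0, 11), (1, 0, 13), (1, 1, 17), (1, 1, 19),
]


def cell_mean(g, t):
    """Mean score in one group-by-time cell."""
    scores = [s for gg, tt, s in data if (gg, tt) == (g, t)]
    return sum(scores) / len(scores)


# Change in the treated group minus change in the untreated group.
diff_of_diffs = (cell_mean(1, 1) - cell_mean(1, 0)) - (
    cell_mean(0, 1) - cell_mean(0, 0)
)

# Second-step model: intercept, time, group, and the time*group product.
X = [[1, t, g, t * g] for g, t, _ in data]
y = [s for _, _, s in data]
b0, b_time, b_group, b_product = ols(X, y)

print("difference of differences:", diff_of_diffs)
print("coefficient on time*group:", round(b_product, 6))
```

With 0/1 coding and all four cells present, the coefficient on the
product term equals the difference of differences computed from the
cell means.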

Art Kendall
Social Research Consultants
On 10/30/2013 11:28 PM, Rich Ulrich [via SPSSX Discussion] wrote:

Maybe you need to "lead" a little more?

I don't start out worrying about what is normal or log-normal.
However, I do keep in mind the crude rules of thumb offered by John
Tukey in his text book ("Exploratory Data Analysis", I think)
concerning the range of a variable. "If the largest value of a
natural set of scores is 10 times the size of the smallest, you
should consider a transformation; if it is 20 times, you should
probably take the log." That's from memory, and he probably said
it better.
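
The rule of thumb as quoted can be written as a small check. This is
a sketch only: the function name `tukey_range_hint` is hypothetical,
the cutoffs of 10 and 20 come from the from-memory quotation above
rather than from the book, and the data are made up.

```python
def tukey_range_hint(values):
    """Return (max/min ratio, suggestion) for strictly positive scores."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("the ratio rule needs strictly positive values")
    ratio = hi / lo
    if ratio >= 20:
        return ratio, "probably take the log"
    if ratio >= 10:
        return ratio, "consider a transformation"
    return ratio, "transformation unlikely to change much"


incomes = [12_000, 18_500, 40_000, 95_000, 310_000]  # made-up values
heights_cm = [150, 162, 171, 180, 195]               # made-up values

print(tukey_range_hint(incomes))
print(tukey_range_hint(heights_cm))
```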

So, log-normal is important when it actually affects the scaling.
Taking the log won't do much when the range is relatively small,
even though the shape may be "log-normal."
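
One rough way to see this is to ask how close log(x) is to a straight
line in x over a narrow versus a wide range. The sketch below uses
evenly spaced values and a hand-rolled Pearson correlation. The ranges
(50 to 100, and 1 to 100) and the helper names are arbitrary choices
for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_with_log(lo, hi, n=101):
    """Correlation between x and log(x) on an evenly spaced grid from lo to hi."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return pearson_r(xs, [math.log(x) for x in xs])


narrow = r_with_log(50, 100)  # largest value is 2 times the smallest
wide = r_with_log(1, 100)     # largest value is 100 times the smallest

print("narrow range:", round(narrow, 3))
print("wide range:", round(wide, 3))
```

When the correlation is very close to 1, the log is nearly a linear
rescaling and changes little in a linear model; over the wide range
the two scales diverge noticeably.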

And his rule-of-thumb is pretty relevant for most measurements in
the social sciences whenever there is non-zero, non-negative data
with a natural zero which is not going to be observed. Reciprocals,
square-roots, etc., are other possible transformations that are
natural for the circumstances that generate various sorts of data.
"How the numbers are generated" is at least as important in
justifying a particular transformation as the resulting shape of
the curve before or after transforming.

Still, "normality" is less important than (a) observing a linear
relationship in a model and (b) observing equal variance in the
errors across the range of the model. In the social sciences, it is
very common that a univariate distribution that is observed to be
log-normal is also going to be modeled most ideally by taking its
logs -- especially when the scores cover a large range. That's a
convenient coincidence, but certainly is not magic or reliable.

I've very seldom included a term for X^2 in a model, and I don't
remember ever thinking of it as "the interaction of a variable with
itself."

About interactions in general -- I like the insight that someone
else posted in a stats-group a dozen years ago ... that an
interaction is a sign that you have the wrong model, either in
scaling or in the choice or definition of variables.

--
Rich Ulrich

Date: Wed, 30 Oct 2013 06:12:30 -0700
From: [hidden email]
Subject: OT what kind of regressions, etc w log-normal variables
To: [hidden email]

I am asking this because some of us have a disagreement. I am
trying to ask without these being leading questions.

Suppose there is a set of raw variables; some are log-normally
distributed, some roughly normal.

Both the roughly normal and the log-normal variables
could be IVs or DVs.
How would you model without any interaction terms?
How would you model interactions?
How would you model the interaction of x with itself?
(i.e., what would ordinarily be done by including x + x**2).

...

Art Kendall

Social Research Consultants

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).