I am asking this because some of us have a
disagreement. I am trying to ask without these being leading
questions.
Suppose there is a set of raw variables, some log-normally distributed and some roughly normal. Both the roughly normal and the log-normal variables could be IVs or DVs. How would you model without any interaction terms? How would you model interactions? How would you model the interaction of x with itself (i.e., what would ordinarily be done by including x + x**2)?

-- Art Kendall
Social Research Consultants
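For concreteness, the "interaction of x with itself" asked about above is just an ordinary regression on x and x**2. The thread itself is about SPSS; the sketch below uses Python/numpy with simulated (hypothetical) data purely for illustration:

```python
import numpy as np

# Simulate an inverted-U relation and recover it with an x + x**2 model.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x - 0.2 * x**2 + rng.normal(0.0, 0.5, 200)

# Design matrix: intercept, x, and the squared term
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # roughly [1.0, 2.0, -0.2]
```

A negative coefficient on x**2, as here, is the curvilinear (inverted-U) pattern the later posts discuss.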
Maybe you need to "lead" a little more?
I don't start out worrying about what is normal or log-normal. However, I do keep in mind the crude rule of thumb offered by John Tukey in his textbook ("Exploratory Data Analysis", I think) concerning the range of a variable: "If the largest value of a natural set of scores is 10 times the size of the smallest, you should consider a transformation; if it is 20 times, you should probably take the log." That's from memory, and he probably said it better.

So, log-normal is important when it actually affects the scaling. Taking the log won't do much when the range is relatively small, even though the shape may be "log-normal." And his rule of thumb is pretty relevant for most measurements in the social sciences whenever there are positive scores with a natural zero which is not going to be observed. Reciprocals, square roots, etc., are other possible transformations that are natural for the circumstances that generate various sorts of data. "How the numbers are generated" is at least as important in justifying a particular transformation as the resulting shape of the curve before or after transforming.

Still, "normality" is less important than (a) observing a linear relationship in a model and (b) observing equal variance in the errors across the range of the model. In the social sciences, it is very common that a univariate distribution observed to be log-normal is also going to be modeled most ideally by taking its logs -- especially when the scores cover a large range. That's a convenient coincidence, but certainly is not magic or reliable.

I've very seldom included a term for X^2 in a model, and I don't remember ever thinking of it as "the interaction of a variable with itself." About interactions in general -- I like the insight that someone else posted in a stats group a dozen years ago: that an interaction is a sign that you have the wrong model, either in scaling or in the choice or definition of variables.
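Tukey's range rule of thumb, as recalled above, is easy to mechanize. A minimal sketch, assuming positive scores with no zeros (the 10x and 20x thresholds come straight from the quoted rule; the example numbers are hypothetical):

```python
import numpy as np

def suggest_transform(x):
    """Crude range rule of thumb (after Tukey, as recalled above):
    largest/smallest >= 20 -> take logs; >= 10 -> consider a transformation."""
    x = np.asarray(x, dtype=float)
    ratio = x.max() / x.min()  # assumes positive scores with no zeros
    if ratio >= 20:
        return "log"
    if ratio >= 10:
        return "consider a transformation"
    return "leave as is"

print(suggest_transform([1_500, 12_000, 90_000]))  # ratio 60 -> "log"
print(suggest_transform([95, 101, 110]))           # ratio ~1.2 -> "leave as is"
```

This captures the point that a "log-normal" shape over a narrow range (the second example) gains little from taking logs.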
-- Rich Ulrich

Date: Wed, 30 Oct 2013 06:12:30 -0700
Subject: OT what kind of regressions, etc w log-normal variables
Thank you. That rule
of thumb from Tukey will be very helpful.
In the study under discussion, the population of measures is expected to be log-normal. Most people will be normal in the sense of "usual."

If I recall correctly, it was the classic Cohen and Cohen book that called checking for a curvilinear relation using X and X**2 an interaction of a variable with itself. It is in the part where they talk about multiplying IVs by each other: the relation of x and y depends on the value of x. In psych, the classic example is anxiety and performance: low anxiety, low performance ("Who cares?"); medium anxiety, high performance; high anxiety, low performance.

Especially in an experiment, an interaction is what one would desire. Consider the simplest case. Many years ago it became more broadly known that ANOVA was a special case of regression, which could be considered a special case of canonical correlation. A couple of decades ago the term "general linear model" became widely known. Think about the simple 2 by 2 R design (1 between-subjects factor and 1 within-subjects factor). The between factor is assignment to treatment or not. The within factor is time: pre and post. One hopes to find a difference in change, aka a difference of differences. When analyzed by regression, one hopes that the second hierarchical step (this is not the nefarious stepwise approach) will be significant and meaningful:

/enter time group
/enter time*group

Art Kendall
Social Research Consultants
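The two-step idea above (main effects of time and group, then the time*group product) can be sketched in Python/numpy rather than SPSS syntax. With a saturated 2x2 cell-means design, the OLS coefficient on the product term is exactly the difference of differences; the simulated numbers here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # subjects per group
# Simulated pre/post scores; the treated group improves by 5 points more
pre_c = rng.normal(50, 8, n)
post_c = pre_c + rng.normal(0, 4, n)
pre_t = rng.normal(50, 8, n)
post_t = pre_t + 5 + rng.normal(0, 4, n)

# Long format: one row per measurement
y = np.concatenate([pre_c, post_c, pre_t, post_t])
time = np.concatenate([np.zeros(n), np.ones(n), np.zeros(n), np.ones(n)])
group = np.concatenate([np.zeros(2 * n), np.ones(2 * n)])

# Second hierarchical step: main effects plus the time*group product term.
# Its coefficient is the difference of differences (change-by-group effect).
X = np.column_stack([np.ones_like(y), time, group, time * group])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
diff_of_diff = beta[3]
```

(A proper repeated-measures analysis would also model the within-subject correlation when computing standard errors; plain OLS gives the same point estimate for the interaction in this saturated design.)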
"Interaction" is also the term used for every correlation observed
in log-linear models of a multi-way table, where the "main effects" describe that univariate frequencies are or are not equal. For your example of an interaction, I will offer this comment. Testing a change across time sometimes is done better by looking at the main effect, between groups, for "change scores". - That puts you immediately in the position of asking the question, which is often relevant, of whether a regressed-change score would be more appropriate than the simple-change: perform the one-way ANCOVA instead? And once you raise the topic of change scores, there is a whole literature on the difficult cases. The "interaction" example that I like is for dominant gene types. The observed phenotype shows up as an interaction. That is -- AA, Aa, and aA all show the dominant trait, whereas only aa does not. However, the more meaningful model is one with a different outcome: the protein or enzyme production encoded by the genes. That model shows the difference as a main effect, where AA has twice as much production as Aa or aA. "Dominant" or not is thus a function of what you get with half-production. For a so-called dominant trait, half as much protein shows up with the same effect as the fuller amount. I think I was always somewhat puzzled by Aa, etc., until I learned the main-effect model. -- Rich Ulrich Date: Thu, 31 Oct 2013 07:52:02 -0700 From: [hidden email] Subject: Re: OT what kind of regressions, etc w log-normal variables To: [hidden email] Thank you. That rule
of thumb from Tukey will be very helpful.
In the study under discussion, the population of measures is expected to be log normal. Most people will be normal in the sense of usual. If I recall correctly it was the classic Cohen and Cohen book that called checking for a curvilinear relation using X and X**2 an interaction of a variable with itself. It is in the part where they talk about multiplying IVs by each other. The relation x and y depends on what the value of x. In Psych the classic example is anxiety and performance. low anxiety low performance ("Who cares"); medium anxiety high performance; high anxiety low performance Especially in experiment an interaction is what one would desire. Consider the simplest case. Many years ago it became more broadly known that anova was a specially case of regression which could be considered a special case of canonical correlation. A couple decades ago the term general linear model became widely known. Think about a the simple 2 by 2R (1 between subjects factor and 1 within subjects factor). The between factor is assigned to treatment or not. The within factor is time: pre and post. One hopes to find a difference in change aka a difference of difference. When analyzed by regression time one hopes that the second hierarchical step (this is not the nefarious stepwise approach) will be significant and meaningful /enter time group /enter time*/group Art Kendall Social Research ConsultantsOn 10/30/2013 11:28 PM, Rich Ulrich [via SPSSX Discussion] wrote:
Art Kendall
Social Research Consultants View this message in context: Re: OT what kind of regressions, etc w log-normal variables Sent from the SPSSX Discussion mailing list archive at Nabble.com. |
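The simple-change versus regressed-change (one-way ANCOVA) comparison above can be illustrated with simulated data. This is a hedged sketch in Python/numpy, not anything from the thread; the group effect of 4 and the regression-to-the-mean slope of 0.8 are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
pre = rng.normal(50, 10, n)
grp = np.repeat([0.0, 1.0], n // 2)
# Post scores regress toward the mean (slope 0.8 < 1); true group effect = 4
post = 10 + 0.8 * pre + 4 * grp + rng.normal(0, 5, n)

# (a) Simple-change analysis: group main effect on post - pre
change = post - pre
simple_effect = change[grp == 1].mean() - change[grp == 0].mean()

# (b) Regressed change, i.e. one-way ANCOVA: post ~ pre + group
X = np.column_stack([np.ones(n), pre, grp])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
ancova_effect = beta[2]
```

Both approaches estimate the same group effect here; the ANCOVA typically has the smaller error variance whenever the pre/post slope differs from 1, which is one entry point into the change-score literature mentioned above.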
One way of phrasing an "interaction" is that {changes or differences or relations or distributions} of a dependent (explained, criterion, outcome, left-hand) variable are different for different levels of an independent (explaining, predictor, right-hand) variable.
I think it is unfortunate that students are not routinely taught that there are different dialects of statistics, and that there are different ways to express the same result. E.g., a correlational dialect might say "adult height is correlated with gender," whereas a design-of-experiments dialect might say "there is a significant difference between the mean heights of men and women."

I also think it is unfortunate that when newer methods such as log-linear models became common, proponents often were not aware of the earlier usage. Two of the most extreme examples of being unaware of alternate uses of words: one is the invention of a whole new field of "independent machine learning" in the 70s to do what is conceptually, and often mathematically, identical to what had been called clustering since the 40s. The other is the use of "ontology" for the types or clusters found by "independent machine learning."

It is certainly a consideration whether to use a simple change score or a regressed (partial, residual) change score. It is also a consideration whether to express the regression approach to ANCOVA with or without including interactions of the "covariate" with other variables on the right-hand side. I have even seen vocabulary that treats anything one wants to control for that is not manipulated as a "covariate." For example, if the variable one wants to control for is gender, why not make it a factor when using ANOVA vocabulary and include the interactions with race or treatment?

Art Kendall
Social Research Consultants
Further to Art's point, interaction is often referred to as "effect modification" in epidemiology and as "moderation" in psychology--e.g., psych researchers may talk about running "moderated multiple regression", which simply means that there is (at least one) product term in the regression model.
Personally, I don't understand why folks in psychology saw the need to introduce a new term for interaction, given how strongly statistics education in psychology focuses on ANOVA models, including factorial models (with interaction terms included). In that context, psychologists have always used and continue to use the term "interaction". But it seems "interaction" was not good enough for psychologists running regression models: They needed their own term for it. I guess Lee Cronbach's Two Disciplines address is still relevant today. ;-)
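As a small illustration of the point that "moderated multiple regression" is just a regression containing a product term, here is a hypothetical Python/numpy sketch with simulated data (the effect sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(0, 1, n)  # focal predictor
m = rng.normal(0, 1, n)  # moderator
y = 1 + 0.5 * x + 0.3 * m + 0.6 * x * m + rng.normal(0, 1, n)

# "Moderated multiple regression": ordinary regression with a product term
X = np.column_stack([np.ones(n), x, m, x * m])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
moderation = beta[3]  # estimates the interaction (moderation) effect
```

The coefficient on x*m is exactly what a factorial-ANOVA dialect would call the interaction effect; only the vocabulary differs.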
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).