OT what kind of regressions, etc w log-normal variables

6 messages
OT what kind of regressions, etc w log-normal variables

Art Kendall
I am asking this because some of us have a disagreement. I am trying to ask without these being leading questions.

Suppose there is a set of raw variables: some are log-normally distributed, some roughly normal.

Both the roughly normal and the log-normal variables could be IVs or DVs.
How would you model without any interaction terms?
How would you model interactions?
How would you model the interaction of x with itself? (i.e., what would ordinarily be done by including x + x**2).
--
Art Kendall
Social Research Consultants

Re: OT what kind of regressions, etc w log-normal variables

Rich Ulrich
Maybe you need to "lead" a little more?

I don't start out worrying about what is normal or log-normal.
However, I do keep in mind the crude rule of thumb offered
by John Tukey in his textbook ("Exploratory Data Analysis", I
think) concerning the range of a variable: "If the largest value
of a natural set of scores is 10 times the size of the smallest,
you should consider a transformation; if it is 20 times, you should
probably take the log."  That's from memory, and he probably said
it better.

So, log-normal is important when it actually affects the scaling.
Taking the log won't do much when the range is relatively small,
even though the shape may be "log-normal."
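Tukey's cutoffs, as quoted from memory above, can be turned into a trivial screening helper. The function name and the exact thresholds here are only my paraphrase of the rule for illustration, not from SPSS or any package:

```python
def suggest_transform(values):
    """Crude screen based on Tukey's range rule of thumb: the ratio of the
    largest to the smallest positive score suggests whether a
    re-expression is worth considering."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        return "rule assumes positive scores"
    ratio = hi / lo
    if ratio >= 20:
        return "probably take the log"
    if ratio >= 10:
        return "consider a transformation"
    return "leave as-is"

# An income-like variable: max/min is about 33, well past the 20x cutoff.
incomes = [12_000, 35_000, 80_000, 400_000]
print(suggest_transform(incomes))  # -> "probably take the log"
```

For a variable whose range is small relative to its level, the same screen returns "leave as-is", which is the point about scaling made above.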

And his rule-of-thumb is pretty relevant for most measurements in
the social sciences whenever there is non-zero, non-negative data
with a natural zero which is not going to be observed.  Reciprocals,
square-roots, etc., are other possible transformations that are natural
for the circumstances that generate various sorts of data.  "How the
numbers are generated" is at least as important in justifying a particular
transformation as the resulting shape of the curve before or after
transforming.

Still, "normality" is less important than (a) observing a linear relationship
in a model and (b) observing equal variance in the errors across the
range of the model.  In the social sciences, it is very common that
a univariate distribution that is observed to be log-normal is also
going to be modeled most ideally by taking its logs -- especially when
the scores cover a large range.  That's a convenient coincidence, but
certainly is not magic or reliable.
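To see why taking logs can buy both (a) linearity and (b) equal error variance at once, here is a small simulation sketch; the data-generating process is invented purely for illustration:

```python
import math
import random

random.seed(1)

# Multiplicative data-generating process: errors scale with the level of y,
# so y is log-normal around its trend and its spread grows with x.
xs = [x / 10 for x in range(1, 101)]
ys = [math.exp(0.5 + 0.3 * x + random.gauss(0, 0.2)) for x in xs]

# On the log scale the same data follow a linear model with roughly
# constant error variance: log(y) = 0.5 + 0.3*x + e, e ~ N(0, 0.2).
logs = [math.log(y) for y in ys]

def spread(vals):
    """Population standard deviation of a list."""
    m = sum(vals) / len(vals)
    return (sum((v - m) ** 2 for v in vals) / len(vals)) ** 0.5

# Compare spread in the lowest and highest quarter of the x range:
print("raw spread, low vs high x:", spread(ys[:25]), spread(ys[-25:]))
print("log spread, low vs high x:", spread(logs[:25]), spread(logs[-25:]))
# Raw spread grows severalfold with x; log spread is similar in both.
```

This is the "convenient coincidence": when scores are generated multiplicatively, the univariate shape is log-normal and the log happens to be the re-expression that linearizes and equalizes variance.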

I've very seldom included a term for X^2 in a model, and I don't remember
ever thinking of it as "the interaction of a variable with itself." 

About interactions in general -- I like the insight that someone else posted
in a stats-group a dozen years ago ... that an interaction is a sign that
you have the wrong model, either in scaling or in the choice or definition
of variables.

--
Rich Ulrich


Date: Wed, 30 Oct 2013 06:12:30 -0700
From: [hidden email]
Subject: OT what kind of regressions, etc w log-normal variables
To: [hidden email]

...

Re: OT what kind of regressions, etc w log-normal variables

Art Kendall
Thank you.  That rule of thumb from Tukey will be very helpful.
In the study under discussion, the population of measures is expected to be log-normal. Most people will be normal in the sense of "usual."

If I recall correctly, it was the classic Cohen and Cohen book that called checking for a curvilinear relation using X and X**2 an interaction of a variable with itself. It is in the part where they talk about multiplying IVs by each other. The relation between x and y depends on the value of x.  In psych the classic example is anxiety and performance: low anxiety, low performance ("Who cares?"); medium anxiety, high performance; high anxiety, low performance.
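A toy numeric version of that inverted-U, with invented anxiety and performance scores, shows why the X**2 term is needed: the linear correlation is exactly zero even though the relation is perfect.

```python
from statistics import mean, pstdev

# Hypothetical scores following the classic inverted-U:
# performance is low at both anxiety extremes, highest in the middle.
x = list(range(1, 10))                  # anxiety 1..9
y = [16 - (xi - 5) ** 2 for xi in x]    # peaks at x = 5

def corr(a, b):
    """Pearson correlation, written out with stdlib helpers."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))

xc = [xi - mean(x) for xi in x]         # center before squaring
x2 = [xi ** 2 for xi in xc]

print(round(corr(x, y), 3))   # -> 0.0  : no linear trend to find
print(round(corr(x2, y), 3))  # -> -1.0 : the squared term carries it all
```

This is why screening only for linear relations can miss a curvilinear one entirely: the product of a centered variable with itself has to be entered as its own term.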

Especially in an experiment, an interaction is often exactly what one would desire.  Consider the simplest case.  Many years ago it became more broadly known that ANOVA is a special case of regression, which in turn can be considered a special case of canonical correlation. A couple of decades ago the term "general linear model" became widely known.

Think about the simple 2-by-2R design (one between-subjects factor and one within-subjects factor).  The between factor is assignment to treatment or not.  The within factor is time: pre and post. One hopes to find a difference in change, a.k.a. a difference of differences.  When analyzed by regression, one hopes that the second hierarchical step (this is not the nefarious stepwise approach) will be significant and meaningful:
/enter time group
/enter time*group
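With made-up cell means, the "difference of differences" that the second, time*group step tests can be computed directly (all numbers invented for illustration):

```python
# Toy 2 (group) x 2 (time) cell means: treatment improves, control barely.
# With 0/1 dummy coding, the coefficient of the time*group product term
# in the full model equals exactly this difference of differences.
means = {("control", "pre"): 10, ("control", "post"): 11,
         ("treat",   "pre"): 10, ("treat",   "post"): 16}

change_control = means[("control", "post")] - means[("control", "pre")]  # 1
change_treat   = means[("treat",   "post")] - means[("treat",   "pre")]  # 6
interaction    = change_treat - change_control

print(interaction)  # -> 5, the time*group effect one hopes is significant
```

The main effects entered at the first step (time, group) cannot capture this quantity; only the product term does.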
Art Kendall
Social Research Consultants
On 10/30/2013 11:28 PM, Rich Ulrich [via SPSSX Discussion] wrote:
...

Re: OT what kind of regressions, etc w log-normal variables

Rich Ulrich
"Interaction" is also the term used for every correlation observed
in log-linear models of a multi-way table, where the "main effects"
describe whether the univariate frequencies are or are not equal.

For your example of an interaction, I will offer this comment.  Testing a change
across time is sometimes done better by looking at the main effect, between groups,
for "change scores".  That puts you immediately in the position of
asking the question, which is often relevant, of whether a regressed-change
score would be more appropriate than the simple change: perform the one-way
ANCOVA instead?  And once you raise the topic of change scores, there is a whole
literature on the difficult cases.

The "interaction" example that I like is for dominant gene types.  The observed
phenotype shows up as an interaction.  That is -- AA, Aa, and aA  all show the
dominant trait, whereas only  aa does not.  However, the more meaningful
model is one with a different outcome:  the protein or enzyme production
encoded by the genes.  That model shows the difference as a main effect, where AA
has twice as much production as Aa or aA.  "Dominant" or  not is thus a function
of what you get with half-production.  For a so-called dominant trait, half as much
protein shows up with the same effect as the fuller amount.  I think I was always
somewhat puzzled by Aa, etc., until I learned the main-effect model.
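The two codings of the gene example can be put side by side in a few lines. The numeric coding of protein production is, of course, a stylized assumption made only to illustrate the point:

```python
# Code each genotype by its protein production: number of functional
# "A" alleles (AA = 2, Aa = aA = 1, aa = 0).  On this scale, AA vs Aa
# is a plain main effect: twice the production.
production = {"AA": 2, "Aa": 1, "aA": 1, "aa": 0}

def shows_trait(genotype):
    """Phenotype is a threshold on production: half-production suffices,
    which is what 'dominant' amounts to on this account."""
    return production[genotype] >= 1

# On the phenotype scale the same data look like an interaction of the
# two allele "factors": only the aa cell differs from the other three.
for g in ("AA", "Aa", "aA", "aa"):
    print(g, production[g], shows_trait(g))
```

Whether the data show a main effect or an interaction thus depends on which outcome scale is modeled, which is the point of the example.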

--
Rich Ulrich




Date: Thu, 31 Oct 2013 07:52:02 -0700
From: [hidden email]
Subject: Re: OT what kind of regressions, etc w log-normal variables
To: [hidden email]

...
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

Re: OT what kind of regressions, etc w log-normal variables

Art Kendall
One way of phrasing an "interaction" is that {changes or differences or relations or distributions} of a dependent (explained, criterion, outcome, left-hand) variable are different for different levels of an independent (explaining, predictor, right-hand) variable.

I think it is unfortunate that students are not routinely taught that there are different dialects of statistics and that there are different ways to express the same result.  E.g., a correlational dialect might say "adult height is correlated with gender", whereas a design-of-experiments dialect might say "there is a significant difference between the mean heights of men and women".

I also think it is unfortunate that when newer methods such as log-linear models became common, proponents often were not aware of the earlier usage.  Two of the most extreme examples of being unaware of alternate uses of words are the invention of a whole new field of independent machine learning in the '70s to do what is conceptually, and often mathematically, identical to what had been called clustering since the '40s, and the use of "ontology" for the types or clusters found in "independent machine learning".

It is certainly a consideration whether to use a simple change score or a regressed (partial, residual) change score.  It is also a consideration whether to express the regression approach to ANCOVA with or without including interactions of the "covariate" with other variables on the right-hand side.  I have even seen vocabulary that treats anything one wants to control for that is not manipulated as a "covariate". For example, if the variable one wants to control for is gender, why not make it a factor when using ANOVA vocabulary and include the interactions with race or treatment?
Art Kendall
Social Research Consultants
On 11/1/2013 1:31 AM, Rich Ulrich [via SPSSX Discussion] wrote:
...

Re: OT what kind of regressions, etc w log-normal variables

Bruce Weaver
Administrator
Further to Art's point, interaction is often referred to as "effect modification" in epidemiology and as "moderation" in psychology -- e.g., psych researchers may talk about running "moderated multiple regression", which simply means that there is at least one product term in the regression model.

Personally, I don't understand why folks in psychology saw the need to introduce a new term for interaction, given how strongly statistics education in psychology focuses on ANOVA models, including factorial models (with interaction terms included).  In that context, psychologists have always used and continue to use the term "interaction".  But it seems "interaction" was not good enough for psychologists running regression models:  They needed their own term for it.  I guess Lee Cronbach's Two Disciplines address is still relevant today.  ;-)  
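A minimal sketch of what "moderated" adds, with invented coefficients: the product term's coefficient is exactly the amount by which the slope of y on x shifts per unit of the moderator z.

```python
# Fitted-value function for a moderated regression (coefficients made up):
# the x*z product term (coefficient 4) is the "moderation".
def yhat(x, z):
    return 1 + 2 * x + 3 * z + 4 * x * z

def slope_at(z):
    """Slope of y on x at a given value of the moderator z."""
    return yhat(1, z) - yhat(0, z)

print(slope_at(0))  # -> 2: the x slope when z = 0
print(slope_at(1))  # -> 6: shifted by the product coefficient, 4
```

In the factorial-ANOVA vocabulary Bruce mentions, this same quantity would simply be called the x-by-z interaction.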


Art Kendall wrote
One way of phrasing
        an"interaction" is that {changes or differences or
      relations or distributions} of a dependent  (explained, criterion,
      outcome, left hand) variable are different for different levels of
      an independent (explaining, predictor, right hand) variable.
     
      I think it is unfortunate that students are not routinely taught
      that there are different dialects of statistics and that there are
      different ways to express the same result.  E.g. a correlations
      dialect might say "adult height is correlated with gender" 
      whereas a design of experiments dialect might say "there is a
      significant difference between the mean heights of men and women".
     
      I also think it is unfortunate that when newer methods such as
      log-linear became common proponents often were not aware of the
      earlier usage.  Two of the most extreme examples of being unaware
      of alternate uses of words is the invention of whole new field of
      independent machine learning in the 70's to to what is
      conceptually and often mathematically identical to what was called
      clustering since the 40's.  The other is the use of "ontology" for
      the types or clusters found in "independent machine learning".
     
      It is certainly a consideration of whether to use a simple change
      score or regressed (partial, residual) change score.  It is also a
      consideration whether to to express the regression approach to
      ancova with or without including interactions of the "covariate"
      with other variables on the right hand.  I have even seen
      vocabulary that treats anything that one wants to control for that
      is not manipulated as a "covariate". For example, if the variable
      one wants to control for is gender, why not make it a factor when
      using anova vocabulary and include the interactions with race or
      treatment?
      Art Kendall
Social Research Consultants
      On 11/1/2013 1:31 AM, Rich Ulrich [via SPSSX Discussion] wrote:
   
   
      "Interaction" is also the term used for every
        correlation observed
        in log-linear models of a multi-way table, where the "main
        effects"
        describe that univariate frequencies are or are not equal.
       
        For your example of an interaction, I will offer this comment. 
        Testing a change
        across time sometimes is done better by looking at the main
        effect, between groups,
        for "change scores".   - That puts you immediately in the
        position of
        asking the question, which is often relevant, of whether a
        regressed-change
        score would be more appropriate than the simple-change: perform
        the one-way
        ANCOVA instead?  And once you raise the topic of change scores,
        there is a whole
        literature on the difficult cases.
       
        The "interaction" example that I like is for dominant gene
        types.  The observed
        phenotype shows up as an interaction.  That is -- AA, Aa, and
        aA  all show the
        dominant trait, whereas only  aa does not.  However, the more
        meaningful
        model is one with a different outcome:  the protein or enzyme
        production
        encoded by the genes.  That model shows the difference as a main
        effect, where AA
        has twice as much production as Aa or aA.  "Dominant" or  not is
        thus a function
        of what you get with half-production.  For a so-called dominant
        trait, half as much
        protein shows up with the same effect as the fuller amount.  I
        think I was always
        somewhat puzzled by Aa, etc., until I learned the main-effect
        model.
       
        --
        Rich Ulrich
       
       
       
       
          Date: Thu, 31 Oct 2013 07:52:02 -0700
          From: [hidden email]
          Subject: Re: OT what kind of regressions, etc w log-normal
          variables
          To: [hidden email]
         
          Thank you. 
              That rule of thumb from Tukey will be very helpful.
              In the study under discussion, the population of measures
              is expected to be log normal. Most people will be normal
              in the sense of usual.
             
              If I recall correctly it was the classic Cohen and Cohen
              book that called checking for a curvilinear relation using
              X and X**2 an interaction of a variable with itself. It is
              in the part where they talk about multiplying IVs by each
              other. The relation x and y depends on what the value of
              x.  In Psych the classic example is anxiety and
              performance. low anxiety low performance ("Who cares");
              medium anxiety high performance; high anxiety low
              performance
             
              Especially in experiment an interaction is what one would
              desire.  Consider the simplest  case.  Many years ago it
              became more broadly known that anova was a specially case
              of regression which could be considered a special case of
              canonical correlation. A couple decades ago the term
              general linear model became widely known.
             
              Think about a the simple 2 by 2R (1 between subjects
              factor and 1 within subjects factor).  The between factor
              is assigned to treatment or not.  The within factor is
              time: pre and post. One hopes to find a difference in
              change aka a difference of difference.  When analyzed by
              regression time one hopes that the second hierarchical
              step (this is not the nefarious stepwise approach) will be
              significant and meaningful
              /enter time group
              /enter time*/group
           
            Art Kendall
Social Research Consultants
            On 10/30/2013 11:28 PM, Rich Ulrich [via SPSSX Discussion]
            wrote:
         
         
            Maybe you need to "lead" a little more?
             
              I don't start out worrying about what is normal or
              log-normal.
              However, I do keep in mind the crude rules of thumb
              offered
              by John Tukey in his text book  ("Exploratory Data
              Analysis", I
              think)  concerning the range of a variable.  "If the
              largest value
              of a natural set of scores is 10 times the size of the
              smallest,
              you should consider a transformation; if it is 20 time,
              you should
              probably take the log."  That's from memory, and he
              probably said
              it better.
             
              So, log-normal is important when it actually affects the
              scaling.
              Taking the log won't do much when the range is relatively
              small,
              even though the shape may be "log-normal."
             
            And his rule-of-thumb is pretty relevant for most
            measurements in the social sciences whenever there is
            non-zero, non-negative data with a natural zero which is
            not going to be observed.  Reciprocals, square-roots, etc.,
            are other possible transformations that are natural for the
            circumstances that generate various sorts of data.  "How
            the numbers are generated" is at least as important in
            justifying a particular transformation as the resulting
            shape of the curve before or after transforming.

            Still, "normality" is less important than (a) observing a
            linear relationship in a model and (b) observing equal
            variance in the errors across the range of the model.  In
            the social sciences, it is very common that a univariate
            distribution that is observed to be log-normal is also
            going to be modeled most ideally by taking its logs --
            especially when the scores cover a large range.  That's a
            convenient coincidence, but certainly is not magic or
            reliable.

            I've very seldom included a term for X^2 in a model, and I
            don't remember ever thinking of it as "the interaction of a
            variable with itself."

            About interactions in general -- I like the insight that
            someone else posted in a stats-group a dozen years ago ...
            that an interaction is a sign that you have the wrong
            model, either in scaling or in the choice or definition of
            variables.
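One concrete version of that scaling point: a purely multiplicative relationship looks like an interaction on the raw scale, but becomes additive, with no interaction at all, after taking logs (a minimal numeric check, not tied to any particular dataset):

```python
# If y = x1 * x2 exactly, an additive raw-scale model needs an
# interaction term, but log(y) = log(x1) + log(x2) is purely
# additive: the "interaction" was an artifact of scaling.
import math

x1, x2 = 3.0, 7.0
y = x1 * x2
additive_ok = math.isclose(math.log(y), math.log(x1) + math.log(x2))
print(additive_ok)  # -> True
```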
             
              --
              Rich Ulrich
             
             
                Date: Wed, 30 Oct 2013 06:12:30 -0700
                From: [hidden email]
                Subject: OT what kind of regressions, etc w log-normal
                variables
                To: [hidden email]

                I am asking this because some of us have a
                disagreement.  I am trying to ask without these being
                leading questions.

                Suppose there is a set of raw variables, some
                log-normally distributed, some roughly normal.

                Both the roughly normal and the log-normal variables
                could be IVs or DVs.
                How would you model without any interaction terms?
                How would you model interactions?
                How would you model the interaction of x with itself?
                (i.e., what would ordinarily be including x + x**2)
               
                ...

             
           
           
           
           
           
               http://spssx-discussion.1045642.n5.nabble.com/OT-what-kind-of-regressions-etc-w-log-normal-variables-tp5722805p5722818.html
         
         
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).