
Re: OT what kind of regressions, etc w log-normal variables

Posted by Bruce Weaver on Nov 01, 2013; 12:59pm
URL: http://spssx-discussion.165.s1.nabble.com/OT-what-kind-of-regressions-etc-w-log-normal-variables-tp5722805p5722845.html

Further to Art's point, interaction is often referred to as "effect modification" in epidemiology and as "moderation" in psychology--e.g., psych researchers may talk about running "moderated multiple regression", which simply means that there is (at least one) product term in the regression model.  

Personally, I don't understand why folks in psychology saw the need to introduce a new term for interaction, given how strongly statistics education in psychology focuses on ANOVA models, including factorial models (with interaction terms included).  In that context, psychologists have always used and continue to use the term "interaction".  But it seems "interaction" was not good enough for psychologists running regression models:  They needed their own term for it.  I guess Lee Cronbach's Two Disciplines address is still relevant today.  ;-)  


Art Kendall wrote
      One way of phrasing an "interaction" is that {changes or differences or relations or distributions} of a dependent (explained, criterion, outcome, left-hand) variable are different for different levels of an independent (explaining, predictor, right-hand) variable.
     
      I think it is unfortunate that students are not routinely taught that there are different dialects of statistics and different ways to express the same result. E.g., a correlational dialect might say "adult height is correlated with gender", whereas a design-of-experiments dialect might say "there is a significant difference between the mean heights of men and women".
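Art's two dialects can be put side by side in a few lines of Python. For a binary predictor such as gender coded 0/1, the correlation r with height and the pooled two-sample t statistic are linked by t = r * sqrt((n - 2) / (1 - r**2)), so both phrasings carry the same test. A minimal sketch; the heights are made up for illustration:

```python
import math
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(g1, g2):
    """Two-sample t statistic with pooled variance (mean of g2 minus mean of g1)."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g2) - mean(g1)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Illustrative (made-up) adult heights in cm.
women = [160.0, 165.0, 158.0, 170.0, 162.0]
men = [175.0, 180.0, 172.0, 178.0, 169.0]

# "Correlation dialect": code gender 0/1 and correlate the code with height.
code = [0] * len(women) + [1] * len(men)
height = women + men
r = pearson_r(code, height)
n = len(height)
t_from_r = r * math.sqrt((n - 2) / (1 - r ** 2))

# "Design-of-experiments dialect": compare the two group means directly.
t_direct = pooled_t(women, men)

print(round(r, 4), round(t_from_r, 4), round(t_direct, 4))
```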
     
      I also think it is unfortunate that when newer methods such as log-linear models became common, their proponents often were not aware of the earlier usage. Two of the most extreme examples of being unaware of alternate uses of words are the invention in the 70's of a whole new field of independent machine learning to do what is conceptually, and often mathematically, identical to what had been called clustering since the 40's, and the use of "ontology" for the types or clusters found by "independent machine learning".
     
      It is certainly worth considering whether to use a simple change score or a regressed (partial, residual) change score. It is also worth considering whether to express the regression approach to ancova with or without interactions of the "covariate" with the other right-hand-side variables. I have even seen vocabulary that treats anything one wants to control for that is not manipulated as a "covariate". For example, if the variable one wants to control for is gender, why not make it a factor when using anova vocabulary and include its interactions with race or treatment?
      Art Kendall
Social Research Consultants
      On 11/1/2013 1:31 AM, Rich Ulrich [via SPSSX Discussion] wrote:
   
        "Interaction" is also the term used for every correlation observed in log-linear models of a multi-way table, where the "main effects" describe whether univariate frequencies are or are not equal.
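In the simplest case, a 2 x 2 table, the saturated log-linear model with reference (dummy) coding is log(mu_ij) = lambda + lambda_i^A + lambda_j^B + lambda_ij^AB, and its single interaction parameter equals the log odds ratio; it is zero exactly when the rows and columns are independent. A minimal sketch with hypothetical counts; the function name is illustrative, not a library call:

```python
import math

def saturated_loglinear_2x2(table):
    """Parameters of the saturated log-linear model for a 2x2 table of counts,
    using the first row and the first column as reference categories."""
    (n11, n12), (n21, n22) = table
    lam = math.log(n11)                    # intercept: the reference cell
    lam_a = math.log(n21) - math.log(n11)  # row effect
    lam_b = math.log(n12) - math.log(n11)  # column effect
    lam_ab = (math.log(n22) - math.log(n21)
              - math.log(n12) + math.log(n11))  # row x column interaction
    return lam, lam_a, lam_b, lam_ab

associated = [[30, 10], [10, 30]]   # hypothetical counts showing an association
independent = [[20, 40], [10, 20]]  # rows proportional: no association

for t in (associated, independent):
    lam_ab = saturated_loglinear_2x2(t)[3]
    odds_ratio = (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])
    print(round(lam_ab, 4), round(math.log(odds_ratio), 4))
```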
       
        For your example of an interaction, I will offer this comment. Testing a change across time sometimes is done better by looking at the main effect, between groups, for "change scores". That puts you immediately in the position of asking the question, which is often relevant, of whether a regressed-change score would be more appropriate than the simple change: perform the one-way ANCOVA instead? And once you raise the topic of change scores, there is a whole literature on the difficult cases.
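The choice Rich describes shows up directly in the estimates. The between-group difference in simple change scores is the difference in post means minus 1 times the difference in pre means; a one-way ANCOVA on post with pre as covariate (common slope) replaces that 1 with the pooled within-group slope b. A minimal sketch with hypothetical data in which the groups differ at baseline; the function names are illustrative:

```python
from statistics import mean

def within_group_slope(groups):
    """Pooled within-group OLS slope of post on pre (the common ANCOVA slope)."""
    sxy = sxx = 0.0
    for pre, post in groups:
        mx, my = mean(pre), mean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    return sxy / sxx

def change_score_effect(ctrl, trt):
    """Group difference in mean simple change (post - pre); implicitly uses slope 1."""
    return (mean(trt[1]) - mean(trt[0])) - (mean(ctrl[1]) - mean(ctrl[0]))

def ancova_effect(ctrl, trt):
    """Group difference in post means adjusted for pre via the pooled within-group slope."""
    b = within_group_slope([ctrl, trt])
    return (mean(trt[1]) - mean(ctrl[1])) - b * (mean(trt[0]) - mean(ctrl[0]))

# Hypothetical (pre, post) scores; the treated group starts higher at baseline.
ctrl = ([10, 12, 14, 16], [12, 13, 15, 16])
trt = ([14, 16, 18, 20], [18, 19, 21, 22])

print("simple change:", change_score_effect(ctrl, trt))
print("within-group slope:", round(within_group_slope([ctrl, trt]), 3))
print("ANCOVA-adjusted:", round(ancova_effect(ctrl, trt), 3))
```

Here the simple-change estimate is 2 and the ANCOVA estimate is 3.2, because the within-group slope (0.7) is well below 1 and the groups differ at baseline; with equal pre means the two estimates would coincide.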
       
        The "interaction" example that I like is for dominant gene types. The observed phenotype shows up as an interaction. That is, AA, Aa, and aA all show the dominant trait, whereas only aa does not. However, the more meaningful model is one with a different outcome: the protein or enzyme production encoded by the genes. That model shows the difference as a main effect, where AA has twice as much production as Aa or aA. "Dominant" or not is thus a function of what you get with half-production. For a so-called dominant trait, half as much protein shows up with the same effect as the fuller amount. I think I was always somewhat puzzled by Aa, etc., until I learned the main-effect model.
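Rich's gene example can be sketched as two models of the same four genotypes. The numbers are hypothetical assumptions: one unit of protein per A allele, and the trait appearing once production reaches half of the AA amount. The 2 x 2 interaction contrast f(AA) - f(Aa) - f(aA) + f(aa) is zero for protein (a purely additive, main-effect model) but non-zero for the observed phenotype:

```python
# Hypothetical dosage model: each "A" allele contributes one unit of protein.
UNIT_PER_A = 1.0
THRESHOLD = 0.5  # assumed: the trait appears at half the full (AA) amount

def protein(genotype):
    """Main-effect model: production proportional to the number of A alleles."""
    return UNIT_PER_A * genotype.count("A")

def shows_dominant_trait(genotype):
    """Observed phenotype: present whenever production reaches the threshold."""
    return protein(genotype) >= THRESHOLD * protein("AA")

def interaction_contrast(f):
    """2x2 interaction contrast over the two allele positions."""
    return f("AA") - f("Aa") - f("aA") + f("aa")

for g in ("AA", "Aa", "aA", "aa"):
    print(g, protein(g), shows_dominant_trait(g))

print("protein contrast:", interaction_contrast(protein))
print("phenotype contrast:", interaction_contrast(lambda g: int(shows_dominant_trait(g))))
```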
       
        --
        Rich Ulrich
       
       
       
       
          Date: Thu, 31 Oct 2013 07:52:02 -0700
          From: [hidden email]
          Subject: Re: OT what kind of regressions, etc w log-normal variables
          To: [hidden email]
         
          Thank you. That rule of thumb from Tukey will be very helpful. In the study under discussion, the population of measures is expected to be log-normal. Most people will be normal in the sense of usual.
             
              If I recall correctly, it was the classic Cohen and Cohen book that called checking for a curvilinear relation using X and X**2 an interaction of a variable with itself. It is in the part where they talk about multiplying IVs by each other: the relation between x and y depends on the value of x. In psych the classic example is anxiety and performance: low anxiety, low performance ("Who cares?"); medium anxiety, high performance; high anxiety, low performance.
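The "interaction of x with itself" can be seen by fitting y = b0 + b1*x + b2*x**2: the slope of y on x is then b1 + 2*b2*x, so it depends on x itself, much as a product term x*z makes the slope of x depend on z. A minimal sketch with hypothetical, noise-free inverted-U data (anxiety 1 to 7, performance peaking in the middle), solving the normal equations in plain Python; the helper names are illustrative:

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_quadratic(xs, ys):
    """OLS fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)

def slope_at(coefs, x):
    """Slope of the fitted curve at x: dy/dx = b1 + 2*b2*x."""
    return coefs[1] + 2 * coefs[2] * x

# Hypothetical inverted-U data: performance peaks at moderate anxiety.
anxiety = [1, 2, 3, 4, 5, 6, 7]
performance = [1, 6, 9, 10, 9, 6, 1]

b = fit_quadratic(anxiety, performance)
for level in (2, 4, 6):
    print(level, round(slope_at(b, level), 3))
```

The slope is positive at low anxiety, roughly zero at the peak, and negative at high anxiety.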
             
              Especially in experiments, an interaction is what one would desire. Consider the simplest case. Many years ago it became more broadly known that anova is a special case of regression, which in turn can be considered a special case of canonical correlation. A couple of decades ago the term "general linear model" became widely known.
             
              Think about the simple 2 by 2R design (1 between-subjects factor and 1 within-subjects factor). The between factor is assignment to treatment or not. The within factor is time: pre and post. One hopes to find a difference in change, aka a difference of differences. When this is analyzed by regression, one hopes that the second hierarchical step (this is not the nefarious stepwise approach) will be significant and meaningful:
              /enter time group
              /enter time*group
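The difference of differences Art describes is exactly the coefficient on the product term. With time and group both coded 0/1 and all four cells observed, an OLS fit of score = b0 + b1*time + b2*group + b3*time*group reproduces the four cell means, so b3 = (treatment post - treatment pre) - (control post - control pre), and the increment at the second hierarchical step tests b3. A minimal sketch with hypothetical scores; it gives only the point estimate and ignores the pairing of pre and post scores within subjects, which a repeated-measures analysis would account for in the test:

```python
from statistics import mean

# Hypothetical scores as (group, time, score); group 1 = treatment, time 1 = post.
data = [
    (0, 0, 10), (0, 0, 12), (0, 1, 11), (0, 1, 13),
    (1, 0, 10), (1, 0, 12), (1, 1, 15), (1, 1, 17),
]

def cell_mean(g, t):
    """Mean score in one group x time cell."""
    return mean(s for gg, tt, s in data if gg == g and tt == t)

# With 0/1 coding, OLS for  score = b0 + b1*time + b2*group + b3*time*group
# reproduces the four cell means, so the coefficients can be read off them.
b0 = cell_mean(0, 0)                        # control, pre
b1 = cell_mean(0, 1) - cell_mean(0, 0)      # change over time in control
b2 = cell_mean(1, 0) - cell_mean(0, 0)      # group difference at pre
b3 = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(b0, b1, b2, b3)
```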
           
            Art Kendall
Social Research Consultants
            On 10/30/2013 11:28 PM, Rich Ulrich [via SPSSX Discussion] wrote:
         
         
            Maybe you need to "lead" a little more?
             
              I don't start out worrying about what is normal or log-normal. However, I do keep in mind the crude rules of thumb offered by John Tukey in his textbook ("Exploratory Data Analysis", I think) concerning the range of a variable: "If the largest value of a natural set of scores is 10 times the size of the smallest, you should consider a transformation; if it is 20 times, you should probably take the log." That's from memory, and he probably said it better.
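A minimal Python sketch of the rule of thumb as Rich paraphrases it. It assumes strictly positive scores, and the cut-offs 10 and 20 come from the from-memory paraphrase, not from a checked quotation of Tukey; the function name is hypothetical:

```python
def suggest_transform(scores):
    """Compare the largest score to the smallest, per the paraphrased rule.

    Assumes strictly positive scores; the 10x and 20x cut-offs follow the
    paraphrase in the text, not a verified quotation.
    """
    lo, hi = min(scores), max(scores)
    if lo <= 0:
        raise ValueError("the ratio rule assumes strictly positive scores")
    ratio = hi / lo
    if ratio >= 20:
        return "probably take the log"
    if ratio >= 10:
        return "consider a transformation"
    return "transformation unlikely to matter much"

for sample in ([12, 15, 30], [2, 25], [1, 150]):
    print(sample, suggest_transform(sample))
```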
             
              So, log-normal is important when it actually affects the scaling. Taking the log won't do much when the range is relatively small, even though the shape may be "log-normal."
             
              And his rule of thumb is pretty relevant for most measurements in the social sciences whenever there are non-zero, non-negative data with a natural zero that is not going to be observed. Reciprocals, square roots, etc., are other possible transformations that are natural for the circumstances that generate various sorts of data. "How the numbers are generated" is at least as important in justifying a particular transformation as the resulting shape of the curve before or after transforming.
             
              Still, "normality" is less important than (a) observing a linear relationship in a model and (b) observing equal variance in the errors across the range of the model. In the social sciences, it is very common that a univariate distribution that is observed to be log-normal is also going to be modeled most ideally by taking its logs, especially when the scores cover a large range. That's a convenient coincidence, but it certainly is not magic or reliable.
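Points (a) and (b) can be checked directly on the residuals. A minimal sketch, assuming a hypothetical multiplicative process y = 2 * x**1.5 times an error factor that alternates between 1.25 and 1/1.25: a straight line fitted on the raw scale leaves residuals that spread out as x grows, while the same fit on log x and log y leaves equal residual spread in the lower and upper halves of the range. Comparing two halves is only a crude stand-in for looking at a residual plot:

```python
import math
from statistics import mean, pstdev

def ols(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def residual_spread(x, y):
    """SD of OLS residuals in the lower and upper halves of x (x must be sorted)."""
    b0, b1 = ols(x, y)
    res = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    half = len(res) // 2
    return pstdev(res[:half]), pstdev(res[half:])

# Hypothetical multiplicative data: y = 2 * x**1.5 times an alternating error factor.
x = [1, 2, 4, 8, 16, 32, 64, 128]
y = [2 * xi ** 1.5 * (1.25 if i % 2 == 0 else 1 / 1.25) for i, xi in enumerate(x)]

raw = residual_spread(x, y)
logged = residual_spread([math.log(v) for v in x], [math.log(v) for v in y])
print("raw scale (lower, upper):", raw)
print("log scale (lower, upper):", logged)
```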
             
              I've very seldom included a term for X^2 in a model, and I don't remember ever thinking of it as "the interaction of a variable with itself."
             
              About interactions in general: I like the insight that someone else posted in a stats group a dozen years ago ... that an interaction is a sign that you have the wrong model, either in scaling or in the choice or definition of variables.
             
              --
              Rich Ulrich
             
             
                Date: Wed, 30 Oct 2013 06:12:30 -0700
                From: [hidden email]
                Subject: OT what kind of regressions, etc w log-normal variables
                To: [hidden email]
               
                I am asking this because some of us have a disagreement. I am trying to ask without these being leading questions.

                  Suppose there is a set of raw variables, some log-normally distributed and some roughly normal.

                  Both the roughly normal and the log-normal variables could be IVs or DVs.
                  How would you model without any interaction terms?
                  How would you model interactions?
                  How would you model the interaction of x with itself (i.e., what would ordinarily mean including x + x**2)?
               
                ...

             
           
           
           
           
           
              If you reply to this email,
                your message will be added to the discussion below:
              http://spssx-discussion.1045642.n5.nabble.com/OT-what-kind-of-regressions-etc-w-log-normal-variables-tp5722805p5722818.html 
           
           
         
         
           View this message in context: Re: OT what kind of regressions, etc w log-normal variables
           Sent from the SPSSX Discussion mailing list archive at Nabble.com.
       
     
     
     
     
     
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).