non normal data: transform or non parametric test

ro
Hi all;

I am working with biological data that are not normally distributed, and I wonder whether to transform the data (log, sqr...) for a t-test or to choose a non-parametric test instead.
Thanks in advance,
Ro

Re: non normal data: transform or non parametric test

Rich Ulrich
What you are calling "non parametric test" is probably the
equivalent of an ordinary parametric test performed on rank-
transformed data.

Is a rank-transformation, which is not reversible and is definitely
not "normal" in the resulting distribution, a better idea than
a power transformation for biological data?  - Almost never.
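
The equivalence described above can be checked directly. A minimal sketch, assuming hypothetical variable names weight (the outcome) and sex (coded 1 and 2): run the Mann-Whitney test on the raw outcome, then rank-transform the outcome and run an ordinary t-test on the ranks. The two p-values should typically be close, though not identical.

```spss
* Hypothetical variable names: weight (outcome), sex (coded 1 and 2).
* Mann-Whitney test on the raw outcome.
NPAR TESTS
  /M-W=weight BY sex(1 2).

* Rank-transform the outcome, using the mean rank for ties.
RANK VARIABLES=weight (A)
  /RANK INTO weight_rank
  /TIES=MEAN
  /PRINT=NO.

* Ordinary t-test on the ranks.
T-TEST GROUPS=sex(1 2)
  /VARIABLES=weight_rank.
```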

--
Rich Ulrich


Re: non normal data: transform or non parametric test

Bruce Weaver
Administrator
In reply to this post by ro
Please provide more information.  For starters:

1. What is the dependent variable?
2. What are the explanatory variables?
3. What is the null hypothesis?  (We can probably guess this from the answers to 1 and 2, but what the heck!)
4. What is the sample size?
5. What are the conventions in your field for this type of analysis?

Note that for general linear models (including linear regression, t-test, ANOVA), it is the errors that are assumed to be normally distributed, but the assumption of independence between fitted values and the errors is far more important than normality of the errors.  And as George Box famously noted, normal distributions and straight lines don't exist in nature; nevertheless, the normal distribution and straight line can be useful as models/approximations of natural phenomena.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: non normal data: transform or non parametric test

ro
Hi all,

Thanks for your replies.

My dependent variable is weight, for a sample of 1134 children aged 5-7 years.
The explanatory variables are region (1-3) and sex.
I would like to compare the sexes and the 3 regions.
By non-parametric methods I mean Mann-Whitney and Kolmogorov-Smirnov.

Thanks in advance.
Regards
Ro

Re: non normal data: transform or non parametric test

Bruce Weaver
Administrator
I would run an ANOVA or ANCOVA model and look at the residual plots.  If the residual plots look good (meaning they show independence between fitted values & residuals), I'd stick with one of those models.

Here's an example using data from the General Social Survey data file that comes with SPSS.  You'll have to modify variable names to make it fit your situation.

new file.
dataset close all.
* Change path on the next line as necessary.
GET FILE='C:\SPSSdata\1991 U.S. General Social Survey.sav'.

* Two-way ANOVA model.

UNIANOVA prestg80 BY sex region
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /SAVE=PRED (Pred1) RESID (Resid1)
  /EMMEANS=TABLES(sex)
  /EMMEANS=TABLES(region)
  /EMMEANS=TABLES(sex*region)
  /CRITERIA=ALPHA(0.05)
  /DESIGN=sex region sex*region.

* Look at the residual plot.
GRAPH /SCATTERPLOT(BIVAR)=Pred1 WITH Resid1 .


* Two-way ANCOVA model, with Age as the covariate.

UNIANOVA prestg80 BY sex region WITH age
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /SAVE=PRED (Pred2) RESID (Resid2)
  /EMMEANS=TABLES(sex) WITH(age=MEAN)
  /EMMEANS=TABLES(region) WITH(age=MEAN)
  /EMMEANS=TABLES(sex*region) WITH(age=MEAN)
  /CRITERIA=ALPHA(0.05)
  /DESIGN=age sex region sex*region.

GRAPH /SCATTERPLOT(BIVAR)=Pred2 WITH Resid2 .


For both of these models, the residuals appear to be both independent of the fitted values and homoscedastic.  Those two assumptions -- the independent and identically distributed portions of i.i.d. N(0,sigma^2) -- are the most important ones, far more important than normality of the errors.  
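
To look at the less critical normality assumption as well, and to add a formal check of equal variances, something like the following could be run. This is only a sketch: it assumes Resid1 was saved by the first UNIANOVA command above, and it refits the two-way model solely to request Levene's test through /PRINT=HOMOGENEITY.

```spss
* Assumes Resid1 was saved by the first UNIANOVA command above.
* Histogram and normal Q-Q plot of the residuals.
EXAMINE VARIABLES=Resid1
  /PLOT=HISTOGRAM NPPLOT
  /STATISTICS=NONE.

* Levene's test of equal error variances across the sex by region cells.
UNIANOVA prestg80 BY sex region
  /METHOD=SSTYPE(3)
  /PRINT=HOMOGENEITY
  /DESIGN=sex region sex*region.
```

With a sample as large as the one in this thread, Levene's test can flag differences too small to matter, so the plots are probably the better guide.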

HTH.


Re: non normal data: transform or non parametric test

Bruce Weaver
Administrator
p.s. - Did you see this thread?

http://spssx-discussion.1045642.n5.nabble.com/OT-t-tests-non-parametric-tests-and-large-studies-a-paradox-of-statistical-practice-td5717324.html


Re: non normal data: transform or non parametric test

Rich Ulrich
In reply to this post by ro
MW is basically an ANOVA test after a rank-transformation, and
I already detailed enough drawbacks of that.  In addition to what I
mentioned, "ranks" do not do especially well with multiple variables
in an analysis.  Since they tend to introduce artifacts of interaction,
you almost never find a multi-factor analysis of ranks.

K-S  is only for two groups at a time, with no covariates, so that
doesn't give you much for these data.

You certainly should have age-in-months as another covariate, for
a human sample with ages from 5 to 7 years.
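
If the file holds dates rather than age, age in months could be derived along these lines. This is a sketch only: birth_date and exam_date are hypothetical variable names, assumed to be stored as SPSS date variables.

```spss
* Hypothetical date variables: birth_date and exam_date.
* DATEDIFF returns the number of completed months between the two dates.
COMPUTE age_months = DATEDIFF(exam_date, birth_date, "months").
VARIABLE LABELS age_months 'Age at measurement (completed months)'.
EXECUTE.
```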

BMI would probably be a more sensible contrast for nutrition ...
which, combined with an unspecified set of "regions,"  brings up
the question of whether you ought to look at "overweight" or
"starving" as special categories that screw up a simple, linear
criterion of BMI or average-weight as a meaningful outcome.

Of course, for BMI you need height ... which potentially could be
an indicator of either ethnic origin or proper nutrition.  I suppose
you might not have height if the people designing the study did not
pay attention to the analyses that they might want to *do*  once
they had their data. 
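
If height is available, BMI (weight in kilograms divided by the square of height in metres) could be computed as below. The variable names weight_kg and height_cm are hypothetical, and the units are assumptions about how the data were recorded.

```spss
* Hypothetical variables: weight_kg (kilograms) and height_cm (centimetres).
COMPUTE bmi = weight_kg / (height_cm / 100)**2.
VARIABLE LABELS bmi 'Body mass index (kg/m**2)'.
EXECUTE.
```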

--
Rich Ulrich



Re: non normal data: transform or non parametric test

ro
Hi all,

Thanks for your time and consideration; I will take your recommendations into account.

Regards,
Ro

Re: non normal data: transform or non parametric test

Richard Ristow
In reply to this post by ro
At 01:52 PM 1/18/2013, ro wrote:

>I am working with biological data and have non-normal data, so I
>wonder if [the best] choice [is to] transform data (log, sqr...) for
>a t-test or [use a] non-parametric test.

and added, 04:16 PM 1/18/2013:
>My dependent variable is weight for 1134 (sample size) children
>between 5-7 years; explanatory variables are region (1-3) and sex.
>I would like to compare sex and 3 different regions.

Your dependent is a variable with a limited range. The linear-models
tests, including the t-test and ANCOVA, tend not to be much affected
by non-normal residuals (and, as stated by others, not at all by
non-normal *variables*), except by extreme values, which you probably
won't see.

In recent years statisticians (notably John Tukey) have made strong
arguments against transforming data to make it 'look better', whether
'better' means normally distributed or anything else.

You're best advised to run a straight ANOVA on your untransformed
data -- or, more likely, an ANCOVA with age as a covariate.
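
Adapted to the data described in this thread, that ANCOVA might look like the sketch below. The variable names weight, sex, region and age_months are hypothetical, as are the names PredW and ResidW for the saved values.

```spss
* Hypothetical variable names: weight, sex, region, age_months.
* Two-way ANCOVA on untransformed weight, with age as the covariate.
UNIANOVA weight BY sex region WITH age_months
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /SAVE=PRED (PredW) RESID (ResidW)
  /PRINT=DESCRIPTIVE PARAMETER
  /DESIGN=age_months sex region sex*region.

* Residuals against fitted values, to check for patterns or extreme values.
GRAPH /SCATTERPLOT(BIVAR)=PredW WITH ResidW.
```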

Re: non normal data: transform or non parametric test

ro
Thanks, I will focus on your points and not only on the normal fit...

regards
Ro