Hi all,
I am working with biological data that are not normally distributed. I wonder whether I should transform the data (log, sqr...) and use a t-test, or use a non-parametric test instead.
Thanks in advance,
Ro
What you are calling "non parametric test" is probably the equivalent of an ordinary parametric test performed on rank-transformed data.

Is a rank-transformation, which is not reversible and is definitely not "normal" in the resulting distribution, a better idea than a power transformation for biological data? - Almost never.

--
Rich Ulrich

> Date: Fri, 18 Jan 2013 10:52:44 -0800
> From: [hidden email]
> Subject: non normal data: transform or non parametric test
> To: [hidden email]
>
> Hi all;
>
> I am working with biological data and have non normal data, the I wonder if
> choice transform data (log, sqr...) for a t-testt or choice non parametric
> test.
> thanks in advance
> Ro
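As a rough illustration of the equivalence described above, here is a minimal SPSS sketch. It assumes a hypothetical numeric variable named outcome and a two-level grouping variable named grp coded 1 and 2; neither name comes from this thread, and rank_out is simply a name chosen for the new rank variable.

```spss
* Hypothetical variable names: outcome (numeric) and grp (coded 1 and 2).
* Rank the outcome over the whole sample, giving tied values their mean rank.
RANK VARIABLES=outcome (A)
  /RANK INTO rank_out
  /TIES=MEAN
  /PRINT=NO.

* Ordinary t-test computed on the ranks.
T-TEST GROUPS=grp(1 2)
  /VARIABLES=rank_out.

* Mann-Whitney test on the original values, for comparison.
NPAR TESTS
  /M-W=outcome BY grp(1 2).
```

The two p-values are typically close but not identical, because the t-test on ranks only approximates the Mann-Whitney test.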
Administrator
|
In reply to this post by ro
Please provide more information. For starters:

1. What is the dependent variable?
2. What are the explanatory variables?
3. What is the null hypothesis? (We can probably guess this from the answers to 1 and 2, but what the heck!)
4. What is the sample size?
5. What are the conventions in your field for this type of analysis?

Note that for general linear models (including linear regression, t-test, ANOVA), it is the errors that are assumed to be normally distributed, but the assumption of independence between fitted values and the errors is far more important than normality of the errors. And as George Box famously noted, normal distributions and straight lines don't exist in nature; nevertheless, the normal distribution and straight line can be useful as models/approximations of natural phenomena.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Hi all,
Thanks for your replies. My dependent variable is weight for 1134 children (the sample size) between 5 and 7 years old. The explanatory variables are region (1-3) and sex. I would like to compare the sexes and the 3 different regions. By non-parametric methods I mean Mann-Whitney and Kolmogorov-Smirnov.
Thanks in advance.
Regards,
Ro
I would run an ANOVA or ANCOVA model and look at the residual plots. If the residual plots look good (meaning they show independence between fitted values & residuals), I'd stick with one of those models.

Here's an example using data from the General Social Survey data file that comes with SPSS. You'll have to modify variable names to make it fit your situation.

NEW FILE.
DATASET CLOSE ALL.

* Change path on the next line as necessary.
GET FILE='C:\SPSSdata\1991 U.S. General Social Survey.sav'.

* Two-way ANOVA model.
UNIANOVA prestg80 BY sex region
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /SAVE=PRED (Pred1) RESID (Resid1)
  /EMMEANS=TABLES(sex)
  /EMMEANS=TABLES(region)
  /EMMEANS=TABLES(sex*region)
  /CRITERIA=ALPHA(0.05)
  /DESIGN=sex region sex*region.

* Look at the residual plot.
GRAPH
  /SCATTERPLOT(BIVAR)=Pred1 WITH Resid1.

* Two-way ANCOVA model, with Age as the covariate.
UNIANOVA prestg80 BY sex region WITH age
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /SAVE=PRED (Pred2) RESID (Resid2)
  /EMMEANS=TABLES(sex) WITH(age=MEAN)
  /EMMEANS=TABLES(region) WITH(age=MEAN)
  /EMMEANS=TABLES(sex*region) WITH(age=MEAN)
  /CRITERIA=ALPHA(0.05)
  /DESIGN=age sex region sex*region.

GRAPH
  /SCATTERPLOT(BIVAR)=Pred2 WITH Resid2.

For both of these models, the residuals appear to be both independent of the fitted values and homoscedastic. Those two assumptions -- the independent and identically distributed portions of i.i.d. N(0,sigma^2) -- are the most important ones, far more important than normality of the errors.

HTH.
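If one also wants a look at the shape of the residuals themselves, something along these lines could follow the first model above. This is only a sketch: it assumes the two-way ANOVA has already been run, so that the saved variable Resid1 exists in the active dataset.

```spss
* Assumes Resid1 was saved by the first UNIANOVA run above.
* Histogram and normal Q-Q plot of the residuals, with descriptive statistics.
EXAMINE VARIABLES=Resid1
  /PLOT=HISTOGRAM NPPLOT
  /STATISTICS=DESCRIPTIVES.
```

With a sample in the thousands, the normality tests that accompany NPPLOT will flag even trivial departures, so the plots are likely to be more informative than the p-values.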
--
Bruce Weaver
p.s. - Did you see this thread?
http://spssx-discussion.1045642.n5.nabble.com/OT-t-tests-non-parametric-tests-and-large-studies-a-paradox-of-statistical-practice-td5717324.html
--
Bruce Weaver
In reply to this post by ro
MW is basically an ANOVA test after a rank-transformation, and I already detailed enough drawbacks of that. In addition to what I mentioned, "ranks" do not do especially well with multiple variables in an analysis. Since they tend to introduce artifacts of interaction, you almost never find a multi-factor analysis of ranks.

K-S is only for two groups at a time, with no covariates, so that doesn't give you much for these data.

You certainly should have age-in-months as another covariate, for a human sample with ages from 5 to 7 years.

BMI would probably be a more sensible contrast for nutrition ... which, combined with an unspecified set of "regions," brings up the question of whether you ought to look at "overweight" or "starving" as special categories that screw up a simple, linear criterion of BMI or average-weight as a meaningful outcome.

Of course, for BMI you need height ... which potentially could be an indicator of either ethnic origin or proper nutrition. I suppose you might not have height if the people designing the study did not pay attention to the analyses that they might want to *do* once they had their data.

--
Rich Ulrich

> Date: Fri, 18 Jan 2013 13:16:43 -0800
> From: [hidden email]
> Subject: Re: non normal data: transform or non parametric test
> To: [hidden email]
>
> Hi all,
>
> Thanks for you reply:
>
> Mi dependent variable is weight for 1134 (sample size) childs between 5-7
> years
> explanatory variables are region (1-3) and sex.
> I would llike compare sex and 3 differents regions.
> for non parametric method i mean Mann-Whitney and Kolmogórov-Smirnov.
>
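To make the BMI and age-in-months suggestions concrete, here is a hedged SPSS sketch. All variable names (weight_kg, height_cm, age_months, sex, region) are hypothetical, since the thread does not say how the data file is laid out or whether height was recorded.

```spss
* Hypothetical variables: weight_kg, height_cm, age_months, sex, region.
* BMI is weight in kilograms divided by the square of height in metres.
COMPUTE bmi = weight_kg / ((height_cm / 100) ** 2).
EXECUTE.

* Two-way ANCOVA on BMI with age in months as the covariate.
UNIANOVA bmi BY sex region WITH age_months
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /SAVE=PRED (PredBmi) RESID (ResidBmi)
  /DESIGN=age_months sex region sex*region.

* Residuals against fitted values, to check independence and spread.
GRAPH
  /SCATTERPLOT(BIVAR)=PredBmi WITH ResidBmi.
```

If height is not available, the same model can be run with weight_kg as the dependent variable in place of bmi.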
Hi all,
Thanks for your time and consideration. I will take your recommendations into account.
Regards,
Ro
In reply to this post by ro
At 01:52 PM 1/18/2013, ro wrote:
>I am working with biological data and have non normal data, the I
>wonder if [the best] choice [is to] transform data (log, sqr...) for
>a t-testt or [use a] non parametric test.

and added, 04:16 PM 1/18/2013:

>My dependent variable is weight for 1134 (sample size) childs
>between 5-7 years explanatory variables are region (1-3) and sex.
>I would like to compare sex and 3 differents regions.

Your dependent variable has a limited range. The linear-models tests, including the t-test and ANCOVA, tend not to be much affected by non-normal residuals (and, as stated by others, not at all by non-normal *variables*), except by extreme values, which you probably won't see.

In recent years statisticians (notably John Tukey) have made strong arguments against transforming data to make it 'look better', whether 'better' means normally distributed or anything else.

You're best advised to do a straight ANOVA on your untransformed data -- or, more likely, an ANCOVA with age as a covariate.