|
|
Anata Ionescu wrote:
> As I mentioned to you in a previous e-mail, I am not a statistician.
> However, lately I've had to use SPSS sporadically. I've heard about P-P,
> Q-Q, Shapiro-Wilk and things like that but, initially, I was told that
> (Skewness/Std. Error of Skewness) and (Kurtosis/Std. Error of Kurtosis)
> are indicators for the normality of the distribution. To be exact:
> If (Skewness/Std. Error of Skewness)<1.96 and (Kurtosis/Std. Error of
> Kurtosis)<1.96 (for confidence 99%) we can treat the distribution as
> normal (for confidence 95%, instead of 1.96, we must have 2.54(?) in
> both cases).
>
> The problem is that I'm not quite sure this is a valid way of checking
> for normality, because most distributions, however far from normality,
> pass these tests.

There is no consensus in the research community about how to examine the normality assumption, and any approach you take will have its problems. In particular, if you make the choice of what statistical test you use dependent on a test of the normality of the data, the statistical properties of that test statistic become far more complicated than you might expect. You could end up with a test, for example, that does not maintain its stated alpha level.

That being said, many of us still look at issues like normality assumptions and make the choice of statistical test or transformation dependent on these results.

Some people, including me, are very wary of examining normality assumptions from a hypothesis testing perspective, which is what you are advocating above. The problem is that these tests have far too little power for small sample sizes, when the assumptions of normality are most critical, and far too much power for large sample sizes, when the assumptions of normality are not really needed (because of the Central Limit Theorem).

Rather than assess the statistical significance of a test of non-normality, you should, perhaps, consider a measure of the degree of non-normality.

So a data set with skewness larger than +/-1 or kurtosis larger than 6 is an indication of a problem.

By the way, some packages define kurtosis in such a way that it is 3 for a normal distribution and others define it in such a way that it is 0 for a normal distribution. The latter definition is evil, evil, evil. Don't tinker with the moments that Karl Pearson invented. It's a sacrilege.

By the way, the fact that you see many distributions far from normal and the tests fail to detect this either means that your sample sizes are always very small, or perhaps you need to rethink what it means to be "far from normal."

I hope this helps.

--
Steve Simon, Standard Disclaimer
The Monthly Mean is celebrating its first anniversary. Find out more about the newsletter that dares to call itself "average" at www.pmean.com/news

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
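For readers who want to see the two ideas in this exchange side by side, here is a minimal Python sketch. It is not SPSS syntax and not anyone's official method; it assumes the bias-adjusted sample skewness and kurtosis formulas and the standard-error formulas that SPSS-style output is commonly documented to use. The function name `shape_summary` is made up for illustration. Note that the conventional two-sided cutoffs are 1.96 for 95% and 2.58 for 99%, and that they apply to the absolute value of each ratio. The sketch also reports kurtosis under both definitions mentioned above (0 for a normal distribution versus 3 for a normal distribution).

```python
import math

def shape_summary(x):
    """Skewness and kurtosis with standard errors and ratios (needs n >= 4).

    Assumption: bias-adjusted sample formulas, with kurtosis first computed
    as "excess" kurtosis (0 for a normal distribution).
    """
    n = len(x)
    mean = sum(x) / n
    # Central moments (divide by n)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5          # unadjusted skewness
    g2 = m4 / m2 ** 2 - 3.0      # unadjusted excess kurtosis
    # Small-sample adjustments
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    excess = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    # Standard errors depend only on n
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return {
        "skewness": skew,
        "excess_kurtosis": excess,         # 0 for a normal distribution
        "pearson_kurtosis": excess + 3.0,  # 3 for a normal distribution
        "se_skewness": se_skew,
        "se_kurtosis": se_kurt,
        "z_skewness": skew / se_skew,
        "z_kurtosis": excess / se_kurt,
    }

if __name__ == "__main__":
    data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]  # made-up values with one outlier
    for name, value in shape_summary(data).items():
        print(f"{name}: {value:.3f}")
```

Because the standard errors shrink as n grows, the same degree of skewness gives a small ratio in a small sample and a large ratio in a large one, which is the power problem described above. Looking at the skewness and kurtosis values themselves avoids that dependence on n.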
|
In reply to this post by Anata Ionescu
Art
Art Kendall
Social Research Consultants |
|
One concern about normality I would see is when using a parametric effect size, because departures from normality (not to mention measurement error) would attenuate the estimated effect size. A more robust alternative à la Cliff's d, P(X>Y), would be more appropriate…

John Denbleyker
Research, Evaluation & Testing (RET)
District Achievement Analyst
Anoka-Hennepin ISD#11
763-506-1155 (office)
763-219-1905 (cell)

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Art Kendall

> Why are you concerned about "normality"?
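As a rough illustration of the robust effect size mentioned in this post, here is a short Python sketch. It assumes the usual definition of Cliff's d as P(X>Y) minus P(X<Y), estimated over all pairs of observations from the two groups; the function name `cliffs_delta` and the data are made up for illustration.

```python
def cliffs_delta(x, y):
    """Return (Cliff's d, estimated P(X>Y)) for two groups of scores.

    Every observation in x is compared with every observation in y.
    Ties count toward neither "greater" nor "less".
    """
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    pairs = len(x) * len(y)
    return (greater - less) / pairs, greater / pairs

if __name__ == "__main__":
    treatment = [12, 15, 15, 18, 40]  # made-up scores with one extreme value
    control = [10, 11, 13, 15, 16]
    d, p_greater = cliffs_delta(treatment, control)
    print(f"Cliff's d = {d:.2f}, P(X>Y) = {p_greater:.2f}")
```

Only the ordering of the scores enters the calculation, so an extreme value such as the 40 above counts no more than any other score that beats every control value.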
|
Hi,
Correct me if I am wrong. I am using PASW Statistics (Analyze > Descriptive Statistics > Explore) to check for normality. The Kolmogorov-Smirnov statistic there lets you test normality with the following hypotheses:

H0: The distribution is normal
H1: The distribution is not normal

Some people tell me that you can literally "accept" H0 here, just as with Levene's test for equality of variances. If there is no evidence of non-normality then, to be safer, you might want to go through the following checklist and make sure that everything passes:

- Bell-shaped curve?
- Is the distribution symmetrical? (Histogram)
- Check if MEAN ≈ MEDIAN ≈ MODE
- 5% trimmed mean similar to mean?
- Check if Skewness ≈ 0
- Check if Kurtosis ≈ 0
- Check for outliers (Explore - Box-plots)

Once all of these pass, you can assume that your distribution is normal. If you are using an ANOVA and your design is balanced, a slight departure from normality is all right. But if you are still very concerned, you might want to normalise your variable anyway, as most variables in real life are not normal.

Do contribute or correct my thinking if anyone out there feels that what I have suggested needs correction.

Thanks.

Regards
Dorraj

Date: Thu, 3 Dec 2009 11:08:16 -0600
From: [hidden email]
Subject: Re: just curious
To: [hidden email]
|
Dear Dorraj,
Tests about means make inferences about the sampling distributions of the means. As sample size grows large, these sampling distributions approach Gaussian distributions, regardless of the distributions of the original variables. So even if a variable is extremely nonnormally distributed, the (arithmetic) mean of this variable (over repeated samples rather than over observations) will be virtually perfectly normally distributed as long as the sample size is adequate. This phenomenon is known as the central limit theorem, see: http://en.wikipedia.org/wiki/Central_limit_theorem

I guess it's the central limit theorem that makes ANOVA models robust against departures from normality if sample size is adequate. If sample size is not adequate, ANOVA will be less robust but (I'm getting to the point now) without adequate sample size, the Kolmogorov-Smirnov test will have low statistical power.

The irony is basically that the smaller the sample size, the more important the normality of the distributions and the more difficult it is to demonstrate departures from normality with a statistical test like the one-sample K-S. This really limits the usefulness of this test.

Kind regards,
Ruben van den Berg

Date: Fri, 4 Dec 2009 02:50:41 +0000
From: [hidden email]
Subject: Re: just curious
To: [hidden email]
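The central limit theorem point in Ruben's reply can be checked with a small simulation. The following Python sketch is only an illustration under stated assumptions: the raw variable is drawn from an exponential distribution (strongly right-skewed), the number of repeated samples and the seed are arbitrary, and the helper names `skewness` and `skew_of_sample_means` are made up. It reports how skewed the distribution of sample means is for a few sample sizes.

```python
import random

def skewness(values):
    """Plain moment-based skewness (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skew_of_sample_means(sample_size, reps=5000, seed=1):
    """Draw `reps` samples of `sample_size` exponential values each and
    return the skewness of the resulting sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(reps)
    ]
    return skewness(means)

if __name__ == "__main__":
    for n in (2, 10, 50):
        print(f"n = {n:2d}: skewness of sample means = {skew_of_sample_means(n):.2f}")
```

The skewness of the means should shrink toward 0 as the sample size grows, even though every individual observation comes from the same skewed distribution. How large a sample counts as "adequate" depends on how non-normal the raw variable is.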
|
In reply to this post by Anata Ionescu
> I need to have a normal distribution for all my questions in order for the questionnaire to be considered valid.

Reading between the lines, it seems you are trying to create a summative scale. Is this so?

Where did you come across the idea that items in a summative scale needed to be normally distributed? How do you see that as related to validity?

How many constructs do you intend the questionnaire to measure? I.e., how many scales are you trying to create? What are the constructs? What is the response scale? 1 means..., 2 means..., etc.

What is the purpose of developing the questionnaire? Is this a pilot test of a one-time study? Is it to develop scales for general use in a subject matter area? Is this the whole study (e.g., a small class assignment)?

P.S. Please post follow-ups to the list. This helps other participants know whether to take the time to respond, helps them add ideas so you get a fuller response, and helps people who go to the archives with similar questions.

Art Kendall
Social Research Consultants
|
In reply to this post by Ruben Geert van den Berg
Hi Ruben van den Berg,
I do know that small sample sizes hinder statistical testing. Can you share a way around this? Must we then rely on the checklist that I quoted? That would be quite tedious if, say, I had 20 scale variables to check.

Thanks and Best Regards
Dorraj

Date: Fri, 4 Dec 2009 11:13:40 +0000
From: [hidden email]
Subject: Re: just curious
To: [hidden email]
|
In reply to this post by Ruben Geert van den Berg
Hi all,
I have the following time data in a string field:
701 710 1607 etc
I can use the date/time wizard to convert to a date/time field but the result is:
701:00 710:00 1607:00 etc
So my thought is that I need to get the colon in there, which led me to try the RPAD function, where I think I'm saying "look at the field, go 2 places to the right, and add a colon". However, the following returns a blank field:
compute NewTime = rpad(start_time,2,':').
execute.
1) Am I correct about why the date/time wizard is not correctly formatting the data?
2) What is incorrect about my use of rpad?
Thanks all - happy 2010.
Carol
|
|
I don't think RPAD is what you're looking for. What are the times supposed to be? 7:01, 7:10, and 16:07? If these are 24-hour clock times, and we can assume there is no value shorter than 3 digits (minutes less than 10 always preceded by 0) and no value longer than 4 digits:

* Example data: times stored as 3- or 4-character strings.
data list free /timevar (a4).
begin data
701 710 1607
end data.
* Scratch string wide enough for hh:mm.
string #temp (a5).
compute #temp=timevar.
* Insert the colon after the hour digits (1 digit for hmm, 2 for hhmm).
do if char.length(timevar)=3.
compute #temp=concat(substr(timevar, 1,1), ":", substr(timevar,2)).
else if char.length(timevar)=4.
compute #temp=concat(substr(timevar, 1,2), ":", substr(timevar,3)).
end if.
* Convert the hh:mm string to a numeric time value.
compute newtime=number(#temp, time5).
formats newtime (time5).
list.

This is perhaps an inelegant solution.
