just curious

just curious

Anata Ionescu
Greetings.

As I mentioned to you in a previous e-mail, I am not a statistician. However, lately I've had to use SPSS sporadically. I've heard about P-P, Q-Q, Shapiro-Wilk and things like that but, initially, I was told that (Skewness/Std. Error of Skewness) and (Kurtosis/Std. Error of Kurtosis) are indicators for the normality of the distribution. To be exact:
If |Skewness/Std. Error of Skewness| < 1.96 and |Kurtosis/Std. Error of Kurtosis| < 1.96 (for 95% confidence), we can treat the distribution as normal (for 99% confidence, instead of 1.96, we must have 2.58 in both cases).
The problem is that I'm not quite sure this is a valid way of checking for normality, because most distributions, however far from normality, pass these tests.

Thanks in advance!
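
For reference, the cutoffs in this rule of thumb are two-sided critical values of the standard normal distribution: roughly 1.96 goes with 95% and roughly 2.58 with 99%. A minimal sketch in Python (not SPSS syntax), where the function names are made up for illustration:

```python
from statistics import NormalDist


def two_sided_critical_value(confidence: float) -> float:
    """Return z such that P(|Z| < z) = confidence for a standard normal Z."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def passes_ratio_rule(statistic: float, std_error: float,
                      confidence: float = 0.95) -> bool:
    """Rule of thumb from the post: |statistic / SE| below the cutoff."""
    return abs(statistic / std_error) < two_sided_critical_value(confidence)


if __name__ == "__main__":
    for level in (0.95, 0.99):
        print(level, round(two_sided_critical_value(level), 2))
```

Passing this check only means the ratio did not exceed the cutoff; it does not show that the distribution is normal.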


Re: just curious

Steve Simon, P.Mean Consulting
Anata Ionescu wrote:

> As I mentioned to you in a previous e-mail, I am not a statistician.
> However, lately I've had to use SPSS sporadically. I've heard about P-P,
> Q-Q, Shapiro-Wilk and things like that but, initially, I was told that
> (Skewness/Std. Error of Skewness) and (Kurtosis/Std. Error of Kurtosis)
> are indicators for the normality of the distribution. To be exact:
> If |Skewness/Std. Error of Skewness| < 1.96 and |Kurtosis/Std. Error of
> Kurtosis| < 1.96 (for 95% confidence), we can treat the distribution as
> normal (for 99% confidence, instead of 1.96, we must have 2.58 in
> both cases).
>
> The problem is that I'm not quite sure this is a valid way of checking
> for normality, because most distributions, however far from normality,
> pass these tests.

There is no consensus in the research community about how to examine the
normality assumption, and any approach you take will have its problems.
In particular, if you make the choice of what statistical test you use
dependent on a test of the normality of the data, the statistical
properties of that test statistic become far more complicated than you
might expect. You could end up with a test, for example, that does not
maintain its stated alpha level.

That being said, many of us still look at issues like normality
assumptions and make the choice of statistical test or transformation
dependent on these results.

Some people, including me, are very wary of examining normality
assumptions from a hypothesis testing perspective, which is what you are
advocating above. The problem is that these tests have far too little
power for small sample sizes when the assumptions of normality are most
critical, and far too much power for large sample sizes when the
assumptions of normality are not really needed (because of the Central
Limit Theorem).
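
The power argument above can be illustrated with a small simulation. This is only a sketch in Python, not anything from SPSS: the two example distributions, the function names, and the skewness and standard-error formulas (the usual bias-adjusted estimator and its normal-theory standard error) are assumptions chosen for illustration.

```python
import math
import random


def skewness_z(values):
    """Sample skewness divided by its normal-theory standard error."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)  # bias-adjusted skewness
    se = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew / se


def strongly_skewed(rng):
    """Exponential draws: clearly non-normal (population skewness 2)."""
    return rng.expovariate(1.0)


def mildly_skewed(rng):
    """Normal plus a small exponential part: only slightly skewed."""
    return rng.gauss(0.0, 1.0) + 0.5 * rng.expovariate(1.0)


def rejection_rate(draw, n, reps=200, cutoff=1.96, seed=1):
    """Share of simulated samples of size n whose |z| exceeds the cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if abs(skewness_z(sample)) > cutoff:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("strong skew, n=10:  ", rejection_rate(strongly_skewed, 10))
    print("mild skew,   n=20:  ", rejection_rate(mildly_skewed, 20))
    print("mild skew,   n=2000:", rejection_rate(mildly_skewed, 2000))
```

The point of the comparison is that a strongly skewed variable is often missed at n=10, while a nearly normal one is flagged most of the time once n is in the thousands.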

Rather than assess the statistical significance of a test of
non-normality, you should, perhaps, consider a measure of the degree of
non-normality. So a data set with skewness beyond +/-1 or kurtosis
larger than 6 is an indication of a problem. By the way, some packages
define kurtosis in such a way that it is 3 for a normal distribution and
others define it in such a way that it is 0 for a normal distribution.
The latter definition is evil, evil, evil. Don't tinker with the moments
that Karl Pearson invented. It's a sacrilege.
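
The two kurtosis conventions differ only by a constant of 3. A short Python sketch using plain moment formulas (the function names are illustrative, and these are not the bias-adjusted estimators a package may report):

```python
def pearson_kurtosis(values):
    """Fourth standardized moment: equals 3 for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


def excess_kurtosis(values):
    """Same quantity shifted so that a normal distribution gives 0."""
    return pearson_kurtosis(values) - 3.0


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(pearson_kurtosis(data), excess_kurtosis(data))
```

Before applying any kurtosis cutoff, it is worth checking which of the two scales your package reports.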

By the way, the fact that you see many distributions far from normal and
the tests fail to detect this either means that your sample sizes are
always very small, or perhaps you need to rethink what it means to be
"far from normal."

I hope this helps.
--
Steve Simon, Standard Disclaimer
The Monthly Mean is celebrating its first anniversary.
Find out more about the newsletter that dares
to call itself "average" at www.pmean.com/news

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: just curious

Art Kendall
In reply to this post by Anata Ionescu
Why are you concerned about "normality"?

Art

Art Kendall
Social Research Consultants

Re: just curious

Denbleyker, John

One concern about normality I would see arises when using a parametric effect size, because departures from normality (not to mention measurement error) would attenuate the estimated effect size. A more robust alternative à la Cliff's d or P(X>Y) would be more appropriate.
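
As a rough illustration of the two measures named above, here is a Python sketch that compares every pair of observations across two groups. The function name is hypothetical, and counting ties as half in P(X>Y) is one common convention that is assumed here, not something stated in the post.

```python
def dominance_stats(xs, ys):
    """Return (Cliff's d, P(X>Y)) from all pairwise comparisons."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    pairs = len(xs) * len(ys)
    ties = pairs - greater - less
    cliffs_d = (greater - less) / pairs
    # Assumption: each tied pair counts as half a "win" for X.
    prob_x_greater = (greater + 0.5 * ties) / pairs
    return cliffs_d, prob_x_greater


if __name__ == "__main__":
    print(dominance_stats([3, 4, 5], [1, 2, 3]))
```

Both measures depend only on the ordering of the values, which is why they are less sensitive to the shape of the distributions.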

 

 

 

John Denbleyker
Research, Evaluation & Testing (RET)
District Achievement Analyst
Anoka-Hennepin ISD#11
[hidden email]
763-506-1155 (office)
763-219-1905 (cell)


Re: just curious

Jarrod Teo-2
Hi,
 
Correct me if I am wrong, but I use Analyze > Descriptive Statistics > Explore in PASW Statistics to check for normality. It offers the Kolmogorov-Smirnov statistic, which lets you test normality using the following hypotheses.

H0: The distribution is normal
H1: The distribution is not normal

I do have people saying that you can literally "accept" H0 here, just as with Levene's test for equality of variances.

If there is no evidence of non-normality, a safer approach is to go through the following checklist and make sure that everything on it passes.

- Bell-shaped curve?
- Is the distribution symmetrical? (Histogram)
- Check if mean ≈ median ≈ mode
- Is the 5% trimmed mean similar to the mean?
- Check if skewness ≈ 0
- Check if kurtosis ≈ 0
- Check for outliers (Explore - box plots)

Once all of these pass, you can assume that your distribution is normal.
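
Some of the numeric items on this checklist (mean vs. median vs. mode, trimmed mean, box-plot outliers) can be sketched in a few lines of Python. This is an illustration, not what Explore computes: the function name is made up, the trimmed mean simply drops whole observations from each tail, and the outlier rule uses 1.5 x IQR fences with Python's default quartile method, all of which may differ from the SPSS output.

```python
import statistics


def normality_checklist(values, trim=0.05):
    """Collect a few descriptive checks for one variable."""
    data = sorted(values)
    n = len(data)
    k = int(trim * n)              # observations dropped from each tail
    trimmed = data[k:n - k]
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),
        "trimmed_mean": statistics.fmean(trimmed),
        "outliers": [x for x in data if x < low or x > high],
    }


if __name__ == "__main__":
    example = list(range(1, 21)) + [100]   # one extreme value
    for name, value in normality_checklist(example).items():
        print(name, value)
```

A gap between the mean and the trimmed mean, or a non-empty outlier list, is a prompt to look at the histogram rather than a verdict.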

If you are using an ANOVA and your design is balanced, slight departures from normality are all right. But if you are still very concerned, you might want to transform your variable toward normality anyway, as most variables in real life are not normal.

Do contribute or correct my thinking if anyone out there feels that anything I have suggested needs correction. Thanks.
 
Regards
Dorraj
 


Re: just curious

Ruben Geert van den Berg
Dear Dorraj,
 
Tests about means make inferences about the sampling distributions of the means. As sample size grows large, these sampling distributions will follow Gaussian distributions, regardless of the distributions of original variables. So even if a variable is extremely nonnormally distributed, the (arithmetic) mean of this variable (over repeated samples rather than over observations) will be virtually perfectly normally distributed as long as the sample size is adequate. This phenomenon is known as the central limit theorem, see:
 
http://en.wikipedia.org/wiki/Central_limit_theorem

I guess it's the central limit theorem that makes ANOVA models robust against departures from normality if sample size is adequate. If sample size is not adequate, ANOVA will be less robust but -I'm getting to the point now- without adequate sample size, the Kolmogorov-Smirnov test will have low statistical power. The irony is basically that the smaller the sample size, the more important the normality of the distributions and the more difficult to demonstrate departures from normality with a statistical test like one-sample K-S. This really limits the usefulness of this test.
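
The central limit theorem argument can be checked with a quick simulation. The Python sketch below draws from an exponential distribution (an arbitrary choice of a strongly skewed variable) and looks at how skewed the sample means are for different sample sizes. The function names and settings are illustrative assumptions.

```python
import random


def moment_skewness(values):
    """Third standardized moment (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def skewness_of_sample_means(sample_size, reps=2000, seed=1):
    """Skewness of the means of `reps` exponential samples."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(reps)
    ]
    return moment_skewness(means)


if __name__ == "__main__":
    for size in (1, 5, 50):
        print(size, round(skewness_of_sample_means(size), 2))
```

With a sample size of 1 the "means" are just the raw observations, so their skewness should be large; it should shrink toward 0 as the sample size grows.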
 
Kind regards,
 
Ruben van den Berg

 



 


Re: just curious

Art Kendall
In reply to this post by Anata Ionescu
> I need to have a normal distribution for all my questions in order for the questionnaire to be considered valid...

Reading between the lines, it seems you are trying to create a summative scale. Is this so?
Where did you come across the idea that items in a summative scale need to be normally distributed? How do you see that as related to validity?

How many constructs do you intend the questionnaire to measure? I.e., how many scales are you trying to create? What are the constructs? What is the response scale? 1 means..., 2 means..., etc.

What is the purpose of developing the questionnaire? Is this a pilot test of a one-time study? Is it to develop scales for general use in a subject matter area?
Is this the whole study (e.g., a small class assignment)?

P. S.  Please post follow-ups to the list.  This helps other participants know whether to take the time to respond, helps them add ideas so you get a fuller response, and helps people who go to the archives with similar questions.

Art Kendall
Social Research Consultants
Anata Ionescu wrote:
I made a questionnaire of my own and I need to have a normal distribution for all my questions in order for the questionnaire to be considered valid...
P.S. There are 39 questions with 5 choices each, scored 1 to 5 points depending on the answer, and I have 33 participants. So, for example, for question 1 I have:
3, 1, 5, 3, 5, 4, 4, 4, 4, 2, 5, 5, 3, 1, 5, 4, 5, 4, 4, 1, 2, 5, 4, 5, 3, 4, 4, 4, 5, 4, 2, 3, 1.
That means the frequencies are: 4 for 1, 3 for 2, 5 for 3, 12 for 4 and 9 for 5.
How far from normal is that?

Thank you.
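
To put numbers on the question 1 data above, here is a Python sketch that applies the skewness and kurtosis ratios to those 33 responses. The formulas are the usual bias-adjusted estimators with their normal-theory standard errors, which I am assuming match what SPSS reports; the function and variable names are illustrative.

```python
import math
from collections import Counter

# Responses to question 1, copied from the message above.
Q1 = [3, 1, 5, 3, 5, 4, 4, 4, 4, 2, 5, 5, 3, 1, 5, 4, 5,
      4, 4, 1, 2, 5, 4, 5, 3, 4, 4, 4, 5, 4, 2, 3, 1]


def shape_statistics(values):
    """Skewness, excess kurtosis and their standard errors."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return {"skewness": skew, "se_skewness": se_skew,
            "kurtosis": kurt, "se_kurtosis": se_kurt}


if __name__ == "__main__":
    print(sorted(Counter(Q1).items()))
    stats = shape_statistics(Q1)
    print(stats)
    print("skewness ratio:", stats["skewness"] / stats["se_skewness"])
    print("kurtosis ratio:", stats["kurtosis"] / stats["se_kurtosis"])
```

By my arithmetic this gives a skewness of roughly -0.78 (standard error about 0.41, ratio about -1.9) and a kurtosis of roughly -0.45 (standard error about 0.80), so this visibly left-skewed item would still pass both 1.96 cutoffs with only 33 respondents.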


Re: just curious

Jarrod Teo-2
In reply to this post by Ruben Geert van den Berg
Hi Ruben van den Berg,
 
I do know that sample size hinders statistical testing in this way. Can you share a way around it?

So must we rely on the checklist that I quoted? It will be quite tedious if, say, I happen to have 20 scale variables to check.
 
Thanks and Best Regards
Dorraj
 


use of rpad

parisec
In reply to this post by Ruben Geert van den Berg

Hi all,

I have the following time data in a string field:

        701
        710
       1607
      etc

I can use the date/time wizard to convert to a date/time field but the result is:

 701:00
 710:00
 1607:00
etc

So my thought is that I need to get the colon in there, which led me to try the RPAD function, where I think I'm saying "look at the field, go 2 places to the right, and add a colon". However, the following returns a blank field.

compute  NewTime  = rpad(start_time,2,':').
execute.

1) Am I correct about why the date/time wizard is not correctly formatting the data?
2) What is incorrect about my use of rpad?

Thanks all - happy 2010.

Carol


Re: use of rpad

Rick Oliver-3

I don't think RPAD is what you're looking for. What are the times supposed to be? 7:01, 7:10, and 16:07?

If these are 24-hour clock times, and we can assume there is no value shorter than 3 digits (minutes less than 10 always preceded by 0) and no value longer than 4 digits:

* Read the sample values as 4-character strings.
data list free /timevar (a4).
begin data
701 710 1607
end data.
* Scratch variable to hold the value with a colon inserted.
string #temp (a5).
compute #temp=timevar.
* Insert a colon before the last two digits (H:MM or HH:MM).
do if char.length(timevar)=3.
compute #temp=concat(substr(timevar, 1,1), ":", substr(timevar,2)).
else if char.length(timevar)=4.
compute #temp=concat(substr(timevar, 1,2), ":", substr(timevar,3)).
end if.
* Convert the string to a numeric time and display it as hh:mm.
compute newtime=numeric(#temp, time5).
formats newtime (time5).
list.

This is perhaps an inelegant solution.
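
For readers who want to check the logic outside SPSS, the same idea (split off the last two digits as minutes, treat the rest as hours) can be sketched in Python. The function name is hypothetical, and the sketch makes the same assumption as above: values are 3 or 4 digits on a 24-hour clock.

```python
from datetime import time


def parse_clock_digits(raw: str) -> time:
    """Turn '701' or '1607' into a time of day (7:01, 16:07)."""
    digits = raw.strip()
    if not digits.isdigit() or len(digits) not in (3, 4):
        raise ValueError(f"expected 3 or 4 digits, got {raw!r}")
    hours, minutes = int(digits[:-2]), int(digits[-2:])
    # time() itself rejects hours > 23 or minutes > 59 with ValueError.
    return time(hour=hours, minute=minutes)


if __name__ == "__main__":
    for value in ("701", "710", "1607"):
        print(value, "->", parse_clock_digits(value).strftime("%H:%M"))
```

Counting from the right avoids the separate branches for 3-digit and 4-digit values.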



