Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.

Faiz Rasool
Hi All,
 
I have a variable which is the sum of 10 Likert items. Each item had 4 response options (1 = never, 2 = sometimes, 3 = often, 4 = always). The sample size is 150. I'm trying to determine whether my variable is normally distributed. I used the Kolmogorov-Smirnov and Shapiro-Wilk tests to check the normality of the variable. The K-S statistic was .104 (sig = .000), and the S-W statistic was .975 (sig = .007). Field (2005) suggests that if the sig value of these tests is above .05, the data are normally distributed, and that a value under .05 means the data are not normally distributed. From what I can tell, the S-W test is saying that my variable is normal, and the K-S test is suggesting that it is not.

I have tried to transform the data, but the test statistics have not really changed. Below I give the mean, standard deviation, skewness, kurtosis and test statistics obtained before and after transformation. The transformations have not solved the problem: the K-S test continues to suggest that the data are not normal, while the S-W test disagrees.
 
Statistics without any transformation:
 
Mean = 26.05, SD = 3.369, Skewness = .117, Kurtosis = -.219. K-S statistic = .104 (sig = .000). S-W statistic = .975 (sig = .007).
 
Statistics after log transformation:
 
Mean = 1.41, SD = .057, Skewness = -.235, Kurtosis = -.059. K-S statistic = .106 (sig = .000). S-W statistic = .976 (sig = .003).
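A quick arithmetic check on why the transformations changed so little, assuming the log was base 10 (SPSS's LG10), which the reported mean of 1.41 suggests. A first-order (delta-method) approximation, using only the summary numbers above and no raw data, roughly reproduces the log-scale mean and SD. The telling number is the coefficient of variation: the SD is only about 13% of the mean, so log and square root are close to linear over the range the scores occupy and barely alter the shape of the distribution. The Shapiro-Wilk W, and a K-S statistic computed against a normal with the sample's own mean and SD, do not change at all under a purely linear rescaling. Note also that the raw skewness (.117) was already near zero, so there was little skew for a log to remove.

```python
import math

# Untransformed summary statistics quoted in the post
mean_raw, sd_raw = 26.05, 3.369

# Delta-method approximations for a smooth transform g:
#   mean of g(X) ~ g(mean),   SD of g(X) ~ |g'(mean)| * SD of X
log_mean = math.log10(mean_raw)
log_sd = sd_raw / (mean_raw * math.log(10))       # g'(x) = 1 / (x ln 10)

sqrt_mean = math.sqrt(mean_raw)
sqrt_sd = sd_raw / (2 * math.sqrt(mean_raw))      # g'(x) = 1 / (2 sqrt(x))

# Small CV: the data occupy a narrow band where log and sqrt are nearly linear
cv = sd_raw / mean_raw

print(f"log10: mean ~ {log_mean:.3f}, SD ~ {log_sd:.4f}")
print(f"sqrt:  mean ~ {sqrt_mean:.3f}, SD ~ {sqrt_sd:.4f}")
print(f"coefficient of variation = {cv:.3f}")
```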
 
When I used the SQRT function to transform the data, I obtained the same test statistics as before any transformation, i.e., the values given in the first paragraph of this email. In a situation like this, where one test appears to suggest that the variable is normally distributed and the other does not, what are my options? I cannot rely on graphs because of my visual impairment, so statistical tests are my only way to establish whether my data are normal. Any suggestions and guidance are most appreciated.
 
Thanks all,
 
Faiz. 
 

Re: Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.

Zdaniuk, Bozena-3
Faiz, you wrote that the S-W test significance was .007, which is way UNDER .05, indicating that the data are NOT normally distributed.
Am I missing something here?
bozena zdaniuk
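The decision rule Bozena is applying can be written out in a few lines. This sketch only re-reads the "sig" values quoted in the original post against the conventional .05 level; the dictionary layout and the function name are illustrative. All four p-values fall below .05, so both tests reject normality, before and after the log transformation, and there is no conflict between them. A p-value above .05 would not show that the data are normal either; it would only mean the test found no clear evidence against normality at this sample size.

```python
ALPHA = 0.05

# "Sig." values as reported in the original post.
# SPSS prints .000 when the p-value rounds to zero at three decimals.
reported_p = {
    ("untransformed", "Kolmogorov-Smirnov"): 0.000,
    ("untransformed", "Shapiro-Wilk"): 0.007,
    ("log", "Kolmogorov-Smirnov"): 0.000,
    ("log", "Shapiro-Wilk"): 0.003,
}

def rejects_normality(p, alpha=ALPHA):
    """True when the test rejects the null hypothesis of normality at level alpha."""
    return p < alpha

for (version, test), p in reported_p.items():
    verdict = "reject normality" if rejects_normality(p) else "no clear evidence against normality"
    print(f"{version:>13}  {test:<18}  p = {p:.3f}  ->  {verdict}")
```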
----- Original Message -----
From: "Faiz Rasool" <[hidden email]>
To: [hidden email]
Sent: Friday, February 4, 2011 1:42:06 PM GMT -08:00 US/Canada Pacific
Subject: Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.

Re: Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.

Art Kendall
In reply to this post by Faiz Rasool
Why do you care whether your raw variable is normally distributed or not? How are you going to use it? Most often one is concerned with whether the residuals are very far from normally distributed rather than whether the raw variable is very far from normal.

Also, if this is a relatively new summative scale, you may want to see how well the items go together to make up a scale. See RELIABILITY.

If this is the unusual situation where much departure from normality of the raw variable might be important, have you looked at it via something like EXPLORE to see whether the histogram shows a very discrepant picture?

Art Kendall
Social Research Consultants

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.

Art Kendall
Normality of a DV is a question that often comes up. There have been many posts to this list and others. In brief, in multiple regression one assumes that the residuals are not clearly different from a normal distribution.

I am not where my books are so cannot give a more precise citation but an excellent book is titled something like
Cohen (2003) Applied Multiple Regression and Correlation.

I know that many sources say that normality of the DV is important. But consider the simplest regression model: a continuous DV and a dichotomous predictor. If the predictor has a strong effect, one would hope that the DV by itself is clearly bimodal.
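Art's example can be simulated. Everything in the sketch below is hypothetical: 150 cases split evenly on a 0/1 predictor, a group difference of 6 error-SDs, normal errors, and a fixed seed for repeatability. With a single dummy predictor the OLS fitted values are just the two group means, so each residual is a score minus its group mean. Excess kurtosis is used here as a dependency-free stand-in for a formal test: the raw DV comes out strongly negative (a well-separated two-hump shape; about -1.6 in the population for this setup), while the residuals sit near 0, as normal errors should. In practice one could run a normality test such as scipy.stats.shapiro on both lists and expect the same contrast.

```python
import random
from statistics import mean

def excess_kurtosis(xs):
    """Moment-based excess kurtosis (0 for a normal distribution)."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

rng = random.Random(2011)
group = [0] * 75 + [1] * 75        # hypothetical dichotomous predictor
effect = 6.0                        # hypothetical group difference, in error-SD units
y = [effect * g + rng.gauss(0, 1) for g in group]

# OLS on one 0/1 predictor: fitted value = that case's group mean
group_means = {}
for g in (0, 1):
    group_means[g] = mean(v for v, gi in zip(y, group) if gi == g)
residuals = [v - group_means[g] for v, g in zip(y, group)]

print(f"excess kurtosis of the raw DV:  {excess_kurtosis(y):+.2f}")
print(f"excess kurtosis of residuals:   {excess_kurtosis(residuals):+.2f}")
```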

It would still be a good idea to run RELIABILITY.

Of course, at the very beginning you should start your data quality assurance by clicking <data> <identify duplicate cases> and <identify unusual cases>.
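For anyone working outside the SPSS menus, here is a rough plain-Python analogue of those two checks. It is a sketch under stated assumptions, not SPSS's algorithm: a duplicate is a case whose values match an earlier case on every chosen field, and "unusual" is reduced to a simple |z| cutoff on one variable, whereas SPSS's Identify Unusual Cases procedure is more elaborate. The record layout, the field names q1 to q3, and the cutoff are all hypothetical.

```python
from statistics import mean, stdev

def duplicate_cases(records, key_fields):
    """Pair each repeated case with the first case that has identical key values."""
    first_seen = {}
    pairs = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key in first_seen:
            pairs.append((first_seen[key], rec["id"]))
        else:
            first_seen[key] = rec["id"]
    return pairs

def unusual_cases(records, field, z_cut=3.0):
    """IDs of cases whose value on `field` lies more than z_cut SDs from the mean."""
    values = [rec[field] for rec in records]
    m, s = mean(values), stdev(values)
    return [rec["id"] for rec in records if abs(rec[field] - m) / s > z_cut]

# Hypothetical item responses (1-4) for four respondents
records = [
    {"id": 1, "q1": 3, "q2": 2, "q3": 4},
    {"id": 2, "q1": 1, "q2": 1, "q3": 2},
    {"id": 3, "q1": 3, "q2": 2, "q3": 4},   # same answers as case 1
    {"id": 4, "q1": 2, "q2": 3, "q3": 3},
]
print(duplicate_cases(records, ["q1", "q2", "q3"]))   # [(1, 3)]
```

With only a few four-point items, identical answer patterns can occur by chance, so flagged pairs are candidates to inspect rather than automatic deletions.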

Why don't you email me the output file and a data file with a unique case identifier and the DV and I'll look at the visualizations.

Also, since it often helps the thoroughness of response if different list participants react to posts, and since it helps to develop the archives, please hold most of the conversations on the list unless someone suggests going offline.

Art Kendall
Social Research Consultants

On 2/5/2011 3:03 AM, Faiz Rasool wrote:
 
----- Original Message -----
Sent: Saturday, February 05, 2011 3:19 AM
Subject: Re: [SPSSX-L] Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.

Why do you care whether your raw variable is normally distributed or not? How are you going to use it? Most often one is concerned with whether the residuals are very far from normally distributed rather than whether the raw variable is very far from normal.
I'm checking the normality of the distribution because most statistics books I have read state that parametric tests require the DV to be normally distributed. I intend to use this variable as the DV in a multiple regression analysis.

Also, if this is a relatively new summative scale, you may want to see how well the items go together to make up a scale. See RELIABILITY.
 
The scale measures water conservation behavior. It includes items that have been used previously in the literature. Participants were presented with a list of different water conservation actions and asked to indicate how often or rarely they performed each behavior.

If this is the unusual situation where much departure from normality of the raw variable might be important, have you looked at it via something like EXPLORE to see whether the histogram shows a very discrepant picture?
I cannot see the histogram myself because of my visual impairment. That was the main reason I used the K-S and S-W tests. If you allow me, I can email the SPSS output to you as an attachment. Perhaps you can help me decide what the histogram tells us about the distribution of the data.
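Since the histogram cannot be read, the same information can be put into words. The sketch below builds a text version of a normal Q-Q plot: each sorted score is paired with the value a normal distribution with the sample's own mean and SD would predict at that rank, and selected rows are printed as sentences a screen reader can step through. Differences that run consistently positive or negative in one tail suggest skew; differences that grow in both tails suggest heavier or lighter tails than normal. The plotting position (i - 0.5)/n is one common choice among several (Blom's formula is another), and the scores list is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_quantile_table(values):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # normal with the sample's mean and SD
    return [(x, fitted.inv_cdf((i - 0.5) / n)) for i, x in enumerate(xs, start=1)]

def describe(table, step=10):
    """Print the first, last and every `step`-th row as plain sentences."""
    for rank, (observed, expected) in enumerate(table, start=1):
        if rank == 1 or rank % step == 0 or rank == len(table):
            print(f"rank {rank}: observed {observed}, expected {expected:.2f}, "
                  f"difference {observed - expected:+.2f}")

scores = [22, 24, 25, 26, 26, 27, 28, 30, 31]   # hypothetical scale scores
table = normal_quantile_table(scores)
describe(table, step=3)
```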
Regards,
Faiz.


Re: Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.

Bruce Weaver
Administrator
To add to what Art said, note the distinction between "errors" and "residuals".  This is an important distinction that is often overlooked.  The Wikipedia page does a good job of explaining it.

   http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics

So the usual i.i.d. N(0, σ²) assumption for OLS linear regression applies to the unobservable errors; but we assess it using the observable residuals, because they're all we have.
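Because a simulation can see the true errors, which real data never reveal, it makes the distinction concrete. The sketch fits a simple OLS line with the textbook closed-form formulas and shows that the residuals are constrained by the fit (they sum to zero and have zero cross-product with the predictor) while the errors are not; this is one reason residuals are only an estimate of the errors. The true intercept 3.0, slope 0.5, error SD 2 and the seed are arbitrary assumptions.

```python
import random
from statistics import mean

rng = random.Random(42)
n = 150
x = [rng.uniform(0, 10) for _ in range(n)]
errors = [rng.gauss(0, 2) for _ in range(n)]             # unobservable in real data
y = [3.0 + 0.5 * xi + ei for xi, ei in zip(x, errors)]   # hypothetical true model

# OLS estimates for a simple regression
x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"sum of errors:        {sum(errors):+.4f}")     # generally not zero
print(f"sum of residuals:     {sum(residuals):+.2e}")  # zero up to rounding
print(f"sum of x * residuals: {sum(xi * r for xi, r in zip(x, residuals)):+.2e}")
```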

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.

Faiz Rasool
In reply to this post by Art Kendall
I didn't realize that the list is configured so that replies to emails go directly to the sender and not to the list. My apologies.
 
I have run the reliability analysis on the items that I used to measure the variables. For a couple of variables Cronbach's alpha is .49, and if an item is deleted it can rise to .52. For other variables it is .59 and .80. Here is a question for all list members: when should we really consider deleting an item from a scale to increase its Cronbach's alpha? If we are studying conservation behaviors, for example, and we drop a conservation-related item from our scale, are we not ignoring the fact that a particular behavior is simply unusual among our participants, which is what produces the low alpha? So are there guidelines on which factors to consider when deciding to retain an item in a scale even though it may contribute to a low Cronbach's alpha?
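For reference, Cronbach's alpha for k items is k/(k - 1) * (1 - (sum of the item variances) / (variance of the total score)), and "alpha if item deleted" simply recomputes it with one item left out. The sketch below is a plain-Python version of that formula, not SPSS's RELIABILITY code; the column-per-item data layout, the function names, and the responses are assumptions. Alpha rises with both the number of items and their average intercorrelation, so a short list of behaviors that people perform fairly independently of one another can yield a modest alpha even when each item is substantively relevant.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all for the same respondents in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(person) for person in zip(*items)]    # each respondent's scale score
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

def alpha_if_item_deleted(items):
    """Alpha recomputed with each item dropped in turn (needs at least three items)."""
    return [cronbach_alpha(items[:j] + items[j + 1:]) for j in range(len(items))]

# Hypothetical responses: 3 items x 5 respondents on a 1-4 scale
items = [
    [1, 2, 3, 4, 4],
    [1, 2, 2, 4, 3],
    [2, 1, 3, 3, 4],
]
print(f"alpha = {cronbach_alpha(items):.3f}")
print("alpha if item deleted:", [round(a, 3) for a in alpha_if_item_deleted(items)])
```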
 
Thanks, Faiz.
----- Original Message -----
Sent: Saturday, February 05, 2011 5:51 PM
Subject: Re: [SPSSX-L] Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.
