Hi All,
I have a variable which is the sum of 10 Likert
items. Each item had 4 response options (1=never, 2=sometimes, 3=often, 4=always). The
sample size is 150. I'm trying to determine whether my variable is normally
distributed or not. I used the Kolmogorov-Smirnov and Shapiro-Wilk tests to check
the normality of the variable. The value of the K-S test was .104 (sig=.000),
and the value of the S-W test was .975 (sig=.007). Field (2005) suggested that if the
sig value of these tests is above .05, the data are normally
distributed. A value under .05 means that the data are not normally distributed.
From what I can tell, the S-W test is saying that my variable is normal, and
the K-S test is suggesting that it is not normal. I have tried to transform the
data, but the test statistics have not really changed. Below I give the mean,
standard deviation, skewness, kurtosis and test statistics obtained before and
after transformation. The problem has not been solved by the transformations: the K-S
test continues to suggest that the data are not normal, while the S-W test disagrees.
Statistics without any transformation:
Mean = 26.05, SD = 3.369, Skewness = .117, Kurtosis = -.219.
K-S statistic = .104 (sig=.000). S-W statistic = .975 (sig=.007).
Statistics after log transformation:
Mean = 1.41, SD = .057, Skewness = -.235, Kurtosis = -.059.
K-S statistic = .106 (sig=.000). S-W statistic = .976 (sig=.003).
When I used the SQRT function to transform the
data, I obtained the same test statistics as I did before using the log transformation.
I have mentioned those test statistics in the first paragraph
of this email. In a situation like this, where one test appears to suggest that the
variable is normally distributed and the other test does not reach the same
conclusion, what are my options? I cannot rely on graphs, due to my visual
impairment, so statistical tests are the only option for me to establish whether
my data are normal or not. Any suggestions and guidance are most
appreciated.
Thanks all,
Faiz.
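One check that needs no graphs is to compare the reported skewness and kurtosis with their standard errors. The sketch below assumes the usual small-sample standard-error formulas (the ones SPSS is generally understood to print next to these statistics) and plugs in the untransformed values from the post. The function names are made up for illustration, and treating |z| > 1.96 as a flag is only a rough convention, not a substitute for the K-S and S-W tests.

```python
import math

def se_skewness(n):
    # Small-sample standard error of skewness for a sample of size n.
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    # Small-sample standard error of (excess) kurtosis for a sample of size n.
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 150                               # sample size reported in the post
z_skew = 0.117 / se_skewness(n)       # reported skewness / its standard error
z_kurt = -0.219 / se_kurtosis(n)      # reported kurtosis / its standard error

print(round(z_skew, 2), round(z_kurt, 2))
```

With the reported values, both ratios fall well inside the range of -1.96 to 1.96.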
|
Faiz, you wrote that the S-W test significance was .007, which is well UNDER .05, indicating that the data are NOT normally distributed.
Am I missing something here?
Bozena Zdaniuk
----- Original Message -----
From: "Faiz Rasool" <[hidden email]>
To: [hidden email]
Sent: Friday, February 4, 2011 1:42:06 PM GMT -08:00 US/Canada Pacific
Subject: Conflicting results of Kolmogorov-Smirnov and Shapiro-Wilk tests when testing for normality of a variable.
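The decision rule in question can be written out as a short sketch. It applies the .05 cutoff that the original post attributes to Field (2005) to the four sig values reported there; the function name and the dictionary labels are hypothetical, chosen only for this illustration.

```python
ALPHA = 0.05  # the conventional cutoff is .05, not 0.5

def rejects_normality(p_value, alpha=ALPHA):
    """Return True when the test rejects the null hypothesis of normality."""
    return p_value < alpha

# Sig values as reported in the post; SPSS shows very small p-values as .000.
reported = {
    "K-S, untransformed": 0.000,
    "S-W, untransformed": 0.007,
    "K-S, log": 0.000,
    "S-W, log": 0.003,
}

decisions = {name: rejects_normality(p) for name, p in reported.items()}
for name, rejected in decisions.items():
    print(name, "->", "rejects normality" if rejected else "does not reject normality")
```

Under this rule every reported test rejects normality, so the two tests agree rather than conflict.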
|
In reply to this post by Faiz Rasool
Why do you care whether your
raw variable is normally distributed or not? How are you going to
use it? Most often one is concerned with whether the residuals are
very far from normally distributed rather than whether the raw
variable is very far from normal.
Also, if this is a relatively new summative scale, you may want to see how well the items go into making up a scale. See RELIABILITY.
If this is the unusual situation where much departure from normality of the raw variable might be important, have you looked at it via something like EXPLORE to see if the histogram shows a very discrepant picture?
Art Kendall
Social Research Consultants
On 2/4/2011 4:42 PM, Faiz Rasool wrote:
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
Normality of a DV is a question that often comes
up. There have been many posts to this list and others. In
brief, in multiple regression one assumes that the residuals are
not clearly different from normally distributed.
I am not where my books are, so I cannot give a more precise citation, but an excellent book is titled something like Cohen (2003) Applied Multiple Regression and Correlation. I know that many sources say that normality of the DV is important. But consider the simplest regression model: a continuous DV and a dichotomous predictor. One would hope that the DV by itself were definitely bimodal.
It would still be a good idea to run RELIABILITY. Of course, at the very beginning you should start your data quality assurance by clicking <data> <identify duplicate cases> and <identify unusual cases>.
Why don't you email me the output file and a data file with a unique case identifier and the DV, and I'll look at the visualizations.
Also, since it often helps the thoroughness of response if different list participants react to posts, and since it helps to develop the archives, please hold most of the conversations on the list unless someone suggests going offline.
Art Kendall
Social Research Consultants
On 2/5/2011 3:03 AM, Faiz Rasool wrote:
|
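Art Kendall's example of a continuous DV with a dichotomous predictor can be sketched with simulated data. Everything here is assumed for illustration: the group means (20 and 26), the SD of 1, the seed, and the helper `ks_distance_to_normal`, which computes only the raw K-S distance to a normal curve fitted from the data and no p-value. With a single dichotomous predictor, the OLS fitted values are the two group means, so the residuals are the deviations from them.

```python
import random
from statistics import NormalDist, mean, stdev

def ks_distance_to_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    gap = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        gap = max(gap, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return gap

rng = random.Random(2011)                       # fixed seed so the run is repeatable
group = [0] * 1000 + [1] * 1000                 # dichotomous predictor
dv = [rng.gauss(20 + 6 * g, 1) for g in group]  # normal errors around two group means

# OLS with one dichotomous predictor predicts each group's mean.
group_means = {g: mean(y for y, gg in zip(dv, group) if gg == g) for g in (0, 1)}
residuals = [y - group_means[g] for y, g in zip(dv, group)]

d_raw = ks_distance_to_normal(dv)
d_resid = ks_distance_to_normal(residuals)
print(round(d_raw, 3), round(d_resid, 3))
```

In this setup the raw DV is clearly bimodal and sits far from a normal curve, while the residuals sit close to one, which is the quantity the regression assumption concerns.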
To add to what Art said, note the distinction between "errors" and "residuals". This is an important distinction that is often overlooked. The Wikipedia page does a good job of explaining it.
http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics
So the usual i.i.d. N(0, sigma-squared) assumption for OLS linear regression applies to the unobservable errors; but we assess it using the observable residuals, because they're all we have.
HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
|
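The errors-versus-residuals distinction is easiest to see in a simulation, because only there are the errors known. The sketch below assumes a made-up true line (1 + 0.5x), an error SD of 2 and an arbitrary seed, and fits a simple OLS regression by hand.

```python
import random
from statistics import mean

rng = random.Random(42)
x = [float(i) for i in range(50)]
errors = [rng.gauss(0, 2) for _ in x]                 # known here, unobservable in real data
y = [1.0 + 0.5 * xi + e for xi, e in zip(x, errors)]  # assumed true line: 1 + 0.5x

# Ordinary least squares estimates for a single predictor.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Residuals are deviations from the FITTED line, not from the true line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("sum of errors:   ", sum(errors))
print("sum of residuals:", sum(residuals))
```

The residuals sum to zero (up to rounding) because the fit includes an intercept, whereas the errors carry no such constraint. That is one concrete way in which the observable residuals differ from the errors they stand in for.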
In reply to this post by Art Kendall
I didn't realize that the list is configured
such that replies to emails go directly to the sender, and not to the list. My
apologies.
I have run the reliability analysis on the
items that I have used to measure my variables. For a couple of variables
Cronbach's alpha is .49, and if an item is deleted it can go up to
.52. For other variables it is .59 and .80. Here is a question for
all the list members: when should we really consider deleting an item from our
scale to increase its Cronbach's alpha? If we are studying conservation behaviors,
for example, and we drop a conservation-related item from our scale, are we
not ignoring the fact that a particular behavior is unusual among our
study participants, hence the low Cronbach's alpha for our scale? So are
there guidelines on what factors we should consider when deciding to
retain an item in our scale even though it may contribute to a low Cronbach's
alpha?
Thanks, Faiz.
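For reference, the two quantities in the question (alpha and alpha if item deleted) can be computed as in the sketch below, which uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The six respondents and four items are invented for illustration and the function names are hypothetical; this is not the RELIABILITY procedure itself.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order in each list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]    # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Made-up 1-4 responses: three items that move together, one that does not.
items = [
    [1, 2, 3, 4, 2, 3],
    [1, 2, 3, 4, 3, 3],
    [2, 2, 4, 3, 2, 4],
    [4, 1, 2, 3, 4, 1],
]

alpha_all = cronbach_alpha(items)
alpha_deleted = alpha_if_item_deleted(items)
print(round(alpha_all, 3), [round(a, 3) for a in alpha_deleted])
```

In this invented example, dropping the fourth item raises alpha sharply. The calculation only shows that the item does not covary with the others; whether the behavior it measures still belongs in the scale is the substantive question raised above.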