Skewness / High frequency of zero scores

BanLas
Hello all!

I'm writing here to get some input from people who hopefully know more about statistics than I do.

I have a problem with the distribution of my dataset (don't we all?). I have 4 measures of eating disorder pathology, in which a '0' score corresponds to an absence of such pathology. All 4 measures have about 40-60% zero scorers (n = 1080), so the data is heavily positively skewed. I also have an age and a weight variable, no particular problems with non-normality with these.

I wish to run both a two-way ANOVA on each of my 4 measures (grouped according to age and weight) and correlational / regression analyses between my 4 measures, age and weight. Of course, my dataset violates the normality assumption. Also, I have read that neither transforming the problematic variables nor using non-parametric tests will do the trick, since all those who scored '0' would be assigned the same rank.
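
To make the point about tied ranks concrete, here is a minimal sketch in Python (standard library only). The scores are invented for illustration and are not from the dataset described here. It shows that every zero scorer receives the same mid-rank, so half of the sample carries no ordering information. Tie-corrected versions of rank tests do exist, so ties do not make such tests invalid by themselves; they mainly cost information.

```python
from collections import Counter

def midranks(values):
    """Return the average (mid) rank of each value, as rank-based tests use."""
    counts = Counter(values)
    rank_of = {}
    start = 1  # first rank available to the current block of tied values
    for v in sorted(counts):
        c = counts[v]
        # c tied observations occupy ranks start .. start + c - 1; average them
        rank_of[v] = start + (c - 1) / 2
        start += c
    return [rank_of[v] for v in values]

# Hypothetical scores: 5 of 10 respondents score zero
scores = [0, 0, 0, 0, 0, 0.4, 1.2, 1.2, 2.5, 4.0]
ranks = midranks(scores)
print(ranks)  # all five zeros share one rank
```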

My question is obviously what is reasonable to do in this situation.
What is more problematic: the skewness itself or the high frequency of equal scores?
I have read that it is possible to exclude everyone who scores '0' from correlational analyses, and only analyse those who have some degree of eating disorder pathology. Is this feasible?

Any comments are appreciated!
Regards, Lasse

Re: Skewness / High frequency of zero scores

Maurice Vergeer
One option is to do two separate analyses:
- logistic regression on yes/no disorder
- remove the zeros and run an ordinary OLS regression on those who do have
the disorder

Another is to fit a zero-inflated negative binomial regression
(not implemented in SPSS, if I'm correct, but available in R).
I don't have experience with it myself yet, but will in a few weeks
because I have similar data distributions.
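
The two-part idea rests on the identity E[Y] = P(Y > 0) * E[Y | Y > 0] for non-negative scores. The sketch below (Python, standard library only) only illustrates that decomposition with group summaries; the group names and scores are made up. In a real analysis the first part would be a logistic regression and the second a regression on the non-zero scorers.

```python
def two_part_summary(scores):
    """Split a non-negative score into P(score > 0) and the mean of the positives."""
    positives = [s for s in scores if s > 0]
    p_nonzero = len(positives) / len(scores)
    mean_positive = sum(positives) / len(positives) if positives else 0.0
    return p_nonzero, mean_positive

# Hypothetical data, two groups with different shares of zero scorers
groups = {
    "normal_weight": [0, 0, 0, 0, 1.0, 2.0],
    "overweight": [0, 0, 1.0, 2.0, 3.0, 3.0],
}

results = {}
for name, scores in groups.items():
    p_nonzero, mean_positive = two_part_summary(scores)
    overall_mean = sum(scores) / len(scores)
    results[name] = (p_nonzero, mean_positive, overall_mean)
    # The overall mean equals the product of the two parts
    print(name, round(p_nonzero, 3), round(mean_positive, 3), round(overall_mean, 3))
```

A group difference can therefore come from the share of people with any pathology, from the severity among those who have it, or from both, which is why the two parts are modelled separately.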

Maurice


Maurice Vergeer
Department of communication
Radboud University (www.ru.nl)

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Skewness / High frequency of zero scores

Bob Schacht
In reply to this post by BanLas

You haven't said much about the other three measures, and whether they truly represent a scale of progressively worse pathology.
Assuming that they do, what you may have here is a variable that has a Poisson distribution, which I would guess is the case anyway. That is, it is reasonable to guess that most people do not have the pathology, only a very few have extreme pathology, and the categories decline in frequency from "none" to "extreme." If that is the case, then by definition the distribution is not "normal," and a transformation is not going to make it normal: the mode sits at the lower bound of the scale rather than in the middle, and no monotone transformation can spread out a pile of identical zero scores.
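
One quick way to check whether a plain Poisson is plausible is to ask what it implies once the share of zeros is fixed. The sketch below (Python, standard library only) assumes integer counts and a 50% zero share, which is only a stand-in for the 40-60% reported in this thread. Under a Poisson, P(0) = exp(-lambda), so 50% zeros forces lambda = ln 2, about 0.69, and leaves only roughly 3% of respondents at 3 or above.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

zero_share = 0.5              # assumed share of zero scorers
lam = -math.log(zero_share)   # Poisson mean implied by that share of zeros

probs = [poisson_pmf(k, lam) for k in range(5)]
tail_3_plus = 1 - sum(probs[:3])

print("implied mean and variance:", round(lam, 3))
print("P(0..4):", [round(p, 4) for p in probs])
print("P(score >= 3):", round(tail_3_plus, 4))
```

If the observed scores have a much heavier right tail than this, the data are overdispersed relative to a Poisson, or have more zeros than a Poisson with the observed mean would produce.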

I'd be tempted to do an ANOVA in which your eating pathology variable is used to define 4 groups, and the null hypothesis is that the other variables will be distributed the same, regardless of which eating pathology group the subject is in.

Bob Schacht
Northern Arizona University

Re: Skewness / High frequency of zero scores

BanLas
In reply to this post by BanLas

Thank you for all the helpful comments!

No; there are no repeated measures. Measurement of eating disorder pathology is done once for each participant.

Also, considering the explanation of a Poisson distribution posted above, I would say that all 4 of my measures of eating disorder pathology have a Poisson distribution.

Thanks;
-Lasse

Re: Skewness / High frequency of zero scores

Ryan
Lasse,
 
A standard Poisson distribution may or may not be the optimal distribution to select for your particular problem. Poisson regression is used for modelling count data that can theoretically range from 0 to infinity. If you have [zero-]inflation, truncation and/or a high conditional variance relative to the conditional mean, then it might be worthwhile modifying the log-likelihood function accordingly. Also, based on the information you've provided, it seems to me that an ordered logits equation should probably be considered as well. Finally, if you decide to model all 4 measures simultaneously (multivariate model), which may or may not be a reasonable approach, then you will likely need to account for within-subject correlation.
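
As a rough illustration of what "modifying the log-likelihood" can mean for zero inflation, here is a sketch of a zero-inflated Poisson log-likelihood without covariates (Python, standard library only). The parameter names pi (probability of a structural zero) and lam (Poisson mean), the counts, and the parameter values are all assumptions made for the example; nothing is estimated here, the function is only evaluated at two parameter settings.

```python
import math

def zip_logpmf(y, pi, lam):
    """log P(Y = y): with probability pi a structural zero, otherwise Poisson(lam)."""
    if y == 0:
        # A zero can come from either the structural part or the Poisson part
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)

def zip_loglik(data, pi, lam):
    """Log-likelihood of independent counts under the zero-inflated Poisson."""
    return sum(zip_logpmf(y, pi, lam) for y in data)

# Hypothetical counts with many zeros and a long right tail
counts = [0, 0, 0, 0, 0, 0, 1, 2, 2, 5]

plain_poisson = zip_loglik(counts, 0.0, 1.0)   # pi = 0 reduces to an ordinary Poisson
zero_inflated = zip_loglik(counts, 0.4, 1.7)   # illustrative values, not fitted
print(plain_poisson, zero_inflated)
```

For these invented counts the zero-inflated version has the higher log-likelihood, which is the pattern one would expect when there are more zeros than a single Poisson mean can accommodate.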
 
Ryan

Re: Skewness / High frequency of zero scores

Art Kendall
In reply to this post by BanLas
Please cobble together a small set of data that describes what you have.
Please detail the number of variables, the levels that are possible for
each.

Do you have 4 groups of respondents?  Each group has a level of pathology.
I.e.,  you do not have 4 measures of pathology, but 1 measure with 4 levels.

Or do you have 4 measures of different kinds of pathology, all with a
yes/no (2-level) response scale or some other response scale?

If a person does not score zero, what other possible scores can they have?

Which are the DVs (dependent variables) and which are the IVs
(independent variables)?
Are you interested in finding out how the people in the 4 groups can be
distinguished?

Are you interested in predicting which of the 4 groups someone would be in?

Etc.

Art Kendall
Social Research Consultants


Re: Skewness / High frequency of zero scores

BanLas
In reply to this post by BanLas
Some have requested more information about variables and aims of analyses, so here goes:

Sample size; n = 1080.

I have 4 different measures (each measure consists of items in a 7-point Likert-style format) of eating disorder pathology; each measure provides a global mean score indicating the severity of eating disorder pathology, ranging from 0 upwards. These measures are therefore continuous. A zero score is interpreted as a total absence of pathology. Needless to say, these measures are all positively skewed, with as much as 40-60% of my sample scoring zero (or very close to zero) across all 4 measures. I have two additional variables, age and weight, which are both categorical (4 categories for both variables).

The first thing I am interested in is whether there are any group differences in eating disorder pathology. I was therefore thinking of doing 4 two-way ANOVAs, one for each eating disorder measure (that is, age and BMI are independent variables and the measures of eating disorder pathology are dependent variables). I would also like to use post hoc tests to find out between which groups any differences lie.

Secondly, I would like to know the strength of the relationships between the 4 measures of eating disorder pathology, i.e. whether some of the measurements covary more than others. A related question is how well 3 of the measures perform in predicting the fourth. I was thinking of a multiple regression analysis, with three of the measures as predictors and the last one as the dependent variable. (I am not suggesting 4 separate regression analyses; I am only interested in predicting one of the measures.)

But, alas, there is the problem with skewness / the high frequency of zero scorers, and I was wondering what statisticians think about this:
- What are the consequences of doing parametric tests on a sample distribution that clearly violates the normality assumption?
- Are there any alternative approaches other than transforming variables or using non-parametric tests, which I have read are both poor alternatives when dealing with distributions that are skewed and contain many zero scores?
- Are bootstrapping or permutation tests a feasible alternative?
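
For readers unfamiliar with permutation tests, here is a minimal sketch for a difference in means between two groups (Python, standard library only). The two groups and their scores are invented, and the function name and defaults are my own choices. The test reshuffles group labels and asks how often a difference at least as large as the observed one arises by chance, so it does not rely on the normality assumption, though it does assume the observations are exchangeable under the null hypothesis.

```python
import random

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled scores
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Count the observed labelling itself as one permutation
    return (hits + 1) / (n_perm + 1)

# Hypothetical scores with many zeros in both groups
group_a = [0, 0, 0, 0, 0, 0, 0.5, 1.0, 1.5, 2.0]
group_b = [0, 0, 0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

p_value = permutation_test(group_a, group_b)
print("permutation p-value:", round(p_value, 4))
```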

Regards;
-Lasse


Re: Skewness / High frequency of zero scores

Art Kendall
Since the 4 measures of pathology are all on the same response scale (an
extent response scale) how are they intended to be the same/different?
Are they intended to be items in creating a summative scale?

If all four are measured on each subject why do you not consider them to
be some form of repeated measure?

What are the groups?  The only IVs you mentioned are continuous.



Art Kendall
Social Research Consultants


Re: Skewness / High frequency of zero scores

Bruce Weaver
Hi Art.  BanLas said first that "age and weight, which both are categorical (4 categories for both variables)."  But in the next paragraph:  "...age and BMI are independent variables and the measures of eating disorder pathology are dependent variables".  

So I assume the explanatory variables are actually Age and BMI, and that BanLas has either carved each of them into 4 categories, or only has access to them in that form.

BanLas -- do you have the raw data for both Age and BMI?  If so, why do you want to carve them into categories?  Generally speaking, this is a bad idea.  You use up degrees of freedom needlessly, and you throw away information (which results in loss of power).  Where possible, continuous variables should be treated as continuous.  

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Skewness / High frequency of zero scores

BanLas
Art: all 4 measures are different in that they measure distinct facets of eating disorder pathology; i.e. one measures psychosocial impairment, another restrained eating, etc. Some of the measures can indeed be combined to produce an overall or global score of eating disorder pathology, but for this study I wish to treat them separately, as there are some important theoretical and empirical differences among them that I wish to elucidate.

Bruce: both age and BMI (sorry I said weight at first) are measured continuously. However, my grouping of both BMI and age follows certain clinical conventions, and I therefore want to proceed with categorising, to see if there are any significant differences in, for instance, restrained eating between normal-weight and overweight women.

When talking about these sorts of group analyses, age and BMI are IVs, and the eating disorder pathology measures are DVs.

The original question was to what extent my skewed data and high frequency of zero scores affect parametric tests, and what potential alternatives there are.

-Lasse

Re: Skewness / High frequency of zero scores

Bruce Weaver
Regarding clinical conventions about imposing cut-points on a continuous variable, I understand that clinicians may need to sort people into categories ultimately--e.g., treat or don't treat.  But from a statistical point of view, I think one should delay that categorization as long as possible.  Here's a simple example.  Suppose you have a simple two-variable situation with BMI as the explanatory variable, and one of your scales as the outcome variable.  If you carve BMI into the usual categories, but treat the outcome variable as continuous, you'll be doing a one-way ANOVA.  In the ANOVA model, the fitted value for any individual is the mean of the category they belong to.  So two people who differ quite a bit in BMI, but who fall within the same category, will have the same fitted value in this model.  On the other hand, two people who differ by only a tiny amount, but who happen to fall in two different categories, could have substantially different fitted values of Y.  Do you really want a model like that?  

<soapbox>
What I would prefer is to fit a simple linear regression model, with X = actual BMI.  THEN, if the conventional BMI categories are needed, draw vertical lines on the X-axis to indicate the conventional cut-points, and apply the cut-points to the fitted values.  This makes more sense to me.
</soapbox>
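
The contrast described above can be sketched numerically (Python, standard library only). The BMI values and scores are invented, and the cut-points 18.5 / 25 / 30 are assumed here as the conventional BMI categories. The category-means model gives identical fitted values to two people with BMI 19.0 and 24.9, and a large jump between 24.9 and 25.1, while a straight line treats the last pair as nearly identical.

```python
from collections import defaultdict

# Assumed conventional BMI cut-points: upper bound of each category
CUTS = [(18.5, "underweight"), (25.0, "normal"), (30.0, "overweight")]

def bmi_category(bmi):
    for upper, label in CUTS:
        if bmi < upper:
            return label
    return "obese"

def category_means(x, y):
    """Fitted values of a one-way ANOVA: the mean outcome within each category."""
    sums = defaultdict(lambda: [0.0, 0])
    for xi, yi in zip(x, y):
        entry = sums[bmi_category(xi)]
        entry[0] += yi
        entry[1] += 1
    return {label: total / n for label, (total, n) in sums.items()}

def least_squares_line(x, y):
    """Intercept and slope of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data
bmi = [19.0, 22.0, 24.9, 25.1, 27.0, 29.5]
score = [0.2, 0.5, 0.9, 1.0, 1.3, 1.8]

means = category_means(bmi, score)
intercept, slope = least_squares_line(bmi, score)

def anova_fit(b):
    return means[bmi_category(b)]

def line_fit(b):
    return intercept + slope * b

for b in (19.0, 24.9, 25.1):
    print(b, round(anova_fit(b), 3), round(line_fit(b), 3))
```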

HTH.

Re: Skewness / High frequency of zero scores

BanLas
I see your point regarding categorising data; and I will consider doing something similar to what you suggested (but with both BMI and age as predictors)! But the question still remains; how will the regression model perform when the DV is such a skewed score, with median and mode values of 0?

Re: Skewness / High frequency of zero scores

Maurice Vergeer

Again, I suggest you look into the two options I posted earlier.
Maurice


Re: Skewness / High frequency of zero scores

Ryan
In reply to this post by BanLas
Lasse,
 
I would advise against fitting a general linear model (e.g., ANOVA) if you are observing a large percentage of zeros along with high skew. If the response options represent counts, then I would consider fitting a zero-inflated count model (e.g., zero-inflated Poisson, zero-inflated negative binomial), possibly adjusting the log-likelihood function to account for an upper truncation if your counts cannot go above a specified value (e.g., 7). If you want to model four count dependent variables simultaneously, then you would extend the model to account for a multivariate response (e.g., multivariate zero-inflated Poisson, possibly with an upper truncation). Unfortunately, I do not believe that SPSS is capable of fitting this type of model. One could certainly fit such a model via the NLMIXED procedure in SAS. There are other options, such as dichotomizing the dependent variables, as suggested by others. If you were to dichotomize the dependent variables, then you might consider fitting a GEE/generalized linear model. Alternatively, you might find that it makes sense to collapse a couple of the adjacent categories and then consider fitting some sort of multivariate ordered logit model. Based on the information you have provided, I cannot say which is the optimal approach.
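
To show what an upper truncation does to the likelihood, here is a small sketch of a right-truncated Poisson probability function (Python, standard library only). The upper limit of 7 follows the example value mentioned above, and the mean of 2.0 is arbitrary. The ordinary Poisson probabilities for 0 to 7 are simply rescaled so that they sum to one; the log of these probabilities would replace the ordinary Poisson term in the log-likelihood.

```python
import math

def poisson_weight(k, lam):
    """Ordinary (untruncated) Poisson probability of count k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def truncated_poisson_pmf(k, lam, upper):
    """P(Y = k) for a Poisson restricted to the values 0..upper."""
    if not 0 <= k <= upper:
        return 0.0
    normaliser = sum(poisson_weight(j, lam) for j in range(upper + 1))
    return poisson_weight(k, lam) / normaliser

upper_limit = 7   # assumed maximum possible count
lam = 2.0         # arbitrary illustrative mean parameter

pmf = [truncated_poisson_pmf(k, lam, upper_limit) for k in range(upper_limit + 1)]
print([round(p, 4) for p in pmf])
```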
 
You have received, and will likely continue to receive, different types of recommendations. You should really consider reaching out to a statistician in your area (to whom you can give the entire picture of your data and research questions) so that he or she can help you make some of these important decisions.
 
Best wishes,
 
Ryan
 

Re: Skewness / High frequency of zero scores

Ian Martin-2
In reply to this post by BanLas
Lasse,

You seem to want to treat a couple of continuous variables (BMI, Age)
as categorical Independent Variables, and a number of categorical
pathology variables as continuous Dependent Variables.  As Bruce W.
points out, there are pretty good reasons for NOT categorizing BMI
and Age until you are actually in a prescriptive clinical situation
(and even there the boundaries between categories are subject to
change, as you must know).

I've certainly had to point out to a number of clinicians that their
findings based on self-imposed categorization of continuous variables
will be completely useless if -- as often happens -- the cutpoints
between categories change in the near future, as they have done quite
recently for pathologies like obesity, diabetes, etc.  And obviously,
in a categorization of Age or BMI, there will be adjacent continuous
values that fall into different categories.  This both lowers the
power of your analysis, and highlights the inadequacy of many
prescriptive clinical diagnostics.

Why not test the hypotheses that Age and BMI (as continuous,
dependent variables) do not/do vary between categories of your eating
disorder pathologies?  That would be a more conventional ANOVA, and
not so fraught with self-imposed aberrant distributions. It would be
more valid, and equally interesting (I think) to see if a certain
stage of a pathology is defined by having a significantly lower/
higher Age or BMI than another stage of the same pathology.
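
A sketch of that reversed analysis, with BMI as the continuous outcome and pathology category as the grouping factor, could look like the following (Python, standard library only). The three group labels and the BMI values are invented. Only the F statistic and its degrees of freedom are computed; the p-value would come from an F table or a statistics package.

```python
def one_way_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical BMI values by pathology category
bmi_by_category = {
    "none": [21, 22, 23, 24],
    "mild": [23, 24, 25, 26],
    "marked": [26, 27, 29, 30],
}

f_stat, df_between, df_within = one_way_f(list(bmi_by_category.values()))
print("F =", round(f_stat, 2), "on", df_between, "and", df_within, "df")
```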

If you are required by convention, peer or supervisory pressure to
categorize variables which clearly ought to be regarded as
continuous, then perhaps look into a wholly categorical analysis.
Perhaps log-linear models?

regards,
Ian