SPSSX Discussion

Query re authenticity of p-values for chi-square goodness of fit test

Margaret MacDougall

Query re authenticity of p-values for chi-square goodness of fit test

Hello

I would be most grateful for some advice on whether or not to accept the p-value obtained from running the chi-square goodness-of-fit test in SPSS when the expected count is less than 5. I am familiar with the golden rule for when to use Fisher's Exact test for cross-tabulated data but what should one do in the one-dimensional case? When the expected count is less than 5, a p-value is generated by SPSS but this is accompanied by a footnote indicating that all expected counts are less than 5 (with no advice as to why this is important to know).

Many thanks for your interest

Best wishes

Margaret

Dominic Lusinchi

Re: Query re authenticity of p-values for chi-square goodness of fit test

Margaret,

I don't see that anybody replied to your query.

You have two options (apart from collecting more data, but we will ignore
that): collapse categories, or use the multinomial distribution, which is
the extension of the binomial when there are more than 2 outcomes. There may
be syntax on Ray's site to compute it, but I don't know.

To see the formula for the multinomial go to

http://en.wikipedia.org/wiki/Multinomial_distribution

you will see how closely it relates to the binomial.

Sorry I can't be more specific than that.

Good luck,
Dominic

Dominic Lusinchi
Statistician
Far West Research
Statistical Consulting
San Francisco, California
415-664-3032
www.farwestresearch.com

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Margaret MacDougall
Sent: Tuesday, September 05, 2006 3:02 AM
To: [hidden email]
Subject: Query re authenticity of p-values for chi-square goodness of fit
test

Hello

I would be most grateful for some advice on whether or not to accept the
p-value obtained from running the chi-square goodness-of-fit test in SPSS
when the expected count is less than 5. I am familiar with the golden rule
for when to use Fisher's Exact test for cross-tabulated data but what should
one do in the one-dimensional case? When the expected count is less than 5,
a p-value is generated by SPSS but this is accompanied by a footnote
indicating that all expected counts are less than 5 (with no advice as to
why this is important to know).

Many thanks for your interest

Best wishes

Margaret

Margaret MacDougall

Re: Query re authenticity of p-values for chi-square goodness of fit test

Dear Dominic

Thank you for your kind reply. However, I do not wish to collapse categories and I am already assuming a multinomial distribution. What I really need to know is whether it is sound to use the chi-square goodness-of-fit test when the expected count is less than 5, and why SPSS specifically flags that the expected count is less than 5 when there does not appear to be an alternative test for the one-dimensional case to which one can resort.

Best wishes

Margaret

Marta García-Granero

Re: Query re authenticity of p-values for chi-square goodness of fit test

In reply to this post by Dominic Lusinchi

Hi Margaret

DL> I would be most grateful for some advice on whether or not to accept the
DL> p-value obtained from running the chi-square goodness-of-fit test in SPSS
DL> when the expected count is less than 5. I am familiar with the golden rule
DL> for when to use Fisher's Exact test for cross-tabulated data but what should
DL> one do in the one-dimensional case? When the expected count is less than 5,
DL> a p-value is generated by SPSS but this is accompanied by a footnote
DL> indicating that all expected counts are less than 5 (with no advice as to
DL> why this is important to know).

Cochran's rules on chi-square tests (all those that compare Observed vs
Expected frequencies using Sum[(Obs-Exp)**2/Exp]) state that a
minimum expected frequency of 5 in every cell is needed for
asymptotic p-values to be valid. This condition can be relaxed
a bit: a minimum expected frequency of 1 is accepted as long as no
more than 20% of the cells have expected frequencies below 5. Outside
these conditions (strict or relaxed), asymptotic testing is not valid.
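
As a rough illustration of the two conditions above, here is a minimal
Python sketch (not SPSS syntax). The function name cochran_check is
hypothetical, and the relaxed rule is coded in its common form: no
expected count below 1, and no more than 20% of cells below 5.

```python
def cochran_check(expected):
    """Return (strict_ok, relaxed_ok) for a list of expected counts.

    strict_ok : every expected count is at least 5.
    relaxed_ok: no expected count is below 1, and no more than 20%
                of the cells have an expected count below 5.
    """
    n_cells = len(expected)
    below_5 = sum(1 for e in expected if e < 5)
    strict_ok = below_5 == 0
    # 5 * below_5 <= n_cells is the integer form of below_5 / n_cells <= 0.20
    relaxed_ok = min(expected) >= 1 and 5 * below_5 <= n_cells
    return strict_ok, relaxed_ok


if __name__ == "__main__":
    # Hypothetical expected counts, all below 5 as in the question.
    print(cochran_check([2.0, 3.0, 4.0]))
```

When both flags are False, as in the hypothetical example, the
asymptotic p-value should not be relied on.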

Now, you say that ALL cells have a low expected frequency (below 5).
This can mean that either you have too many categories (and should
try, as Dominic mentioned, to collapse them in a meaningful way, if it
is possible), or an overall sample size too small for reliable
asymptotic testing. I suppose that you don't have the EXACT TESTS
module installed? That would allow you to get an exact p-value,
independent of the minimum expected frequency limitation (I do have it
installed, and, if you want, I can compute the exact p-value for you
if you send me the observed frequencies in a private mail).

--
Regards,
Dr. Marta García-Granero,PhD mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does
not understand. SPSS syntax guide cannot supply this knowledge, and it
is certainly no substitute for the basic understanding of statistics
and statistical thinking that is essential for the wise choice of
methods and the correct interpretation of their results".

(Adapted from WinPepi manual - I'm sure Joe Abramson will not mind)

Marta García-Granero

Re: Query re authenticity of p-values for chi-square goodness of fit test

In reply to this post by Margaret MacDougall

Hi again Margaret

Tuesday, September 5, 2006, 6:56:45 PM, You wrote:

MM> Dear Dominic

MM> Thank you for your kind reply. However, I do not wish to
MM> collapse categories and I am already assuming a multnomial
MM> distribution.

Yes, but you are testing your hypothesis using an asymptotic statistic
(chi-square), valid only if every expected count is at least 5
(Exp GE 5). What Dominic suggested is that you compute the exact
p-value using the multinomial distribution (that's what the EXACT
TESTS SPSS module does).
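
For small samples, an exact multinomial p-value can be sketched by
brute force. The Python code below (not SPSS syntax) is only an
illustration: the function names are hypothetical, and it assumes that
"as extreme or more extreme" means a Pearson chi-square statistic at
least as large as the observed one. The EXACT TESTS module may use a
different algorithm, and full enumeration is only practical for small
n and few categories.

```python
from math import factorial


def compositions(n, k):
    """Yield every way to split n observations over k categories."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs):
    """Multinomial probability of one table of counts under H0."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # each step divides exactly
    value = float(coef)
    for p, c in zip(probs, counts):
        value *= p ** c
    return value


def pearson_stat(counts, probs):
    """Sum[(Obs-Exp)**2/Exp] with Exp = n * p."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def exact_gof_pvalue(observed, probs):
    """Sum the probabilities of all tables at least as extreme as observed."""
    stat_obs = pearson_stat(observed, probs)
    tol = 1e-9  # guards against floating-point ties
    return sum(
        multinomial_pmf(table, probs)
        for table in compositions(sum(observed), len(observed))
        if pearson_stat(table, probs) >= stat_obs - tol
    )


if __name__ == "__main__":
    # Hypothetical data: 6 observations, three equally likely categories.
    print(exact_gof_pvalue([5, 1, 0], [1 / 3, 1 / 3, 1 / 3]))
```

Because the p-value is a sum of exact multinomial probabilities, it
does not depend on the minimum expected frequency.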

MM> What I really need to know is whether it is sound to use the
MM> chi-square goodness-of-fit test when the expected count is less
MM> than 5 and indeed why SPSS chooses to specifically flag the result
MM> that the expected count is less than 5 when there does not appear
MM> to be an alternative test for the one-dimensional case to which
MM> one can resort in such cases.

Apart from the exact test mentioned above, I think the likelihood
ratio test (also called the G-test) is a bit more robust than Pearson's
chi-square statistic (mainly when |Obs-Exp| GT Exp). SPSS includes the
LR test for contingency tables (CROSSTABS), but not for the
goodness-of-fit test. Anyway, following my custom
(anything-can-be-done-with-matrix), here is a MATRIX program that can
be adapted to your data without much trouble (the only limitation is
that it needs the categorical variable to be a string, not numeric,
but that could also be modified):

* Sample dataset *.
DATA LIST LIST/mice(A8) obs(F8).
BEGIN DATA
White 380
SBrPatch 330
LBrPatch 74
END DATA.

MATRIX.
PRINT /TITLE='GOODNESS OF FIT G-TEST'.
GET class /VAR=mice.
GET data /VAR= obs.
* Enter the expected percentages (or proportions) under H0 here *.
* They are rescaled to the observed sample size on the next line *.
COMPUTE expected={51.0;40.8;8.2}.
COMPUTE expect=CSUM(data)*expected/MSUM(expected).
PRINT {data,expect,expected}
/FORMAT='F10.1'
/CLABEL='OBS','EXP','H0'
/RNAMES=class
/TITLE='Observed and expected frequencies'.
PRINT {NROW(expected)-1}
/FORMAT='F8.0'
/TITLE='Degrees of Freedom'.
COMPUTE totg=2*MSUM(data&*LN(data/expect)).
COMPUTE totsig=1-CHICDF(totg,NROW(expected)-1).
PRINT {totg;totsig}
/FORMAT='F8.4'
/RLABEL='G','Sig'
/TITLE='G Statistic & significance'.
END MATRIX.

* Using Pearson Chi-square *.
* /DESCENDING codes White=1, SBrPatch=2, LBrPatch=3 to match /EXPECTED *.
AUTORECODE VARIABLES=mice /INTO color /DESCENDING /PRINT.
WEIGHT BY obs.
NPAR TEST
/CHISQUARE=color
/EXPECTED=51 40.8 8.2.
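
For readers without SPSS, the same G statistic can be cross-checked
with a short Python sketch using only the standard library. This is an
illustration, not a translation of the MATRIX program: the names
chi2_sf and g_test are hypothetical, the chi-square tail probability
is computed from a standard recurrence for integer degrees of freedom,
and cells with an observed count of zero are skipped (0*ln(0) taken as
0), whereas the MATRIX code above would fail on LN(0).

```python
from math import erfc, exp, gamma, log, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start from df = 1 (odd) or df = 2 (even), then apply
    # Q(x | v + 2) = Q(x | v) + (x/2)**(v/2) * exp(-x/2) / gamma(v/2 + 1).
    if df % 2:
        q, v = erfc(sqrt(x / 2)), 1
    else:
        q, v = exp(-x / 2), 2
    while v < df:
        q += (x / 2) ** (v / 2) * exp(-x / 2) / gamma(v / 2 + 1)
        v += 2
    return q


def g_test(observed, h0):
    """Return (G, df, p) for a goodness-of-fit likelihood ratio test.

    h0 holds the expected percentages or proportions under H0;
    they are rescaled to the observed total, as in the MATRIX code.
    """
    n = sum(observed)
    total = sum(h0)
    expected = [n * w / total for w in h0]
    g = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    df = len(observed) - 1
    return g, df, chi2_sf(g, df)


if __name__ == "__main__":
    # Same sample data as the SPSS example above.
    g, df, p = g_test([380, 330, 74], [51.0, 40.8, 8.2])
    print(f"G = {g:.4f}, df = {df}, p = {p:.4f}")
```

Note that the p-value still comes from the chi-square approximation,
so the G-test is an alternative statistic, not an exact test.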

--
Regards,
Dr. Marta García-Granero,PhD mailto:[hidden email]
Statistician


Margaret MacDougall

Re: Query re authenticity of p-values for chi-square goodness of fit test

Dear Marta

Many thanks for the clarifications and additional information provided. This has been most helpful.

Best wishes

Margaret
