Hello
I would be most grateful for some advice on whether or not to accept the p-value obtained from running the chi-square goodness-of-fit test in SPSS when the expected count is less than 5. I am familiar with the golden rule for when to use Fisher's Exact test for cross-tabulated data, but what should one do in the one-dimensional case? When the expected count is less than 5, SPSS still generates a p-value, but it is accompanied by a footnote indicating that all expected counts are less than 5 (with no advice as to why this is important to know).
Many thanks for your interest.
Best wishes
Margaret
Margaret,
I don't see that anybody replied to your query. You have two options (apart from collecting more data, which we will ignore): collapse categories, or use the multinomial distribution, which is the extension of the binomial to more than 2 outcomes. There may be syntax on Ray's site to compute it, but I don't know. For the formula for the multinomial, go to http://en.wikipedia.org/wiki/Multinomial_distribution, where you will see how closely it relates to the binomial. Sorry I can't be more specific than that.
Good luck,
Dominic
Dominic Lusinchi
Statistician
Far West Research
Statistical Consulting
San Francisco, California
415-664-3032
www.farwestresearch.com
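For readers who want to see what "using the multinomial distribution" can look like in practice, here is a minimal sketch in Python (standard library only). It enumerates every possible table of counts for the given sample size and sums the multinomial probabilities of the tables whose Pearson statistic is at least as large as the observed one. That ordering is one common definition of an exact goodness-of-fit p-value; it is an assumption of this sketch, not a statement of what any particular SPSS module computes. The function names and the example counts are hypothetical.

```python
import math


def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs):
    """Probability of one vector of counts under the multinomial distribution."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)
    prob = 1.0
    for c, q in zip(counts, probs):
        prob *= q ** c
    return coef * prob


def pearson_stat(counts, probs):
    """Pearson statistic: Sum[(Obs-Exp)**2/Exp] with Exp = n * H0 probability."""
    n = sum(counts)
    return sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, probs))


def exact_gof_pvalue(observed, probs):
    """Sum the probabilities of all tables at least as extreme as the observed one."""
    n = sum(observed)
    x_obs = pearson_stat(observed, probs)
    total = 0.0
    for counts in compositions(n, len(observed)):
        # Small tolerance so tables tied with the observed one are included.
        if pearson_stat(counts, probs) >= x_obs - 1e-9:
            total += multinomial_pmf(counts, probs)
    return min(total, 1.0)


# Hypothetical data: 10 observations, 3 equally likely categories under H0.
observed = [1, 2, 7]
h0_probs = [1 / 3, 1 / 3, 1 / 3]
print(exact_gof_pvalue(observed, h0_probs))
```

The number of tables grows quickly with the sample size and the number of categories, so full enumeration like this is only practical for the small samples where the expected counts are low in the first place.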
Dear Dominic
Thank you for your kind reply. However, I do not wish to collapse categories and I am already assuming a multinomial distribution. What I really need to know is whether it is sound to use the chi-square goodness-of-fit test when the expected count is less than 5, and indeed why SPSS chooses to specifically flag the result that the expected count is less than 5 when there does not appear to be an alternative test for the one-dimensional case to which one can resort in such cases.
Best wishes
Margaret
In reply to this post by Dominic Lusinchi
Hi Margaret
Cochran's rules for chi-square tests (all those that compare observed vs. expected frequencies using Sum[(Obs-Exp)**2/Exp]) state that a minimum expected frequency of 5 in every cell is needed for asymptotic p-values to be valid. This condition can be relaxed a bit: expected frequencies as low as 1 are acceptable as long as no more than 20% of the cells have expected frequencies below 5. Outside these conditions (strict or relaxed), asymptotic testing is not valid.

Now, you say that ALL cells have a low expected frequency (below 5). This can mean either that you have too many categories (and should try, as Dominic mentioned, to collapse them in a meaningful way, if that is possible), or that the overall sample size is too small for reliable asymptotic testing.

I suppose that you don't have the EXACT TESTS module installed? That would allow you to get an exact p-value, independent of the minimum expected frequency limitation. (I do have it installed and, if you want, I can compute the exact p-value for you if you send me the observed frequencies in a private mail.)

--
Regards,
Dr. Marta García-Granero, PhD
mailto:[hidden email]
Statistician
---
"It is unwise to use a statistical procedure whose use one does not understand. SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results". (Adapted from WinPepi manual - I'm sure Joe Abramson will not mind)
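The strict and relaxed versions of Cochran's rule described above are easy to check before trusting an asymptotic p-value. The following is a small Python sketch; the function name, the returned keys, and the example sample size are hypothetical, and the relaxed rule is coded as "no expected count below 1 and at most 20% of cells below 5".

```python
def cochran_check(expected_counts, threshold=5.0, floor=1.0, max_share=0.20):
    """Report whether expected counts satisfy Cochran's strict and relaxed rules."""
    small = [e for e in expected_counts if e < threshold]
    share_small = len(small) / len(expected_counts)
    # Strict rule: every expected count is at least the threshold (5).
    strict_ok = not small
    # Relaxed rule: nothing below the floor (1), at most 20% of cells below 5.
    relaxed_ok = min(expected_counts) >= floor and share_small <= max_share
    return {
        "min_expected": min(expected_counts),
        "share_below_threshold": share_small,
        "strict_ok": strict_ok,
        "relaxed_ok": relaxed_ok,
    }


# Hypothetical case: 12 observations, 4 equally likely categories under H0,
# so every expected count is 3 and the SPSS footnote would be triggered.
n = 12
h0_probs = [0.25, 0.25, 0.25, 0.25]
expected = [n * p for p in h0_probs]
print(cochran_check(expected))
```

When both flags come back false, as in this hypothetical case, the choices discussed in the thread apply: collapse categories, collect more data, or compute an exact p-value.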
In reply to this post by Margaret MacDougall
Hi again Margaret
Tuesday, September 5, 2006, 6:56:45 PM, You wrote:

MM> Dear Dominic
MM> Thank you for your kind reply. However, I do not wish to
MM> collapse categories and I am already assuming a multinomial
MM> distribution.

Yes, but you are testing your hypothesis using an asymptotic statistic (chi-square), valid only if Exp >= 5. What Dominic suggested is that you compute the exact p-value using the multinomial distribution (that's what the EXACT TESTS SPSS module does).

MM> What I really need to know is whether it is sound to use the
MM> chi-square goodness-of-fit test when the expected count is less
MM> than 5 and indeed why SPSS chooses to specifically flag the result
MM> that the expected count is less than 5 when there does not appear
MM> to be an alternative test for the one-dimensional case to which
MM> one can resort in such cases.

Apart from the above-mentioned exact test, I think that the likelihood ratio test (also called the G-test) is a bit more robust than Pearson's chi-square statistic (mainly when this condition happens: |Obs-Exp| > Exp). SPSS includes the LR test for contingency tables (CROSSTABS), but not for the goodness-of-fit test. Anyway, following my custom (anything-can-be-done-with-matrix), here is a MATRIX program that can be adapted to your data without much trouble (the only problem is that it needs the categorical variable to be a string, not numeric, but that could also be modified):

* Sample dataset *.
DATA LIST LIST /mice(A8) obs(F8).
BEGIN DATA
White 380
SBrPatch 330
LBrPatch 74
END DATA.

MATRIX.
PRINT /TITLE='GOODNESS OF FIT G-TEST'.
GET class /VAR=mice.
GET data /VAR=obs.
* Add here expected frequencies (under H0) *.
COMPUTE expected={51.0;40.8;8.2}.
* Rescale the H0 values to expected counts with the same total as the data *.
COMPUTE expect=CSUM(data)*expected/MSUM(expected).
PRINT {data,expect,expected}
 /FORMAT='F10.1'
 /CLABELS='OBS','EXP','H0'
 /RNAMES=class
 /TITLE='Observed and expected frequencies'.
PRINT {NROW(expected)-1}
 /FORMAT='F8.0'
 /TITLE='Degrees of Freedom'.
* G = 2*Sum[Obs*LN(Obs/Exp)], referred to chi-square with k-1 df *.
COMPUTE totg=2*MSUM(data&*LN(data/expect)).
COMPUTE totsig=1-CHICDF(totg,NROW(expected)-1).
PRINT {totg;totsig}
 /FORMAT='F8.4'
 /RLABELS='Chi²','Sig'
 /TITLE='G Statistic & significance'.
END MATRIX.

* Using Pearson Chi-square *.
AUTORECODE VARIABLES=mice /INTO color /DESCENDING /PRINT.
WEIGHT BY obs.
NPAR TEST
 /CHISQUARE=color
 /EXPECTED=51 40.8 8.2.

--
Regards,
Dr. Marta García-Granero, PhD
mailto:[hidden email]
Statistician
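As a cross-check outside SPSS, the G statistic and the Pearson statistic for the mice example can be computed with a short Python sketch (standard library only). The helper names are hypothetical. The chi-square upper-tail probability uses the closed-form series that exists for integer degrees of freedom, and a cell with an observed count of zero contributes nothing to G, which is a choice made in this sketch and is not handled in the MATRIX code.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)**r / r! for r = 0 .. df/2 - 1.
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    chi = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * total


def expected_counts(observed, h0_weights):
    """Rescale H0 weights (e.g. percentages) to counts with the observed total."""
    n, w = sum(observed), sum(h0_weights)
    return [n * h / w for h in h0_weights]


def g_statistic(observed, expected):
    """Likelihood ratio statistic; zero observed counts contribute nothing."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def pearson_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Data and H0 percentages taken from the MATRIX example in the post.
observed = [380, 330, 74]
h0 = [51.0, 40.8, 8.2]
expected = expected_counts(observed, h0)
df = len(observed) - 1
g = g_statistic(observed, expected)
x2 = pearson_statistic(observed, expected)
print(f"G       = {g:.4f}, p = {chi2_sf(g, df):.4f}")
print(f"Pearson = {x2:.4f}, p = {chi2_sf(x2, df):.4f}")
```

With expected counts this large the two statistics should agree closely; the difference between them matters mostly in the small-sample situation that started this thread.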
Dear Marta
Many thanks for the clarifications and additional information provided. This has been most helpful.
Best wishes
Margaret