Hello everyone
I have to compare two sets of categorical data in a 2×4 table using SPSS. Most of the cells contain values less than five, and SPSS will only do a Fisher's exact test for a 2×2 table. Is the likelihood ratio an acceptable alternative to Pearson's chi-square? Is there any other way to do this in SPSS? Thank you all in advance for your help.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
OK, I'll bite. It is the expected frequencies that matter, not the observed ones. So, are the expected frequencies above 5?
Note that many believe the rule of 5 is quite conservative; see http://stats.stackexchange.com/a/14230/1036 for an overview by Frank Harrell. When the expected frequencies are very low, people often group rows/columns together rather than switching to a different test.
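To make the expected-versus-observed distinction concrete: under independence, the expected count for a cell is its row total times its column total divided by N. Howell's trauma/drop-out table (the data Bruce reproduces later in this thread) serves as an example. The "4+ events" row has 9 cases, and the "Drop out" column has 63 of N = 126:

$$E_{ij} = \frac{R_i\,C_j}{N}, \qquad E_{\text{4+},\,\text{Drop out}} = \frac{9 \times 63}{126} = 4.5$$

That cell's expected count is below 5 even though 6 drop-outs were observed in it. The neighbouring "Remain" cell has only 3 observed cases but the same expected count of 4.5.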
I'll bite too, or at least nibble a bit. As it is just a 2x4 table, how about giving us the 8 cell counts?

|      | Col1 | Col2 |
|------|------|------|
| Row1 | a    | b    |
| Row2 | c    | d    |
| Row3 | e    | f    |
| Row4 | g    | h    |

And if you can tell us what the row and column variables are, even better. For example, if the variable with 4 categories is ordinal in nature, you might be able to use the test of linear-by-linear association that appears in the CROSSTABS output when you set /STAT=CHISQ. (For more info, see http://www.uvm.edu/~dhowell/methods7/Supplements/OrdinalChiSq.html.)

HTH.
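As I understand the SPSS output, the linear-by-linear statistic is M² = (N − 1)r². Here r is the Pearson correlation between the row and column variables, using their numeric codes as scores, and M² is referred to a chi-square distribution with 1 df. Howell's data give an example: Events are coded 0 to 4, Drop out is 1 and Remain is 2, as in Bruce's syntax later in this thread. With N = 126, r² works out to 289/6275:

$$M^2 = (N-1)\,r^2 = 125 \times \frac{289}{6275} \approx 5.757, \qquad df = 1$$

Because all of the trend is concentrated in one degree of freedom, a monotone pattern can reach significance here (p = .016 in Howell's example) even when the 4-df Pearson test does not (p = .051).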
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
A third possibility is to look for freeware applications (like WinPepi) that can compute the Fisher-Freeman-Halton test on 2xk tables like this one. There is also an Excel macro (fishchi.xls), very slow, that can handle small frequencies in tables bigger than 2x2. I'm a great admirer of WinPepi; it's loaded with a lot of handy methods.

My two cents,
Marta GG

(Replying to Bruce Weaver's post of 16/04/2015, 23:24.)
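For a 2xk table, the Fisher-Freeman-Halton test is small enough to sketch directly. This is my own minimal Python version, using the standard library only. The function name is hypothetical, and this is not how WinPepi or fishchi.xls are implemented. The sketch works in three steps:

- It enumerates every table with the observed margins.
- It weights each table by its hypergeometric probability.
- It sums the probabilities of the tables that are no more likely than the observed one.

I believe this is the same two-sided rule R's fisher.test uses. Exact enumeration grows quickly with N and k, so this is only practical for small tables.

```python
from math import comb


def fisher_freeman_halton_2xk(table):
    """Exact two-sided p-value for a 2 x k table (Fisher-Freeman-Halton test).

    With both margins fixed, the top-row counts follow a multivariate
    hypergeometric distribution. The p-value sums the probabilities of all
    tables that are no more probable than the observed one.
    """
    top, bottom = table
    col_totals = [a + b for a, b in zip(top, bottom)]
    k = len(col_totals)

    def weight(cells):
        # Numerator of the hypergeometric probability; every table shares the
        # denominator comb(N, sum(top)), so it cancels in the final ratio.
        w = 1
        for x, c in zip(cells, col_totals):
            w *= comb(c, x)
        return w

    observed = weight(top)
    total = 0    # sum of weights over all tables with these margins
    extreme = 0  # sum of weights of tables no more probable than the observed one

    def walk(j, remaining, cells):
        nonlocal total, extreme
        if j == k - 1:
            # The last top-row cell is forced by the fixed top-row total.
            w = weight(cells + [remaining])
            total += w
            if w <= observed:  # exact integer comparison, so no float tolerance
                extreme += w
            return
        room_after = sum(col_totals[j + 1:])
        lo = max(0, remaining - room_after)
        hi = min(col_totals[j], remaining)
        for x in range(lo, hi + 1):
            walk(j + 1, remaining - x, cells + [x])

    walk(0, sum(top), [])
    return extreme / total


# Andy's illustrative counts from later in the thread, laid out as 2 x 4.
print(fisher_freeman_halton_2xk([[5, 1, 8, 9], [10, 12, 2, 16]]))
```

Comparing integer weights rather than floating-point probabilities avoids the tie-tolerance issue that floating-point implementations have to handle.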
In reply to this post by George. J. Pappas
I don't find Howell's explanation very useful; furthermore, he does not interpret the results. What does the M2 statistic (linear-by-linear association) tell us with regard to the problem at hand? Do people who have experienced more traumas drop out at a higher rate or don't they?

On top of that, the table on Howell's page does not show percentages. When we add percentages, we see that the drop-out rate generally tends to increase as the number of traumas increases; or, put the other way, the proportion that remain in treatment generally decreases as the number of traumas increases. His bar chart is badly labeled: the vertical axis label reads "Percentage Dropout", but the numbers by the tick marks are proportions (?).

It seems to me that Kendall's tau (which Howell mentions in passing) or Somers' d are much more useful statistics for this sort of situation, don't you think? They are both directional, which is a useful piece of information: the association between the two variables is negative. And they were specifically designed for ordinal contingency tables. Somers' d is a bit tricky here (perhaps I should say a bit counter-intuitive) because it tells me that when trauma is used as the explanatory variable, the association between the two variables is weaker (-.142) than when drop-out is used as the independent variable (-.200), although the difference is not very large.

In any case, it would seem that Jennifer Mahon's hypothesis is correct. Or am I missing something?

Cheers,
Dominic

*********************************************
Dominic Lusinchi
Far West Research Consulting
Applied Statistics - Social Research - Sociology
San Francisco, California
[hidden email]
1-415-664-3032
CV: http://www.farwestresearch.com/staff/dl/dlcv.html
*********************************************

(Replying to Bruce Weaver, Thursday, April 16, 2015 2:24 PM, Subject: Re: CHI SQUARE SPSS.)
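Dominic's Somers' d figures can be checked by counting concordant and discordant pairs. The Python sketch below is my own illustration, not SPSS's code. It assumes the rows are Howell's event counts (0 to 4+) in increasing order. It also assumes the columns are Drop out (1) then Remain (2), as coded in Bruce's syntax later in the thread. Under that coding, it gives Somers' d of about -0.142 with Events as the independent variable and about -0.200 with drop-out status as the independent variable, matching Dominic's values. Kendall's tau-b comes out at about -0.169.

```python
import math


def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an r x c table whose
    rows and columns are both listed in increasing category order."""
    c = d = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # a strictly higher row category
                for l in range(n_cols):
                    if l > j:                    # higher column too: concordant
                        c += table[i][j] * table[k][l]
                    elif l < j:                  # lower column: discordant
                        d += table[i][j] * table[k][l]
    return c, d


def ordinal_association(table):
    row_totals = [sum(r) for r in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    c, d = concordant_discordant(table)
    # Numbers of pairs NOT tied on the row variable and on the column variable.
    untied_rows = (n * n - sum(t * t for t in row_totals)) // 2
    untied_cols = (n * n - sum(t * t for t in col_totals)) // 2
    return {
        "C": c,
        "D": d,
        # Somers' d is asymmetric: its denominator counts pairs untied on the
        # variable treated as independent.
        "d_col_given_row": (c - d) / untied_rows,
        "d_row_given_col": (c - d) / untied_cols,
        "tau_b": (c - d) / math.sqrt(untied_rows * untied_cols),
    }


# Rows: Events 0, 1, 2, 3, 4+; columns: Drop out (1), Remain (2).
howell = [[25, 31], [13, 21], [9, 6], [10, 2], [6, 3]]
print(ordinal_association(howell))
```

The asymmetry Dominic finds counter-intuitive comes from the denominators. Only 3969 pairs are untied on drop-out status (two equal groups of 63), while 5567 pairs are untied on Events. Dividing the same C − D = −793 by the smaller number gives the larger magnitude.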
In reply to this post by Marta Garcia-Granero
Good point, Marta. R is pretty easy for this (and, of course, it can be called directly from within SPSS).
###############
x <- matrix(c(5,1,8,9,10,12,2,16), nrow=4)
x
fisher.test(x)
###############

In most of the situations where I see people collapse categories, it seems pretty reasonable to me. I believe it can increase power, and I would rather the test not be influenced by a minority of the observations. (Off-hand, those seem like contradictory statements, so I might need to do some simulations and check those things for myself! Cognitive dissonance.)
In reply to this post by Dominic Lusinchi
Dominic, I don't understand why you're saying Howell did not interpret the results. Here are the results of his analysis:
| Test                         | Value | df | p    |
|------------------------------|-------|----|------|
| Pearson Chi-Square           | 9.459 | 4  | .051 |
| Linear-by-Linear Association | 5.757 | 1  | .016 |
| Deviation from Linearity     | 3.702 | 3  | .296 |

He says that the linear component of the overall (Pearson) chi-square is statistically significant (p = .016) and the non-linear component is not (p = .296). He therefore concludes that the percentage who drop out increases as the number of traumatic events goes up. (True, his graph actually shows the proportion dropping out whereas the label says percentage, but that hardly invalidates his analysis.) So he, like you, concludes that Jennifer Mahon's hypothesis is supported. What am I missing?

Here is syntax to generate Howell's results, by the way.

* Analyze the ordinal chi-square problem shown on Dave Howell's website:
  http://www.uvm.edu/~dhowell/methods7/Supplements/OrdinalChiSq.html.
NEW FILE.
DATASET CLOSE all.
DATA LIST list / DropOut Events Observations (3F5.0).
BEGIN DATA
1 0 25
1 1 13
1 2 9
1 3 10
1 4 6
2 0 31
2 1 21
2 2 6
2 3 2
2 4 3
END DATA.
DATASET NAME raw.
VARIABLE LABELS Events "# of Traumatic Events".
VALUE LABELS Events 4 "4+" / Dropout 1 "Drop out" 2 "Remain" .

* OMS.
DATASET DECLARE XTABS.
OMS
  /SELECT TABLES
  /IF COMMANDS=['Crosstabs'] SUBTYPES=['Chi Square Tests']
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='XTABS'.
WEIGHT by Observations.
CROSSTABS Events by Dropout
  /CELLS=count row
  /STATISTICS=BTAU CTAU D CHISQ
  /BARCHART.
OMSEND.

DATASET ACTIVATE XTABS.
RENAME VARIABLES (Var1 Asymp.Sig.2sided = Test p).
DO IF $CASENUM EQ 4.
- COMPUTE Value = LAG(Value,3)-LAG(Value).
- COMPUTE df = LAG(df,3)-LAG(df).
- COMPUTE p = 1 - CDF.CHISQ(Value,df).
- COMPUTE Percent = Value/LAG(Value,3)*100.
- COMPUTE Test = "Deviation from Linearity".
ELSE IF $Casenum EQ 3.
- COMPUTE Percent = Value/LAG(Value,2)*100.
ELSE IF $Casenum EQ 1.
- COMPUTE Percent = 100.
END IF.
FORMATS Percent(F8.1) / p(F6.3).
ALTER TYPE Test (A30).
TEMPORARY.
SELECT IF Test NE "Likelihood Ratio".
LIST Test to Percent.

* Percent column shows percentage of overall (Pearson Chi-square).

HTH.

P.S. Apologies to the OP, as this thread has headed off on a bit of a tangent not necessarily all that closely related to the original question!
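The partition in Bruce's results can also be reproduced outside SPSS. This is a minimal Python sketch using the standard library only, with my own helper names; it is not SPSS's algorithm. It computes three quantities:

- Pearson's chi-square.
- The linear-by-linear statistic M² = (N − 1)r², using the category codes as scores.
- The deviation from linearity, as their difference on df − 1 degrees of freedom.

The chi-square tail probability uses a plain series expansion. That is fine for small tables like this one but is not meant as a general-purpose routine.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, from the power series for the
    regularized lower incomplete gamma function P(df/2, x/2).
    Adequate for the modest statistics of small tables; not a general routine."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pearson_chi2(table):
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def linear_by_linear(table, row_scores, col_scores):
    """M^2 = (N - 1) * r^2, where r is the count-weighted Pearson correlation
    between the row scores and the column scores."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, row in zip(row_scores, table):
        for y, count in zip(col_scores, row):
            n += count
            sx += count * x
            sy += count * y
            sxx += count * x * x
            syy += count * y * y
            sxy += count * x * y
    cov = sxy - sx * sy / n
    r = cov / math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))
    return (n - 1) * r * r


def partition_chi2(table, row_scores, col_scores):
    total, df = pearson_chi2(table)
    m2 = linear_by_linear(table, row_scores, col_scores)
    deviation = total - m2  # what remains after removing the linear trend
    return [
        ("Pearson Chi-Square", total, df, chi2_sf(total, df)),
        ("Linear-by-Linear Association", m2, 1, chi2_sf(m2, 1)),
        ("Deviation from Linearity", deviation, df - 1, chi2_sf(deviation, df - 1)),
    ]


# Howell's data: rows are Events 0..4+, columns are Drop out (1), Remain (2).
howell = [[25, 31], [13, 21], [9, 6], [10, 2], [6, 3]]
for name, value, df, p in partition_chi2(howell, [0, 1, 2, 3, 4], [1, 2]):
    print(f"{name:30s} {value:7.3f} {df:3d} {p:6.3f}")
```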
In reply to this post by George. J. Pappas
Thank you all for your time and answers! I believe that rearranging your columns and rows in an attempt to increase your expected frequencies (>5) is always a good idea, but it is not always feasible. I wasn't aware that one could use the test of linear-by-linear association when the categorical variable is in fact ordinal. This seems like a good idea, but doesn't the rule of >5 also apply in this case? There are numerous sites where one can calculate the Fisher test for RxC tables, such as http://vassarstats.net and http://in-silico.net/tools/statistics/fisher_exact_test, but it seems that one can also calculate the test in SPSS by clicking the Exact option in the Crosstabs window.
So if the result is significant, is there a way of finding out which aspect has caused it to be significant? Besides creating a 2x2 table for each factor and using the Bonferroni correction, is there a different way?
If your interest is to see which cells contribute to the chi-square, then examine the standardized residuals [sresid option in cell stats] and/or the adjusted standardized residuals [asresid option in cell stats]...

mark miller

(Replying to George J. Pappas, Fri, Apr 17, 2015 at 9:43 PM.)
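Here is a minimal sketch of the two residuals Mark mentions. It assumes, as I understand SPSS's definitions, that SRESID = (O − E)/√E and ASRESID = (O − E)/√(E(1 − R/N)(1 − C/N)). Under independence, adjusted residuals are roughly standard normal. One option for George's follow-up question is therefore to compare |ASRESID| with a Bonferroni-adjusted z. The bonferroni_z helper is a hypothetical convenience, not an SPSS feature. In Howell's table, the "3 events / Drop out" cell has O = 10 and E = 6, which gives an adjusted residual of about 2.43. That is short of the cutoff of about 2.81 for 10 cells at alpha = .05.

```python
import math
from statistics import NormalDist


def cell_residuals(table):
    """Return (standardized, adjusted standardized) residual pairs per cell.

    standardized = (O - E) / sqrt(E)
    adjusted     = (O - E) / sqrt(E * (1 - row_total/N) * (1 - col_total/N))
    """
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        cells = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            std = (observed - expected) / math.sqrt(expected)
            adj_var = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            cells.append((std, (observed - expected) / math.sqrt(adj_var)))
        result.append(cells)
    return result


def bonferroni_z(alpha, n_cells):
    """Hypothetical helper: two-sided critical |z| with alpha split over all cells."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_cells))


howell = [[25, 31], [13, 21], [9, 6], [10, 2], [6, 3]]  # Events 0..4+ by Drop out / Remain
cutoff = bonferroni_z(0.05, 10)
for events, cells in zip(["0", "1", "2", "3", "4+"], cell_residuals(howell)):
    print(events, [f"{s:+.2f} / {a:+.2f}" for s, a in cells], f"(cutoff {cutoff:.2f})")
```

Splitting alpha over every cell is only one convention. With two columns, both cells in a row carry the same adjusted residual with opposite signs, so some analysts count rows rather than cells.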
In reply to this post by George. J. Pappas
Requiring all expected counts (E) to be 5 or more is unnecessarily strict for tables larger than 2x2. In that case, one common rule of thumb (for use of Pearson's Chi-square) is that, "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, p. 734).
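Checking this rule of thumb only requires the expected counts. Below is a minimal Python sketch; the helper names are mine. George's own counts were never posted, so it uses two tables from the thread:

- **Howell's trauma table:** 2 of the 10 expected counts (both 4.5) fall below 5, which is exactly 20%, so the table passes.
- **Andy's illustrative 4x2 counts:** 2 of 8 (25%) fall below 5, so the table fails, although no expected count is below 1.

```python
def expected_counts(table):
    """E_ij = row total * column total / N for every cell."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_20_percent_rule(table):
    """At most 20% of expected counts below 5 and none below 1.
    Returns (passes, share_below_5, smallest_expected)."""
    cells = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    smallest = min(cells)
    return share_below_5 <= 0.20 and smallest >= 1, share_below_5, smallest


howell = [[25, 31], [13, 21], [9, 6], [10, 2], [6, 3]]  # Howell's trauma table
andy = [[5, 10], [1, 12], [8, 2], [9, 16]]              # Andy's illustrative counts
print(meets_20_percent_rule(howell))
print(meets_20_percent_rule(andy))
```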
HTH.
Going back to grouping rows and/or columns (in my case to keep tables to a manageable size, rather than to keep cell sizes at 5 or more), check out:

4.2.1 Income differences – Statistical significance (draft only)
http://surveyresearch.weebly.com/uploads/2/9/9/8/2998485/4.2.1_income_differences__statistical_significance.pdf

This is a demonstration, using a two-way contingency table from CROSSTABS, of how to test the null hypothesis that there is no difference between the earnings (from paid work) of men and women. It gives a step-by-step procedure to produce expected cell values (E), compare them to observed values (O), and gradually build up the formula for chi-square.

John F Hall (Mr)
[Retired academic survey researcher]
Email: [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page: www.surveyresearch.weebly.com/1-survey-analysis-workshop

(Replying to Bruce Weaver, 18 April 2015 14:25, Subject: Re: CHI SQUARE SPSS.)
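The end point of that build-up is the familiar Pearson statistic. Below it is worked on Howell's trauma table. Both of its columns have 63 cases, so each Events row has the same expected count in both columns: 28, 17, 7.5, 6 and 4.5. Each column therefore contributes the same squared deviations:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \qquad df = (r-1)(c-1)$$

$$\chi^2 = 2\left(\frac{3^2}{28} + \frac{4^2}{17} + \frac{1.5^2}{7.5} + \frac{4^2}{6} + \frac{1.5^2}{4.5}\right) \approx 2 \times 4.7293 \approx 9.459, \qquad df = (5-1)(2-1) = 4$$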