SPSSX Discussion

CHI SQUARE SPSS

George. J. Pappas

CHI SQUARE SPSS

Hello everyone

I have to compare two sets of categorical data in a 2×4 table using SPSS. Most of the cells contain values less than five. SPSS will only do a Fisher’s exact test for a 2×2 table. Is the likelihood ratio an acceptable alternative to the Pearson chi-square? Is there any other way to do this in SPSS?

Thank you all for your help in advance.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Andy W

Re: CHI SQUARE SPSS

Ok I will bite. It is the expected frequencies that matter, not the observed. So are the expected frequencies above 5?

Note that many believe the 5 rule is quite conservative, see http://stats.stackexchange.com/a/14230/1036 for an overview by Frank Harrell.

When the expected frequencies are very low, often people group different rows/columns, as opposed to doing different tests.
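
To make the observed-versus-expected distinction concrete, here is a minimal Python sketch that computes the expected counts under independence (row total times column total, divided by N). The counts are only an example: they are the same values used in the R snippet posted later in this thread, not the original poster's data, and the function name is made up for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Example 4x2 counts (assumed for illustration; same values as the R snippet in this thread).
observed = [[5, 10], [1, 12], [8, 2], [9, 16]]

expected = expected_counts(observed)
for obs_row, exp_row in zip(observed, expected):
    # Print each observed row next to its expected counts.
    print(obs_row, [round(e, 2) for e in exp_row])
```

In this example the cell with an observed count of 2 has an expected count above 5, while a cell with an observed count of 8 has an expected count below 5, which is why the observed counts alone do not settle the question.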

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Bruce Weaver

Re: CHI SQUARE SPSS

Administrator

I'll bite too--or at least nibble a bit. As it is just a 2x4 table, how about giving us the 8 cell counts?

       Col1  Col2
Row1    a     b
Row2    c     d
Row3    e     f
Row4    g     h

And if you can tell us what the row and column variables are, even better. For example, if the variable with 4 categories is ordinal in nature, you might be able to use the test of linear-by-linear association that appears in the CROSSTABS output when you set /STAT=CHISQ. (For more info, see http://www.uvm.edu/~dhowell/methods7/Supplements/OrdinalChiSq.html.)

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Marta Garcia-Granero

Re: CHI SQUARE SPSS

A third possibility is to look for freeware applications (like WinPepi)
that can compute the Fisher-Freeman-Halton test on 2xk tables like this
one. There is also an Excel macro (fishchi.xls), very slow, that can
handle small frequencies in tables bigger than 2x2.

I'm a great admirer of WinPepi; it's loaded with a lot of handy methods.

My two cents,
Marta GG

On 16/04/2015 at 23:24, Bruce Weaver wrote:

> I'll bite too--or at least nibble a bit. As it is just a 2x4 table, how
> about giving us the 8 cell counts?

Dominic Lusinchi

Re: CHI SQUARE SPSS

In reply to this post by George. J. Pappas

I don't find Howell's explanation very useful; furthermore, he does not
interpret the results. What does the M2 statistic (linear-by-linear
association) tell us with regard to the problem at hand? Do people who have
experienced more traumas drop out at a higher rate or don't they? On top of
that, the table on Howell's page does not show percentages. When we add
percentages, we see that the drop-out rate generally tends to increase
as the number of traumas increases; equivalently, the proportion that remains
in treatment generally decreases as the number of traumas increases.
His bar chart is badly labeled: the vertical axis label reads "Percentage
Dropout", but the numbers by the tick marks are proportions (?).

It seems to me that Kendall's tau (which Howell mentions in passing) or
Somers' d are much more useful statistics for this sort of situation, do you
not think? They are both directional, which is a useful piece of information:
the association between the two variables is negative. And they were
specifically designed for ordinal contingency tables. Somers' d is a bit
tricky here (perhaps I should say a bit counter-intuitive) because it tells
me that when trauma is used as the explanatory variable, the association
between the two variables is weaker (-.142) than when drop-out is used as
the independent variable (-.200), although the difference is not very
large.

In any case, it would seem that Jennifer Mahon's hypothesis is correct. Or
am I missing something?
Cheers - Dominic

*********************************************
Dominic Lusinchi
Far West Research Consulting
Applied Statistics - Social Research - Sociology
San Francisco, California
[hidden email]
1-415-664-3032
CV: http://www.farwestresearch.com/staff/dl/dlcv.html
*********************************************

Andy W

Re: CHI SQUARE SPSS

In reply to this post by Marta Garcia-Granero

Good point Marta, R is pretty easy for this. (And of course can be called directly within SPSS.)

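###############
# Example 4x2 table of counts; matrix() fills column by column
x <- matrix(c(5,1,8,9,10,12,2,16),nrow=4)
x
# Fisher's exact test; fisher.test() also handles tables larger than 2x2
fisher.test(x)
###############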
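
For readers without R to hand, the idea behind the exact test on an r x 2 table can be sketched in plain Python. This is a brute-force illustration, not the algorithm R or SPSS uses internally: it enumerates every table with the observed margins and adds up the probabilities of those that are no more likely than the observed one (the usual two-sided definition). The function name is hypothetical, and the approach is only practical for small tables.

```python
from math import comb


def fisher_exact_rx2(table):
    """Two-sided exact p-value for an r x 2 table, by full enumeration.

    Sums the probabilities of all tables with the observed margins that are
    no more likely than the observed table.
    """
    row_totals = [a + b for a, b in table]
    col1_total = sum(a for a, _ in table)
    n = sum(row_totals)
    denom = comb(n, col1_total)

    def weights(i, remaining):
        """Yield unnormalised probabilities of all ways to fill rows i onward."""
        if i == len(row_totals) - 1:
            # The last row is fixed by what is left of the column-1 total.
            if 0 <= remaining <= row_totals[i]:
                yield comb(row_totals[i], remaining)
            return
        for a in range(min(row_totals[i], remaining) + 1):
            w = comb(row_totals[i], a)
            for rest in weights(i + 1, remaining - a):
                yield w * rest

    observed_w = 1
    for (a, _), r in zip(table, row_totals):
        observed_w *= comb(r, a)

    # Integer arithmetic keeps the "no more likely than observed" comparison exact.
    extreme = sum(w for w in weights(0, col1_total) if w <= observed_w)
    return extreme / denom


# Same example counts as the R matrix above, written row by row.
x = [[5, 10], [1, 12], [8, 2], [9, 16]]
print(fisher_exact_rx2(x))
```

On a 2x2 table this reduces to the ordinary two-sided Fisher test, which gives an easy sanity check before trusting it on anything larger.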

In most situations where I see people collapse categories, it seems pretty reasonable to me. I believe it can increase power, and I would rather the test not be influenced by a minority of the observations. (Those seem like contradictory statements to me offhand, so I might need to do some simulations and check those things for myself! Cognitive dissonance.)

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Bruce Weaver

Re: CHI SQUARE SPSS

Administrator

In reply to this post by Dominic Lusinchi

Dominic, I don't understand why you're saying Howell did not interpret the results. Here are the results of his analysis:

Test                            Value   df   p
Pearson Chi-Square              9.459    4   .051
Linear-by-Linear Association    5.757    1   .016
Deviation from Linearity        3.702    3   .296

He says that the linear component of the overall (Pearson) Chi-square is statistically significant (p = .016) and the non-linear component is not (p = .296), and he therefore concludes that the percentage who drop out increases as the number of traumatic events goes up. (True, his graph actually shows the proportion dropping out whereas the label says percentage, but that hardly invalidates his analysis.) So he, like you, concludes that Jennifer Mahon's hypothesis is supported. What am I missing?

Here is syntax to generate Howell's results, by the way.

* Analyze the ordinal chi-square problem shown
on Dave Howell's website:
http://www.uvm.edu/~dhowell/methods7/Supplements/OrdinalChiSq.html.

NEW FILE.
DATASET CLOSE all.

DATA LIST list / DropOut Events Observations (3F5.0).
BEGIN DATA
1 0 25
1 1 13
1 2 9
1 3 10
1 4 6
2 0 31
2 1 21
2 2 6
2 3 2
2 4 3
END DATA.
DATASET NAME raw.

VARIABLE LABELS Events "# of Traumatic Events".
VALUE LABELS
Events 4 "4+" /
Dropout 1 "Drop out" 2 "Remain"
.

* OMS.
DATASET DECLARE XTABS.
OMS
/SELECT TABLES
/IF COMMANDS=['Crosstabs'] SUBTYPES=['Chi Square Tests']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='XTABS'.

WEIGHT by Observations.
CROSSTABS Events by Dropout
/CELLS=count row /STATISTICS=BTAU CTAU D CHISQ /BARCHART.

OMSEND.

DATASET ACTIVATE XTABS.
RENAME VARIABLES (Var1 Asymp.Sig.2sided = Test p).
DO IF $CASENUM EQ 4.
- COMPUTE Value = LAG(Value,3)-LAG(Value).
- COMPUTE df = LAG(df,3)-LAG(df).
- COMPUTE p = 1 - CDF.CHISQ(Value,df).
- COMPUTE Percent = Value/LAG(Value,3)*100.
- COMPUTE Test = "Deviation from Linearity".
ELSE IF $Casenum EQ 3.
- COMPUTE Percent = Value/LAG(Value,2)*100.
ELSE IF $Casenum EQ 1.
- COMPUTE Percent = 100.
END IF.

FORMATS Percent(F8.1) / p(F6.3).
ALTER TYPE Test (A30).
TEMPORARY.
SELECT IF Test NE "Likelihood Ratio".
LIST Test to Percent.
* Percent column shows percentage of overall (Pearson Chi-square).
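
The decomposition in the table above can also be cross-checked outside SPSS. The following Python sketch uses the counts from the DATA LIST block and assumes the usual formula for the linear-by-linear statistic, M2 = (N - 1) * r^2, where r is the Pearson correlation between the row and column scores. The helper names are made up, and the small chi-square tail function is written out by hand only to keep the example free of third-party packages.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        return exp(-half) * sum(half ** j / gamma(j + 1) for j in range(df // 2))
    k = (df - 1) // 2
    tail = sum(half ** (j - 0.5) / gamma(j + 0.5) for j in range(1, k + 1))
    return erfc(sqrt(half)) + exp(-half) * tail


def pearson_chi2(rows):
    """Pearson chi-square for a table given as a list of rows."""
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in zip(*rows)]
    n = sum(row_tot)
    return sum((o - rt * ct / n) ** 2 / (rt * ct / n)
               for r, rt in zip(rows, row_tot)
               for o, ct in zip(r, col_tot))


def linear_by_linear(counts, scores):
    """M2 = (N - 1) * r**2, with r the correlation between the two sets of scores."""
    n = sx = sy = sxx = syy = sxy = 0
    for y, row in counts.items():
        for x, w in zip(scores, row):
            n += w
            sx += w * x
            sy += w * y
            sxx += w * x * x
            syy += w * y * y
            sxy += w * x * y
    cov = sxy - sx * sy / n
    r2 = cov ** 2 / ((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return (n - 1) * r2


# Counts from the DATA LIST block above; columns are 0, 1, 2, 3, 4+ events.
counts = {
    1: [25, 13, 9, 10, 6],  # DropOut = 1 (drop out)
    2: [31, 21, 6, 2, 3],   # DropOut = 2 (remain)
}
scores = [0, 1, 2, 3, 4]    # equally spaced scores for Events, as coded in the syntax

pearson = pearson_chi2(list(counts.values()))
m2 = linear_by_linear(counts, scores)
for name, value, df in [("Pearson", pearson, 4),
                        ("Linear-by-linear", m2, 1),
                        ("Deviation from linearity", pearson - m2, 3)]:
    print(f"{name:26s}{value:8.3f}{df:4d}{chi2_sf(value, df):8.3f}")
```

Note that the result depends on the scores assigned to the ordinal categories; the 0 to 4 scoring here simply follows the coding in the syntax.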

HTH.

p.s. - Apologies to the OP, as this thread has headed off on a bit of a tangent not necessarily all that closely related to the original question!

George. J. Pappas

Re: CHI SQUARE SPSS

In reply to this post by George. J. Pappas

Thank you all for your time and answers! I believe that rearranging columns and rows in an attempt to increase the expected frequencies (>5) is always a good idea, but it is not always feasible. I wasn’t aware that one could use the linear-by-linear test when the categorical variable is in fact ordinal. This seems like a good idea, but doesn’t the rule of >5 also apply in this case? There are numerous sites where one can calculate Fisher's exact test for RxC tables, such as http://vassarstats.net and http://in-silico.net/tools/statistics/fisher_exact_test , but it seems that one can also calculate the test in SPSS by clicking the Exact option in the Crosstabs window.
So if the result is significant, is there a way of finding out which aspect has caused it to be significant? Besides creating a 2x2 table for each factor and using the Bonferroni correction, is there a different way?

Mark Miller

Re: CHI SQUARE SPSS

If your interest is in seeing which cells contribute to the chi-square, then examine

the standardized residuals [SRESID keyword on the CROSSTABS /CELLS subcommand]

and/or

the adjusted standardized residuals [ASRESID keyword on the /CELLS subcommand]
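
For anyone who wants to see what those two quantities are, here is a small Python sketch using the textbook formulas: the standardized residual is (O - E) / sqrt(E), and the adjusted standardized residual also divides by sqrt((1 - row proportion) * (1 - column proportion)). The data are Howell's trauma and drop-out counts from earlier in the thread, used purely as an example, and the function name is hypothetical.

```python
from math import sqrt


def residuals(table):
    """Return (standardized, adjusted standardized) residuals, cell by cell."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    std, adj = [], []
    for row, rt in zip(table, row_tot):
        std_row, adj_row = [], []
        for o, ct in zip(row, col_tot):
            e = rt * ct / n  # expected count under independence
            std_row.append((o - e) / sqrt(e))
            adj_row.append((o - e) / sqrt(e * (1 - rt / n) * (1 - ct / n)))
        std.append(std_row)
        adj.append(adj_row)
    return std, adj


# Howell's counts from earlier in the thread
# (rows: 0, 1, 2, 3, 4+ events; columns: drop out, remain).
table = [[25, 31], [13, 21], [9, 6], [10, 2], [6, 3]]
std, adj = residuals(table)
for label, s_row, a_row in zip(["0", "1", "2", "3", "4+"], std, adj):
    print(label, [round(v, 2) for v in s_row], [round(v, 2) for v in a_row])
```

The squared standardized residuals add up to the Pearson chi-square, so they show each cell's contribution directly. The adjusted version is approximately standard normal under independence, so absolute values well above 2 are the ones usually worth a closer look.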

... mark miller

On Fri, Apr 17, 2015 at 9:43 PM, George. J. Pappas <[hidden email]> wrote:

So if the result is significant, is there a way of finding out which aspect has caused it to be significant? Besides creating a 2x2 table for each factor and using the Bonferroni correction, is there a different way?

Bruce Weaver

Re: CHI SQUARE SPSS

Administrator

In reply to this post by George. J. Pappas

Requiring all expected counts (E) to be 5 or more is unnecessarily strict for tables larger than 2x2. In that case, one common rule of thumb (for use of Pearson's Chi-square) is that, "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, p. 734).
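
The rule of thumb quoted above is easy to check mechanically. Here is a minimal Python sketch, with made-up function names, applied to the two sets of counts that appear elsewhere in this thread (Howell's table and the example matrix from the R snippet).

```python
def expected_counts(table):
    """Flat list of expected counts under independence."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return [rt * ct / n for rt in row_tot for ct in col_tot]


def rule_of_thumb_ok(table):
    """True if at most 20% of expected counts are below 5 and none is below 1."""
    expected = expected_counts(table)
    n_small = sum(e < 5 for e in expected)
    # 5 * n_small <= n_cells is "n_small / n_cells <= 20%" without float rounding.
    return 5 * n_small <= len(expected) and min(expected) >= 1


howell = [[25, 13, 9, 10, 6], [31, 21, 6, 2, 3]]  # counts from the syntax earlier in the thread
example = [[5, 10], [1, 12], [8, 2], [9, 16]]     # counts from the R snippet earlier in the thread

print(rule_of_thumb_ok(howell), rule_of_thumb_ok(example))  # prints: True False
```

Howell's table just passes (2 of 10 expected counts are below 5), while the example matrix does not (2 of 8, or 25%), so for the latter an exact test or collapsing categories would be the more cautious route.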

HTH.

John F Hall

Re: CHI SQUARE SPSS

Going back to grouping rows and/or columns, in my case to keep tables to a manageable size rather than cell sizes to 5 or more, check out

4.2.1 Income differences – Statistical significance (draft only)
http://surveyresearch.weebly.com/uploads/2/9/9/8/2998485/4.2.1_income_differences__statistical_significance.pdf

Demonstration, using a two-way contingency table from CROSSTABS, of a test of the null hypothesis that there is no difference between the earnings (from paid work) of men and women. Step-by-step procedure to produce expected cell values (E), compare them to observed values (O), and gradually build up the formula for chi-square.

John F Hall (Mr)
[Retired academic survey researcher]

Email: [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page: www.surveyresearch.weebly.com/1-survey-analysis-workshop

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver
Sent: 18 April 2015 14:25
To: [hidden email]
Subject: Re: CHI SQUARE SPSS

Requiring all expected counts (E) to be 5 or more is unnecessarily strict for tables larger than 2x2. In that case, one common rule of thumb (for use of Pearson's Chi-square) is that, "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, p. 734).

HTH.
