SPSSX Discussion

Chi-Square, adjusted residual and multiple comparison

Tom

Chi-Square, adjusted residual and multiple comparison

Hi everybody

First question:

I have a categorical variable, “experience in years”, with 5 categories. When I compare the frequencies of “Yes” across these five groups, I might get a significant chi-square. If I’m right, this means that at least one group has an unexpected number of “Yes”, but nothing more. So, to figure out which group has an unexpected number of “Yes”, I look at the adjusted residuals. A rule of thumb seems to be that an adjusted residual greater than or equal to 2 is significant. But what exactly does that mean? I suppose it means that this group (or these groups) has a significantly greater number of “Yes” than the other groups. Is that correct?

If it is, I still have no information about which group(s) it differs from significantly, so I should perform multiple comparisons, which leads to the second question.
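
To make the rule of thumb concrete, here is a minimal Python sketch (not SPSS syntax, and not from any poster in this thread) that computes adjusted residuals for the 3 x 2 counts used in the syntax further down in this post. The function name `adjusted_residuals` is made up for the illustration. It uses the usual formula, (observed - expected) / sqrt(expected * (1 - row total/N) * (1 - column total/N)), which as far as I know is what CROSSTABS reports when ASRESID is added to /CELLS.

```python
import math

def adjusted_residuals(table):
    """Adjusted standardized residuals for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_tot[i] * col_tot[j] / n
            # Estimated variance of (observed - expected).
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        result.append(out_row)
    return result

# Rows are groups 1..3, columns are the two answer categories.
counts = [[12, 8], [129, 41], [456, 524]]
adj = adjusted_residuals(counts)
for group, row in enumerate(adj, start=1):
    print(group, [round(v, 2) for v in row])
```

Each value is roughly a z-score for that cell against what independence would predict, so a large value says the group differs from the overall pattern (all other groups pooled), not from any particular group. Negative values matter too: in this table group 2 comes out near +7 and group 3 near -7 in the first column, so the rule of thumb is better read as an absolute value of about 2 or more.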

Second question:

Another possibility is a set of multiple comparisons, as Bruce Weaver suggested in reply to an earlier question on this topic, using this syntax for a variable with 3 categories:

*******Read in the data, column 1 the categorical variable, 2 yes-no, 3 the frequencies.
data list list / r c kount (3f5.0).
begin data
1 1 12
1 2 8
2 1 129
2 2 41
3 1 456
3 2 524
end data.
weight by kount.
crosstabs r by c / stat = chisq /cells = count row.

******* Row 1 vs Row 3.
temporary.
select if any(r,1,3). /* Omit row 2 .
crosstabs r by c / stat = chisq /cells = count row.

******* Row 2 vs Row 3.
temporary.
select if any(r,2,3). /* Omit row 1 .
crosstabs r by c / stat = chisq /cells = count row.

****** 1 vs 2.
temporary.
select if any(r,1,2). /* Omit row 3 .
crosstabs r by c / stat = chisq /cells = count row.

********Rows 1&2 pooled vs Row 3.
temporary.
recode r (2=1). /* Temporarily recode 2 to 1.
crosstabs r by c / stat = chisq /cells = count row.

I’d like to compare these results with the adjusted residuals (i.e., a non-significant chi-square but a high adjusted residual), but I can’t work out how to change the second line of each block in the syntax above:

select if any(r,1,3). /* Omit row 2 .

With 5 categories, I need to omit not just one row but three. What would the syntax look like?

Thanks for your help!

Tom

Marta Garcia-Granero

Re: Chi-Square, adjusted residual and multiple comparison

Hi Tom:

You might also try the Marascuilo procedure (http://www.itl.nist.gov/div898/handbook/prc/section4/prc474.htm) or, if you have the CTABLES module installed, CTABLES with a Bonferroni adjustment:

* NIST/SEMATECH SAMPLE DATA (replace by your own) *.
DATA LIST LIST/Lot Result Count(3 F8).
BEGIN DATA
1 0 36
1 1 264
2 0 46
2 1 254
3 0 42
3 1 258
4 0 63
4 1 237
5 0 38
5 1 262
END DATA.
WEIGHT BY Count .
VALUE LABEL Result 0'Non Conformant' 1'Conformant'.
VALUE LABEL Lot 1'G1' 2'G2' 3'G3' 4'G4' 5'G5'.

* A) Using CTABLES with Bonferroni adjustment *.
CTABLES
/VLABELS VARIABLES=Lot Result DISPLAY=DEFAULT
/TABLE Result [COUNT F40.0, COLPCT.COUNT PCT40.1] BY Lot
/SLABELS POSITION=ROW
/CATEGORIES VARIABLES=Lot Result ORDER=A KEY=VALUE EMPTY=INCLUDE
/COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN
INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE.

* B) MARASCUILO PROCEDURE *.

* Don't change anything here *.
DATASET NAME Data.
DATASET DECLARE Results1 WINDOW=HIDDEN.
DATASET DECLARE Results2 WINDOW=HIDDEN.
DATASET DECLARE Contingency.
OMS /SELECT TABLES
/IF COMMANDS = ["Crosstabs"]
     SUBTYPES = ["Crosstabulation"]
/DESTINATION FORMAT = SAV
OUTFILE = Contingency.
OMS /SELECT TABLES
/IF COMMANDS = ["Crosstabs"]
     SUBTYPES = ["Case Processing Summary"]
/DESTINATION VIEWER = NO.

* Replace "Result" & "Lot" by your variable names *.
CROSSTABS
/TABLES=Result BY Lot
/FORMAT= AVALUE TABLES
/STATISTIC=CHISQ
/CELLS= COUNT COLUMN
/COUNT ROUND CELL .

* Don't change anything from here *.
OMSEND.
DATASET ACTIVATE Contingency.
COMPUTE Id=$casenum/2.
EXE.
SELECT IF (Id NE TRUNC(Id)).
EXE.
DELETE VARIABLES Command_ TO Var3 total Id.
PRESERVE.

* Eliminate RESULTS=NONE if you want the matrix output (but leave SET MXLOOPS=200 untouched) *.
SET MXLOOPS=200 RESULTS=NONE.
MATRIX.
PRINT /TITLE='MARASCUILO PROCEDURE FOR MULTIPLE PROPORTIONS'.
GET Data /VAR=ALL /NAMES=vnames.
PRINT Data
/CNAMES=vnames
/RLABELS='Row 1','Row 2','Total'
/TITLE='Input data'.
COMPUTE K=NCOL(Data).
COMPUTE P=Data(1,:)&/Data(3,:).
COMPUTE PLabels={'P(1)','P(2)','P(3)','P(4)','P(5)','P(6)','P(7)','P(8)','P(9)','P(10)','P(11)',
                 'P(12)','P(13)','P(14)','P(15)','P(16)','P(17)','P(18)','P(19)','P(20)'}.
PRINT P
/FORMAT='F8.3'
/CNAMES=PLabels
/TITLE='Proportions to be compared'.
COMPUTE N=Data(3,:).
* Critical Chi-square values for up to k=20 *.
COMPUTE Chi2={ 3.8415, 5.9915, 7.8147, 9.4877,11.0705,12.5916,14.0671,15.5073,16.9190,18.3070,
              19.6751,21.0261,22.3620,23.6848,24.9958,26.2962,27.5871,28.8693,30.1435,31.4104}.
COMPUTE Chi2Val=Chi2(K-1).
COMPUTE NComp=K*(K-1)/2.
COMPUTE Rij=    MAKE(NComp,1,0).
COMPUTE ABSDiff=MAKE(NComp,1,0).
COMPUTE Labels= MAKE(Ncomp,3," ").
COMPUTE Sig=    MAKE(NComp,1,'(ns)').
COMPUTE Index=1.
LOOP i=1 TO K-1.
. LOOP j=i+1 TO K.
. COMPUTE ABSDiff(Index)=ABS(P(i)-P(j)).
. COMPUTE Rij(Index)=SQRT(Chi2Val)*SQRT(P(i)*(1-P(i))/N(i)+P(j)*(1-P(j))/N(j)).
. COMPUTE Labels(Index,:)={PLabels(i),"-",PLabels(j)}.
. DO IF (ABSDiff(Index) GT Rij(Index)).
.   COMPUTE Sig(Index)='(*)'.
. END IF.
. COMPUTE Index=Index+1.
. END LOOP.
END LOOP.
PRINT {ABSDiff,Rij}
/FORMAT='F8.3'
/CLABELS='|Pi-Pj|','Cr. Range'
/TITLE='Absolute differences and their ranges'.
PRINT {Labels,Sig}
/FORMAT='A4'
/TITLE='Contrasts and their significance'.
SAVE {ABSDiff,Rij}/OUTFILE=Results1 /VARIABLES=Diffs,Rij.
SAVE {Labels,Sig} /OUTFILE=Results2 /VARIABLES=Labels1 TO Labels3,Sig /STRINGS=Labels1 TO Labels3,Sig.
END MATRIX.
RESTORE.
DATASET ACTIVATE Results2.
DATASET CLOSE Contingency.
STRING Comp(A9).
COMPUTE Comp=CONCAT(RTRIM(Labels1),RTRIM(Labels2),RTRIM(Labels3)).
MATCH FILES /FILE=*
/FILE='Results1'.
DATASET CLOSE Results1.
VAR LABEL Comp '|P(i)-P(j)|' Diffs'Absolute Difference' Rij'Critical Range' Sig 'Significance'.
FORMAT Diffs Rij (F8.3).
OMS /SELECT TABLES
/IF COMMANDS = ["Summarize"]
     SUBTYPES = ["Case Processing Summary"]
/DESTINATION VIEWER = NO.
SUMMARIZE
/TABLES=Comp Diffs Rij Sig
/FORMAT=LIST NOCASENUM NOTOTAL
/TITLE='Multiple comparisons: Marascuilo procedure'
/MISSING=VARIABLE
/CELLS=NONE.
OMSEND.
DATASET ACTIVATE Data.
DATASET CLOSE Results2.
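
For readers who want to check the arithmetic outside SPSS, here is a minimal Python sketch of the same critical-range calculation. It is only an illustration and not part of Marta's syntax: the function name `marascuilo` and the dictionary `CHI2_CRIT_95` are made up for the example, and the critical values are copied from the Chi2 vector in the MATRIX block. As in that block, the proportion for each lot is the first Result category (non conformant) divided by the lot total.

```python
import math
from itertools import combinations

# 95% chi-square critical values for df = 1..7, copied from the Chi2 vector above.
CHI2_CRIT_95 = {1: 3.8415, 2: 5.9915, 3: 7.8147, 4: 9.4877,
                5: 11.0705, 6: 12.5916, 7: 14.0671}

def marascuilo(events, totals):
    """Return (i, j, |p_i - p_j|, critical range, significant?) for every pair."""
    k = len(events)
    root = math.sqrt(CHI2_CRIT_95[k - 1])
    props = [e / n for e, n in zip(events, totals)]
    rows = []
    for i, j in combinations(range(k), 2):
        diff = abs(props[i] - props[j])
        crit = root * math.sqrt(props[i] * (1 - props[i]) / totals[i]
                                + props[j] * (1 - props[j]) / totals[j])
        rows.append((i + 1, j + 1, diff, crit, diff > crit))
    return rows

# NIST/SEMATECH sample data: non conformant counts out of 300 per lot.
rows = marascuilo([36, 46, 42, 63, 38], [300] * 5)
for i, j, diff, crit, sig in rows:
    print(f"P({i})-P({j})  |diff|={diff:.3f}  range={crit:.3f}  {'(*)' if sig else '(ns)'}")
```

With these sample counts the largest difference is P(1) versus P(4), about 0.090 against a critical range of about 0.093, so no pair is flagged even though that one comes close.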

HTH,
Marta GG

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/

Tom

Re: Chi-Square, adjusted residual and multiple comparison

Thanks, Marta

Yes, that helps. I tried CTABLES with the Bonferroni adjustment, but for my counts the comparison table came out empty (which means there are no significant comparisons). In your example there seems to be a significant difference in proportions between Lot G1 (A) and G4 (D). Am I right with this interpretation?

But where are the test values for the comparisons? The table just shows the letters of the categories involved in the significant comparisons. I do not really understand what CTABLES with the Bonferroni adjustment actually does. Do you perhaps have a literature reference on that?
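
As a rough illustration of what such a test involves, here is a minimal Python sketch of pairwise two-proportion z-tests with Bonferroni-adjusted p-values. This is an assumption about the general technique, not a reproduction of the CTABLES algorithm: it uses a pooled-variance z-test and simply multiplies each two-sided p-value by the number of pairs, and SPSS may differ in the details. The function name `bonferroni_pairwise` is made up for the example.

```python
import math
from itertools import combinations

def bonferroni_pairwise(events, totals, alpha=0.05):
    """Pairwise pooled z-tests; returns (i, j, z, adjusted p, significant?)."""
    k = len(events)
    n_tests = k * (k - 1) // 2
    results = []
    for i, j in combinations(range(k), 2):
        p_i = events[i] / totals[i]
        p_j = events[j] / totals[j]
        pooled = (events[i] + events[j]) / (totals[i] + totals[j])
        se = math.sqrt(pooled * (1 - pooled) * (1 / totals[i] + 1 / totals[j]))
        z = (p_i - p_j) / se
        p_raw = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        p_adj = min(1.0, p_raw * n_tests)         # Bonferroni adjustment
        results.append((i + 1, j + 1, z, p_adj, p_adj < alpha))
    return results

# NIST/SEMATECH sample data from Marta's post: non conformant counts per lot.
results = bonferroni_pairwise([36, 46, 42, 63, 38], [300] * 5)
for i, j, z, p_adj, sig in results:
    print(f"Lot {i} vs Lot {j}: z={z:6.2f}  adjusted p={p_adj:.4f}  {'*' if sig else ''}")
```

Under these assumptions only Lot 1 versus Lot 4 stays below 0.05 after adjustment (adjusted p of about 0.03), which agrees with the A and D letters described above. The z and p values printed here are the kind of numbers that the letter-only table leaves out.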

The Marascuilo procedure seems to be statistically quite different, and more complicated. I’ll try to get a little more familiar with it…

Tom

____________________________________

Thomas Balmer

Research Associate
PHBern
Institut für Weiterbildung

Weltistrasse 40

CH-3006 Bern

T +41 31 309 27 36

[hidden email]

http://www.phbern.ch/weiterbildung

From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Marta García-Granero
Sent: Thursday, 11 November 2010 15:59
To: [hidden email]
Subject: Re: Chi-Square, adjusted residual and multiple comparison


Bruce Weaver

Re: Chi-Square, adjusted residual and multiple comparison

Administrator

In reply to this post by Tom

Balmer Thomas wrote

--- snip ---

select if any(r,1,3). /* Omit row 2 .

With 5 categories, I should omit not just one, but 3 rows. But how goes the syntax?

--- snip ---

That SELECT IF *keeps* rows 1 and 3. So it omits row 2, because there are only 3 rows in that example. To omit the 3 rows you wish to omit, KEEP the two that you want. (Don't forget the TEMPORARY on the line before SELECT IF--without it, you will permanently delete the unselected rows.)
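
With 5 categories, that advice means 10 different pairs to keep. A minimal sketch, in Python rather than SPSS, that writes out one TEMPORARY / SELECT IF / CROSSTABS block per pair; it reuses the variable names r and c from the 3-category example, and the helper name `pairwise_syntax` is made up for the illustration:

```python
from itertools import combinations

def pairwise_syntax(categories, row_var="r", col_var="c"):
    """Build one SPSS block per pair of categories to keep."""
    blocks = []
    for a, b in combinations(categories, 2):
        blocks.append(
            f"* Row {a} vs Row {b}.\n"
            "temporary.\n"
            f"select if any({row_var},{a},{b}).\n"
            f"crosstabs {row_var} by {col_var} / stat = chisq /cells = count row."
        )
    return blocks

blocks = pairwise_syntax([1, 2, 3, 4, 5])
print(len(blocks))            # number of pairwise comparisons
print("\n\n".join(blocks))    # paste the result into a syntax window
```

Because this produces 10 separate tests, the usual caution about multiple comparisons applies; a Bonferroni-style threshold would be 0.05 / 10 = 0.005 per test.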

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Josep Grau Valldosera

Re: Chi-Square, adjusted residual and multiple comparison

In reply to this post by Marta Garcia-Granero

Dear Marta,

I've used both the Bonferroni and Marascuilo procedures that you suggested some time ago on data I'm analyzing for my research on student dropout at the Universitat Oberta de Catalunya.

These are the data:

Programme          Dropout       Count
Business Sci.      dropout        9132
Business Sci.      non dropout    7686
Tech. Eng. in CM   dropout        3629
Tech. Eng. in CM   non dropout    1803
Tech. Eng. in CS   dropout        4917
Tech. Eng. in CS   non dropout    2579
Tourism            dropout         939
Tourism            non dropout     950
Catalan Language   dropout         703
Catalan Language   non dropout     491
Law                dropout        3320
Law                non dropout    2829
Humanities         dropout        3470
Humanities         non dropout    1926
Psychology         dropout        4336
Psychology         non dropout    3338

Concentrating on the Marascuilo results, I have some doubts about how to interpret them. Specifically:
- I would like to know how the programmes map to the "numbers". Does P(1) correspond to the probability of dropout for the first column (programme) in the chi-square table (Business Sci.), as shown in the attached "bonferroni-marascuilo.spv" file?
- Analyzing the results, this is the number of times each programme shows a "significant" difference:
1 = 4 times
2 = 3 times
3 = 4 times
4 = 3 times
5 = 3 times
6 = 5 times
7 = 5 times
8 = 6 times
Should I interpret this as meaning that programmes 8, 7 and 6 have a significantly higher probability of dropout than the rest?
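
A tally of significant differences does not say in which direction a programme differs, so it may help to look at the dropout proportions themselves. Here is a minimal Python sketch using the counts listed above. It labels programmes by name because the P(1)...P(8) numbering depends on how the programmes are coded in the .sav file, which is not shown here. The helper `differs` is made up for the example, and the critical value for df = 7 is copied from the Chi2 vector in Marta's syntax.

```python
import math

# (dropout, non dropout) counts as listed in the post above.
counts = {
    "Business Sci.": (9132, 7686),
    "Tech. Eng. in CM": (3629, 1803),
    "Tech. Eng. in CS": (4917, 2579),
    "Tourism": (939, 950),
    "Catalan Language": (703, 491),
    "Law": (3320, 2829),
    "Humanities": (3470, 1926),
    "Psychology": (4336, 3338),
}
CHI2_CRIT_95_DF7 = 14.0671  # 95% critical value for df = 8 - 1

rates = {name: d / (d + nd) for name, (d, nd) in counts.items()}
sizes = {name: d + nd for name, (d, nd) in counts.items()}

def differs(a, b):
    """Marascuilo check: is |p_a - p_b| larger than its critical range?"""
    var = rates[a] * (1 - rates[a]) / sizes[a] + rates[b] * (1 - rates[b]) / sizes[b]
    return abs(rates[a] - rates[b]) > math.sqrt(CHI2_CRIT_95_DF7 * var)

# List programmes from highest to lowest dropout rate, splitting the
# significant differences by direction.
for name in sorted(rates, key=rates.get, reverse=True):
    others = [o for o in rates if o != name and differs(name, o)]
    higher = sum(rates[name] > rates[o] for o in others)
    print(f"{name:18s} rate={rates[name]:.3f}  "
          f"higher than {higher}, lower than {len(others) - higher}")
```

On these counts Tech. Eng. in CM has the highest dropout proportion (about 0.67) and Tourism the lowest (about 0.50), so a programme can collect many significant differences because it sits at either end. Note also that every significant pair is counted once for each of its two programmes, so such tallies should add up to an even number; the ones listed above add up to 33, so one of them may be off by one.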

I attach some files that I think can be useful:
xi-quadrat_per_programa-1er_cicle.sav
Marascuilo.sps
bonferroni-marascuilo.spv

Thank you very much in advance

Josep Grau
Staff at the Market Analysis Department
Student at the Doctoral Programme in Education and ICT (e-learning)