Chi-Square, adjusted residual and multiple comparison

Tom

Hi everybody,

First question:

I have a categorical variable, “experience in years”, with 5 categories. When I compare the frequencies of “Yes” across these five groups, I may get a significant chi-square. If I’m right, this means only that at least one group has an unexpected number of “Yes” answers, and nothing more. To find out which group that is, I look at the adjusted residuals. A rule of thumb seems to be that an adjusted residual greater than or equal to 2 is significant. But what exactly does that mean? I suppose it means that this group (or these groups) has a significantly greater number of “Yes” answers than the other groups. Is that correct?

If so, I still have no information about which group(s) the difference is significant against, so I should perform multiple comparisons, which leads to the second question.
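
For reference, a minimal sketch of how adjusted residuals can be requested in CROSSTABS, using the three-category example data quoted in this thread (variable names r, c and kount as in that example). Under independence each adjusted standardized residual is approximately standard normal, so a value beyond about ±2 flags a cell whose count differs from its expected count; it compares the cell with what independence predicts, not with any particular other group.

```spss
* Sketch only: adjusted standardized residuals (ASRESID) for each cell.
DATA LIST LIST / r c kount (3F5.0).
BEGIN DATA
1 1 12
1 2 8
2 1 129
2 2 41
3 1 456
3 2 524
END DATA.
WEIGHT BY kount.
* EXPECTED is added so observed and expected counts can be read side by side.
CROSSTABS r BY c
  /STATISTICS = CHISQ
  /CELLS = COUNT EXPECTED ROW ASRESID.
```

With only two columns, the two adjusted residuals in a row have the same magnitude and opposite signs, so reading the “Yes” column is enough.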

 

Second question:

Another possibility is a set of pairwise comparisons, as Bruce Weaver suggested in reply to an earlier question on this topic, with this syntax for a variable with 3 categories:

 

*******Read in the data, column 1 the categorical variable, 2 yes-no, 3 the frequencies.
data list list / r c kount (3f5.0).
begin data
1 1 12
1 2 8
2 1 129
2 2 41
3 1 456
3 2 524
end data.

weight by kount.
crosstabs r by c / stat = chisq /cells = count row.

******* Row 1 vs Row 3.
temporary.
select if any(r,1,3). /* Omit row 2 .
crosstabs r by c / stat = chisq /cells = count row.

******* Row 2 vs Row 3.
temporary.
select if any(r,2,3). /* Omit row 1 .
crosstabs r by c / stat = chisq /cells = count row.

****** 1 vs 2.
temporary.
select if any(r,1,2). /* Omit row 3 .
crosstabs r by c / stat = chisq /cells = count row.

********Rows 1&2 pooled vs Row 3.
temporary.
recode r (2=1). /* Temporarily recode 2 to 1.
crosstabs r by c / stat = chisq /cells = count row.

 

 

I’d like to compare these results with the adjusted residuals (e.g. a non-significant chi-square but a high adjusted residual), but I can’t work out how to change the second line of each block in the syntax above:

select if any(r,1,3). /* Omit row 2 .

With 5 categories, I need to omit not just one row but three. What would the syntax look like?

 

Thanks for your help!

 

Tom

 


Re: Chi-Square, adjusted residual and multiple comparison

Marta Garcia-Granero
Hi Tom:

You might also try the Marascuilo procedure (http://www.itl.nist.gov/div898/handbook/prc/section4/prc474.htm) or, if you have the CTABLES module installed, CTABLES with Bonferroni adjustment:

* NIST/SEMATECH SAMPLE DATA (replace by your own)  *.
DATA LIST LIST/Lot Result Count(3 F8).
BEGIN DATA
1 0  36
1 1 264
2 0  46
2 1 254
3 0  42
3 1 258
4 0  63
4 1 237
5 0  38
5 1 262
END DATA.
WEIGHT BY Count .
VALUE LABEL Result 0'Non Conformant' 1'Conformant'.
VALUE LABEL Lot 1'G1' 2'G2' 3'G3' 4'G4' 5'G5'.

* A) Using CTABLES with Bonferroni adjustment *.
CTABLES
  /VLABELS VARIABLES=Lot Result DISPLAY=DEFAULT
  /TABLE Result [COUNT F40.0, COLPCT.COUNT PCT40.1] BY Lot
  /SLABELS POSITION=ROW
  /CATEGORIES VARIABLES=Lot Result ORDER=A KEY=VALUE EMPTY=INCLUDE
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN
  INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE.

* B) MARASCUILO PROCEDURE *.

* Don't change anything here *.
DATASET NAME Data.
DATASET DECLARE Results1 WINDOW=HIDDEN.
DATASET DECLARE Results2 WINDOW=HIDDEN.
DATASET DECLARE Contingency.
OMS /SELECT TABLES
 /IF COMMANDS = ["Crosstabs"]
     SUBTYPES = ["Crosstabulation"]
 /DESTINATION FORMAT = SAV
  OUTFILE = Contingency.
OMS /SELECT TABLES
 /IF COMMANDS = ["Crosstabs"]
     SUBTYPES = ["Case Processing Summary"]
 /DESTINATION VIEWER = NO.

* Replace "Result" & "Lot" by your variable names *.
CROSSTABS
  /TABLES=Result  BY Lot
  /FORMAT= AVALUE TABLES
  /STATISTIC=CHISQ
  /CELLS= COUNT COLUMN
  /COUNT ROUND CELL .

* Don't change anything from here *.
OMSEND.
DATASET ACTIVATE Contingency.
COMPUTE Id=$casenum/2.
EXE.
SELECT IF (Id NE TRUNC(Id)).
EXE.
DELETE VARIABLES Command_ TO Var3 total Id.
PRESERVE.

* Eliminate RESULTS=NONE if you want the matrix output (but leave SET MXLOOPS=200 untouched) *.
SET MXLOOPS=200 RESULTS=NONE.
MATRIX.
PRINT /TITLE='MARASCUILO PROCEDURE FOR MULTIPLE PROPORTIONS'.
GET Data /VAR=ALL /NAMES=vnames.
PRINT Data
 /CNAMES=vnames
 /RLABELS='Row 1','Row 2','Total'
 /TITLE='Input data'.
COMPUTE K=NCOL(Data).
COMPUTE P=Data(1,:)&/Data(3,:).
COMPUTE PLabels={'P(1)','P(2)','P(3)','P(4)','P(5)','P(6)','P(7)','P(8)','P(9)','P(10)','P(11)',
                 'P(12)','P(13)','P(14)','P(15)','P(16)','P(17)','P(18)','P(19)','P(20)'}.
PRINT P
 /FORMAT='F8.3'
 /CNAMES=PLabels
 /TITLE='Proportions to be compared'.
COMPUTE N=Data(3,:).
* Critical Chi-square values for up to k=20 *.
COMPUTE Chi2={ 3.8415, 5.9915, 7.8147, 9.4877,11.0705,12.5916,14.0671,15.5073,16.9190,18.3070,
              19.6751,21.0261,22.3620,23.6848,24.9958,26.2962,27.5871,28.8693,30.1435,31.4104}.
COMPUTE Chi2Val=Chi2(K-1).
COMPUTE NComp=K*(K-1)/2.
COMPUTE Rij=    MAKE(NComp,1,0).
COMPUTE ABSDiff=MAKE(NComp,1,0).
COMPUTE Labels= MAKE(Ncomp,3," ").
COMPUTE Sig=    MAKE(NComp,1,'(ns)').
COMPUTE Index=1.
LOOP i=1 TO K-1.
. LOOP j=i+1 TO K.
.  COMPUTE ABSDiff(Index)=ABS(P(i)-P(j)).
.  COMPUTE Rij(Index)=SQRT(Chi2Val)*SQRT(P(i)*(1-P(i))/N(i)+P(j)*(1-P(j))/N(j)).
.  COMPUTE Labels(Index,:)={PLabels(i),"-",PLabels(j)}.
.  DO IF (ABSDiff(Index) GT Rij(Index)).
.   COMPUTE Sig(Index)='(*)'.
.  END IF.
.  COMPUTE Index=Index+1.
. END LOOP.
END LOOP.
PRINT {ABSDiff,Rij}
 /FORMAT='F8.3'
 /CLABELS='|Pi-Pj|','Cr. Range'
 /TITLE='Absolute differences and their ranges'.
PRINT {Labels,Sig}
 /FORMAT='A4'
 /TITLE='Contrasts and their significance'.
SAVE {ABSDiff,Rij}/OUTFILE=Results1 /VARIABLES=Diffs,Rij.
SAVE {Labels,Sig} /OUTFILE=Results2 /VARIABLES=Labels1 TO Labels3,Sig /STRINGS=Labels1 TO Labels3,Sig.
END MATRIX.
RESTORE.
DATASET ACTIVATE Results2.
DATASET CLOSE Contingency.
STRING Comp(A9).
COMPUTE Comp=CONCAT(RTRIM(Labels1),RTRIM(Labels2),RTRIM(Labels3)).
MATCH FILES /FILE=*
 /FILE='Results1'.
DATASET CLOSE Results1.
VAR LABEL Comp '|P(i)-P(j)|' Diffs'Absolute Difference' Rij'Critical Range' Sig 'Significance'.
FORMAT Diffs Rij (F8.3).
OMS /SELECT TABLES
 /IF COMMANDS = ["Summarize"]
     SUBTYPES = ["Case Processing Summary"]
 /DESTINATION VIEWER = NO.
SUMMARIZE
  /TABLES=Comp Diffs Rij Sig
  /FORMAT=LIST NOCASENUM NOTOTAL
  /TITLE='Multiple comparisons: Marascuilo procedure'
  /MISSING=VARIABLE
  /CELLS=NONE.
OMSEND.
DATASET ACTIVATE Data.
DATASET CLOSE Results2.
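
For readers following the MATRIX block, the critical range it computes in the `COMPUTE Rij(Index)=...` line can be written out as below. This is a restatement of that line plus a hand-worked check for G1 versus G4 in the sample data, not additional output from the syntax. Because `Result` = 0 (non conformant) is the first row of the table, the proportions here are non-conformant proportions, and the critical value 9.4877 is the tabled chi-square value for k - 1 = 4 degrees of freedom at alpha = 0.05.

```latex
\begin{align*}
r_{ij} &= \sqrt{\chi^2_{1-\alpha,\,k-1}}\;
          \sqrt{\frac{p_i(1-p_i)}{n_i} + \frac{p_j(1-p_j)}{n_j}},
  \qquad \text{significant if } |p_i - p_j| > r_{ij} \\[6pt]
p_1 &= \tfrac{36}{300} = 0.12, \qquad
p_4 = \tfrac{63}{300} = 0.21, \qquad
|p_1 - p_4| = 0.09 \\[6pt]
r_{14} &= \sqrt{9.4877}\;
          \sqrt{\frac{0.12 \times 0.88}{300} + \frac{0.21 \times 0.79}{300}}
        \approx 3.0802 \times 0.03008 \approx 0.0927
\end{align*}
```

Since 0.09 is slightly below 0.0927, this pair falls just short of significance under the Marascuilo procedure.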

HTH,
Marta GG


--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/

Re: Chi-Square, adjusted residual and multiple comparison

Tom

Thanks, Marta.

Yes, that helps. I tried CTABLES with the Bonferroni adjustment, but got an empty comparison table for my counts (which means there are no significant comparisons). In your example there seems to be a significant difference in proportions between Lot G1 (A) and G4 (D). Am I interpreting this correctly?

 

But where are the test values for the comparisons? The table just shows the letters of the categories involved in the significant comparisons. I do not really understand what CTABLES with Bonferroni adjustment actually does. Do you perhaps have a literature reference for it?
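
As a rough guide to what sits behind the letters, the block below sketches the usual pooled two-proportion z test with a Bonferroni correction, worked by hand for G1 versus G4 in the sample data. It is an assumption that CTABLES uses exactly this variance estimate, so treat the numbers as an illustration rather than a reproduction of its internals. Here x is the non-conformant count in a lot, n the lot size, and m = k(k-1)/2 = 10 the number of pairwise comparisons among k = 5 lots.

```latex
\begin{align*}
z_{ij} &= \frac{p_i - p_j}
               {\sqrt{\hat p\,(1-\hat p)\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}},
  \qquad \hat p = \frac{x_i + x_j}{n_i + n_j},
  \qquad p_{\text{adj}} = \min(1,\; m \cdot p) \\[6pt]
\hat p &= \tfrac{36 + 63}{600} = 0.165, \qquad
z_{14} = \frac{0.12 - 0.21}{\sqrt{0.165 \times 0.835 \times \tfrac{2}{300}}}
       \approx \frac{-0.09}{0.0303} \approx -2.97 \\[6pt]
p &\approx 0.003 \ \text{(two-sided)}, \qquad
p_{\text{adj}} \approx 10 \times 0.003 = 0.03 < 0.05
\end{align*}
```

Under these assumptions the G1 versus G4 difference remains significant after the Bonferroni correction, which agrees with the A/D letters in the table.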

 

The Marascuilo procedure seems to be statistically quite different, and more complicated. I’ll try to get a bit more familiar with it…

 

 Tom

 

 

____________________________________

 

Thomas Balmer
Research Associate
PHBern
Institut für Weiterbildung
Weltistrasse 40
CH-3006 Bern
T +41 31 309 27 36
[hidden email]
http://www.phbern.ch/weiterbildung


From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Marta García-Granero
Sent: Thursday, 11 November 2010 15:59
To: [hidden email]
Subject: Re: Chi-Square, adjusted residual and multiple comparison

 


Re: Chi-Square, adjusted residual and multiple comparison

Bruce Weaver
Administrator
In reply to this post by Tom
Balmer Thomas wrote
--- snip ---

select if any(r,1,3). /* Omit row 2 .

With 5 categories, I should omit not just one, but 3 rows. But how goes
the syntax?

--- snip ---
That SELECT IF *keeps* rows 1 and 3.  So it omits row 2, because there are only 3 rows in that example.  To omit the 3 rows you wish to omit, KEEP the two that you want.  (Don't forget the TEMPORARY on the line before SELECT IF--without it, you will permanently delete the unselected rows.)
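
A minimal sketch of that advice for a five-category variable, assuming r is coded 1 to 5 and WEIGHT BY kount is already in effect as in the three-category example. The pair 2 and 5 is only an illustration; change the two values inside ANY() for each pair you want.

```spss
* Sketch only: keep categories 2 and 5, which omits 1, 3 and 4.
TEMPORARY.
SELECT IF ANY(r, 2, 5).
CROSSTABS r BY c /STATISTICS = CHISQ /CELLS = COUNT ROW.
```

Five categories give 10 possible pairs, so a Bonferroni-style correction would test each pairwise chi-square at 0.05/10 = 0.005.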

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Chi-Square, adjusted residual and multiple comparison

Josep Grau Valldosera
In reply to this post by Marta Garcia-Granero
Dear Marta,

I've used both the Bonferroni and Marascuilo procedures that you suggested some time ago on data from my research on student dropout at the Universitat Oberta de Catalunya.

These are the data:

Programme          Dropout        Count
Business Sci.      dropout         9132
Business Sci.      non dropout     7686
Tech. Eng. in CM   dropout         3629
Tech. Eng. in CM   non dropout     1803
Tech. Eng. in CS   dropout         4917
Tech. Eng. in CS   non dropout     2579
Tourism            dropout          939
Tourism            non dropout      950
Catalan Language   dropout          703
Catalan Language   non dropout      491
Law                dropout         3320
Law                non dropout     2829
Humanities         dropout         3470
Humanities         non dropout     1926
Psychology         dropout         4336
Psychology         non dropout     3338

Concentrating on the Marascuilo results, I have some doubts about how to interpret them. Specifically:
- I would like to know how the programmes map to the "numbers". Would P(1) correspond to the probability of dropout for the first column (programme) in the chi-square table (Business Sci.), as shown in the attached "bonferroni-marascuilo.spv" file?
- Analyzing the results, this is the number of times each programme shows a "significant" difference:
1 = 4 times
2 = 3 times
3 = 4 times
4 = 3 times
5 = 3 times
6 = 5 times
7 = 5 times
8 = 6 times
Should I interpret this as meaning that programmes 8, 7 and 6 have a more significant probability of dropping out than the rest?
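
On the mapping question, the Marascuilo syntax earlier in this thread runs CROSSTABS with /FORMAT=AVALUE and computes P as the first table row divided by the column total. P(1), P(2), ... therefore follow the ascending numeric codes of the column variable, and they are proportions of whichever category of the row variable has the lowest code. The sketch below enters the data above with assumed codes (programmes 1 to 8 in the order listed, dropout = 1, non dropout = 2); the coding in the attached .sav file may differ.

```spss
* Sketch only: the numeric codes are assumed and may not match the attached data file.
DATA LIST LIST / Programme Dropout Count (3F8).
BEGIN DATA
1 1 9132
1 2 7686
2 1 3629
2 2 1803
3 1 4917
3 2 2579
4 1 939
4 2 950
5 1 703
5 2 491
6 1 3320
6 2 2829
7 1 3470
7 2 1926
8 1 4336
8 2 3338
END DATA.
WEIGHT BY Count.
VALUE LABELS Programme 1 'Business Sci.' 2 'Tech. Eng. in CM' 3 'Tech. Eng. in CS'
  4 'Tourism' 5 'Catalan Language' 6 'Law' 7 'Humanities' 8 'Psychology'
  /Dropout 1 'dropout' 2 'non dropout'.
* The column percentages in the dropout row are the proportions behind P(1) to P(8).
CROSSTABS
  /TABLES = Dropout BY Programme
  /FORMAT = AVALUE TABLES
  /CELLS = COUNT COLUMN.
```

The number of significant contrasts only says how many other programmes a given programme differs from, not in which direction. The dropout proportions themselves carry that information, for instance roughly 0.67 for Tech. Eng. in CM against roughly 0.50 for Tourism.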

I attach some files that I think can be useful:
xi-quadrat_per_programa-1er_cicle.sav
Marascuilo.sps
bonferroni-marascuilo.spv


Thank you very much in advance


Josep Grau
Staff at the Market Analysis Department
Student at the Doctoral Programme in Education and ICT (e-learning)