Interpreting Contingency table analysis & z-tests results - PLEASE HELP!


DP_Sydney
Hi all,

 I have completed a 5x2 contingency table in SPSS which returned a significant chi-square value (P<0.001). My columns are species detected/not detected in surveys and the rows are different habitats (see data below). I expected the proportion of detected to not-detected to differ between habitats. So all good here.

 I followed this up with z-tests (under the 'Custom Table' option) which detected significant differences in column proportions for three of the habitats (=rows) (B,C,D). B and C had lower proportions of surveys detecting the species and D had greater than expected.

 However, the 'reporting rate' of the species (i.e. the number of surveys that detected the species as a percentage of total surveys, which equals the row percentage in the contingency table) was highest in habitat D (20.4% - no surprises there), AND in E (12.8%) which showed no significant difference in proportions. All other row percentages were below 4%.

 Furthermore, in a 2x2 table comparing just habitats D and E (which, for a priori reasons, were the only habitats I expected to have a greater proportion of detected surveys; they did as row percentages, but not in the z-tests), there was no significant difference between them (exact p = 0.079).
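
For anyone who wants to check the D versus E comparison outside SPSS, here is a minimal Python sketch of a two-sided Fisher exact test on the 2x2 table (40 detected / 156 not in D, 18 / 123 in E). It assumes the "exact p" quoted above came from Fisher's exact test, which the post does not actually state, and the function name is made up for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Habitat D: 40 detected, 156 not detected; habitat E: 18 detected, 123 not detected.
p_value = fisher_exact_two_sided(40, 156, 18, 123)
print("two-sided exact p =", round(p_value, 3))
```

Other definitions of a two-sided exact p-value exist (for example doubling the one-sided tail), so a small discrepancy from another package would not necessarily indicate an error.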

 I'm confused about how to interpret these results. What exactly does a significant z-test tell me about the proportions of detected/not-detected? Does a significantly greater than expected proportion of detected in habitat X not equate to a significantly lower than expected proportion of not-detected in habitat X? Can I still take note of the fact that in the habitat of row 5 the 'reporting rate' was still high relative to other habitats except D?

The data are as follows (standardised residuals in parentheses):

Habitat        Detected      Not detected    % detected (row %)
A    Obs       1 (-1.6)      45 (0.5)         2.2
     Exp       4.4           41.6
B    Obs       4 (-2.3)      123 (0.8)        3.1
     Exp       12.2          114.8
C    Obs       3 (-3.4)      175              1.7
     Exp       17.1          160.9
D    Obs       40 (4.9)      156 (-1.6)      20.4
     Exp       18.8          177.2
E    Obs       18 (1.2)      123 (-0.4)      12.8
     Exp       13.5          127.5
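
As a cross-check on the table above, the expected counts, standardised residuals and Pearson chi-square can be recomputed from the observed counts alone. This is a minimal Python sketch using only the standard library; the variable names are illustrative and have nothing to do with SPSS.

```python
import math

# Observed counts from the post: (detected, not detected) per habitat.
observed = {
    "A": (1, 45),
    "B": (4, 123),
    "C": (3, 175),
    "D": (40, 156),
    "E": (18, 123),
}

n_total = sum(sum(row) for row in observed.values())
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]

chi_square = 0.0
summary = {}
for habitat, row in observed.items():
    row_total = sum(row)
    # Expected count under independence: row total * column total / N.
    expected = [row_total * col_totals[j] / n_total for j in range(2)]
    # Standardised (Pearson) residual: (O - E) / sqrt(E).
    residuals = [(row[j] - expected[j]) / math.sqrt(expected[j]) for j in range(2)]
    chi_square += sum(r * r for r in residuals)
    summary[habitat] = {
        "expected": expected,
        "residuals": residuals,
        "pct_detected": 100.0 * row[0] / row_total,
    }

df = (len(observed) - 1) * (2 - 1)

for habitat, s in summary.items():
    print(
        habitat,
        [round(e, 1) for e in s["expected"]],
        [round(r, 1) for r in s["residuals"]],
        round(s["pct_detected"], 1),
    )
print("Pearson chi-square =", round(chi_square, 2), "on", df, "df")
```

The Pearson chi-square is just the sum of the squared standardised residuals, which is why the cells with the largest residuals (Detected in C and D) contribute most to the overall test.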

Hope someone can help! :-)

Thanks in advance,

Dean

Re: Interpreting Contingency table analysis & z-tests results - PLEASE HELP!

Rich Ulrich

Yes, the z's for the cells are a remark on the relation to the 'average' difference.

So, acacia/euc  is not extreme because it falls in the middle.
Further, z for the low proportion for wetland is not as extreme because
the N is smaller and thus the power is weaker.

I'm going to call the groups A B C D and E, for the ordered means
(1.7, 2.2, 3.1, 12.8, 20.4).  The first thing to conclude is that
(A B C) are the same and that all the cases are in (D E).

Using the z's as tests, (A B C) (D E)  is the way that you could show
differences, in one style of post-hoc reporting, if that was the full
result.  I don't remember if you stated it, but it is possible that D
is not "different" from (A B C).  In that case, the post-hoc report
could be (A B C D) (D E).

This would say that D is "not different" from the first three, and it also is
not different from the last.  Traditionally, these have often been shown
by underlining the means that "do not differ."  This style of report was
designed for ANOVA, using tests based on the pooled variance, for groups
with equal Ns; it does not work consistently for 2x2 tables, for grossly
unequal Ns, or for other paired comparisons that may have different
error terms (re: the smallest sample here).  But it seems like it might
work here.

MOST of the detections are in D and E.  The simple difference between
them did not (I think you report) test as significant in a 2x2 table.  Okay.
So, at the conventional test level, they do not differ.  However, the
"effect size" of their measured difference is still somewhat large.  It also
is "sensible" in that the mixed environment is less extreme than the pure
one.  So you might expect a difference to be confirmed if the sampling was
more extensive.
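
To put rough numbers on that "effect size", here is a small Python sketch for the two high-detection habitats, using the counts from the original table (40 of 196 surveys at 20.4%, 18 of 141 at 12.8%). The choice of measures (risk difference, risk ratio, odds ratio) and the Wald interval are my own illustration, not something computed in the thread.

```python
import math

# Detections and survey totals for the two high-detection habitats.
high_det, high_n = 40, 196   # 20.4% reporting rate
low_det, low_n = 18, 141     # 12.8% reporting rate

p_high = high_det / high_n
p_low = low_det / low_n

risk_difference = p_high - p_low
risk_ratio = p_high / p_low
odds_ratio = (high_det * (low_n - low_det)) / ((high_n - high_det) * low_det)

# Approximate (Wald) 95% interval for the difference, unpooled standard error.
se = math.sqrt(p_high * (1 - p_high) / high_n + p_low * (1 - p_low) / low_n)
ci = (risk_difference - 1.96 * se, risk_difference + 1.96 * se)

print("risk difference =", round(risk_difference, 3))
print("risk ratio      =", round(risk_ratio, 2))
print("odds ratio      =", round(odds_ratio, 2))
print("approx. 95% CI for the difference =", tuple(round(x, 3) for x in ci))
```

The interval is wide and only just includes zero, which fits the reading above: the estimated difference is not small, but the samples are too small to pin it down.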

--
Rich Ulrich



From: [hidden email]
To: [hidden email]; [hidden email]
Subject: RE: Interpreting Contingency table analysis & z-tests results - PLEASE HELP!
Date: Fri, 27 May 2011 18:43:24 +1000

Thanks very much for your reply Rich,

 I'm still a little confused as to how to interpret the z-tests - do they indicate which rows (habitats in my case) have the greatest difference between the numbers of the categories of the columns (no. of surveys that detected/didn't detect in my case)? If so, is this relative to the 'average' difference between column categories across all rows?

[snip, some]
... But I had interpreted the z-tests  as indicating that Acacia was contributing to the significance of the chi-square but not Acacia/Euc - despite 12.8% being >4 times the three other habitats (1.7-3.1%), and in a 2x2 table Acacia vs Acacia/Euc was not significant. 

 In summary, the species is detected significantly more frequently in two habitats (Acacia + Acacia/Euc) and these are frequented either equally OR Acacia more so than Acacia/Euc. It is the latter part that is troubling me (hence the follow up 2x2 table).

[snip, rest]

Re: Interpreting Contingency table analysis & z-tests results - PLEASE HELP!

Bruce Weaver
Administrator
In reply to this post by DP_Sydney
I've never used the z-tests you refer to, so gave it a try as follows:

* One data row per cell of the 5x2 table: habitat code, detected code, cell count.
data list list / habitat detected kount (3f5.0).
begin data
1 1   1
1 2  45
2 1   4
2 2 123
3 1   3
3 2 175
4 1  40
4 2 156
5 1  18
5 2 123
end data.

var lab
 habitat "Habitat"
 detected "Detected"
.
val lab
 habitat 1 "A" 2 "B" 3 "C" 4 "D" 5 "E" /
 detected 1 "Yes" 2 "No"
.

* Weight by the cell counts so that each data row stands for kount surveys.
weight by kount.

* Custom Tables.
CTABLES
  /VLABELS VARIABLES=habitat detected DISPLAY=DEFAULT
  /TABLE habitat [COUNT F40.0] BY detected
  /CATEGORIES VARIABLES=habitat detected ORDER=A KEY=VALUE EMPTY=INCLUDE
  /SIGTEST TYPE=CHISQUARE ALPHA=0.05 INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES
    CATEGORIES=ALLVISIBLE MERGE=NO.


Does this generate the same z-test results you have?  For those who cannot run the syntax, the z-test output looks like this:

Comparisons of Column Proportions^a
        Yes   No
        (A)   (B)
A
B             A
C             A
D       B
E

Results are based on two-sided tests with significance level 0.05. For each significant pair, the key of the category with the smaller column proportion appears under the category with the larger column proportion.

a. Tests are adjusted for all pairwise comparisons within a row of each innermost subtable using the Bonferroni correction.

It may be a Friday afternoon thing, but it is not immediately clear to me how to read those results.  

Here are the column percentages, by the way:

     Yes       No
A    1.52%     7.23%
B    6.06%    19.77%
C    4.55%    28.14%
D   60.61%    25.08%
E   27.27%    19.77%
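
One way to read the comparisons is as a separate two-proportion z-test in each habitat row, comparing the Yes column percentage with the No column percentage shown above. The Python sketch below does that with a pooled standard error. Whether CTABLES uses exactly this statistic is an assumption on my part; with only two columns there is one comparison per row, so the Bonferroni adjustment should change nothing.

```python
import math

# (detected, not detected) counts per habitat, as in the data list above.
observed = {
    "A": (1, 45),
    "B": (4, 123),
    "C": (3, 175),
    "D": (40, 156),
    "E": (18, 123),
}

yes_total = sum(yes for yes, _ in observed.values())
no_total = sum(no for _, no in observed.values())


def two_sided_p(z):
    # Two-sided normal tail probability via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


results = {}
for habitat, (yes, no) in observed.items():
    p_yes = yes / yes_total          # column proportion in the Yes column
    p_no = no / no_total             # column proportion in the No column
    pooled = (yes + no) / (yes_total + no_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / yes_total + 1 / no_total))
    z = (p_yes - p_no) / se
    results[habitat] = (z, two_sided_p(z))

significant = [h for h, (z, p) in results.items() if p < 0.05]

for habitat, (z, p) in results.items():
    print(habitat, "z =", round(z, 2), "p =", round(p, 4))
print("significant at 0.05:", significant)
```

Under these assumptions the flagged rows are B, C and D, the same habitats reported as significant in the original post. In a table with two columns this z is also the adjusted standardised residual of the Detected cell, and the Not detected cell gets the same value with the opposite sign, so "more detected than expected" and "fewer not-detected than expected" are one and the same test. Habitat A has a detection rate as low as B and C but far fewer surveys, which is why its z is weaker.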


If you do have specific a priori contrasts in mind, you might be better off partitioning the overall table in a way that addresses those questions.  I have some examples of that in a chapter of notes on chi-square analysis -- item 3 here:  https://sites.google.com/a/lakeheadu.ca/bweaver/Home/statistics/notes.  Notice that for this approach, the likelihood ratio chi-square works out better than Pearson's statistic, because orthogonal components that should  add up to a whole DO add up to the whole.
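
The additivity point can be checked numerically. The Python sketch below splits the 5x2 table into three pieces: low-detection habitats (A, B, C) versus high-detection habitats (D, E), differences within (A, B, C), and D versus E. The grouping is my guess at a partition matching the a priori expectation in the original post, not one taken from the notes; the function and variable names are illustrative.

```python
import math


def g_squared(table):
    """Likelihood-ratio chi-square, G^2 = 2 * sum(O * ln(O / E)), for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                total += obs * math.log(obs / expected)
    return 2.0 * total


def collapse(rows):
    # Pool several habitat rows into a single (detected, not detected) row.
    return [sum(col) for col in zip(*rows)]


low = [[1, 45], [4, 123], [3, 175]]    # habitats A, B, C
high = [[40, 156], [18, 123]]          # habitats D, E

g2_total = g_squared(low + high)                           # 4 df
g2_between = g_squared([collapse(low), collapse(high)])    # 1 df
g2_low = g_squared(low)                                    # 2 df
g2_high = g_squared(high)                                  # 1 df

print("total G^2     =", round(g2_total, 3))
print("sum of pieces =", round(g2_between + g2_low + g2_high, 3))
```

The three components add up to the total G^2 apart from floating-point rounding, and their degrees of freedom (1 + 2 + 1) add up to the 4 df of the full table. Pearson's statistic computed on the same pieces will generally not sum exactly.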

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

RE: Interpreting Contingency table analysis & z-tests results - PLEASE HELP!

DP_Sydney
In reply to this post by Rich Ulrich
Thanks Rich!

I think I'm beginning to understand these z-tests.

Let me test whether this is the case and the following is a correct interpretation: from the results it can be concluded that detection in Euc (your A) and Grass/Forb (your C) is significantly lower than in Acacia/Euc (your D) and Acacia (your E). Furthermore, the overall test suggests that detection in Acacia (your E) is significantly greater than Acacia/Euc (your D), but this difference does not bear out when partitioning the contingency table into a 2x2 table with these two categories. Last, the small sample size for Wetland (your B) makes it difficult to determine if the detection in this habitat differs from the others (but nevertheless the detection percentage is as low as your 'A' and 'C').

Have I got it right? 

I wonder if you could clarify a couple of points from your email. 

"I'm going to call the groups A B C D and E, for the ordered means
(1.7, 2.2, 3.1, 12.8, 20.4).  The first thing to conclude is that 
(A B C) are the same and that all the cases are in (D E)".

What did you mean by "...all the cases are in (D E)"? Do you mean that the majority of the detections are in D and E?

"Using the z's as tests, (A B C) (D E)  is the way that you could show 
differences, in one style of post-hoc reporting..."

How is it that D is not different from E, and both are different from A/B/C (as opposed to A/B/C/D vs E)? Is it because D is more similar to E than it is to A/B/C?

Thank you very much for all your help with this - it is making the haze seem clearer!

Cheers,
Dean


[snip, quoted reply from Rich Ulrich, Fri, 27 May 2011 11:13:40 -0700]

RE: Interpreting Contingency table analysis & z-tests results - PLEASE HELP!

DP_Sydney
In reply to this post by Bruce Weaver
Thanks Bruce.

I did get the same result (see earlier post) with the Custom Table. I have printed out your notes and will read them over the next couple of days and then digest the material!

Hopefully I'm getting there.


[snip, quoted reply from Bruce Weaver, Fri, 27 May 2011 13:37:10 -0700]