chi square but very disparate sample sizes

sgthomson99
Hi everyone,

I need advice.  I'm working with a big dataset.

If Population A is 4000 patients and there are 3.2% with the flu, and Population B is 53500 patients and 3.9% have the flu, is the difference in prevalence of the flu significant or not?

The clinic managers say to use a chi-square test on the 2x2 table, which makes the prevalences hugely significantly different.  In my opinion that is just because of the big sample size difference.

I'm being conservative and saying that with such hugely different sample sizes, it's better to compare 3.2/100 versus 3.9/100, i.e. a z test for proportions with n = 100 in each group -- and then it's not significant.

Any suggestions greatly appreciated.

Susan

Re: chi square but very disparate sample sizes

Art Kendall
Please describe your situation in more detail.

Are these samples?
Are these the actual populations you want to discuss?
Do you have case-by-case data on every member of the populations?

What are the definitions of the populations of interest?

How did you gather your data?
Art Kendall
Social Research Consultants

RE: chi square but very disparate sample sizes

sgthomson99
Thank you, Art.
 
These numbers are actual patient data from a huge health care database.  One is a small zone, and one is a big one.  I do have case by case data on every patient in each zone.
 
I was concerned about just the sample sizes being so big for the one and so much smaller (albeit still relatively large) for the other population -- and that being the likely reason behind any statistically significant result I might find.
 
I thought I remembered seeing a paper a while ago that dealt with a similar situation by taking the prevalences, assuming n=100 for each group, and comparing them with a z test for proportions.

Since posting my question the other day, I have kept on reading and now think I should go with a z test for proportions using the actual data, without assuming n=100, but add a caveat that the result is probably significant because of the large and very disparate sample sizes, and that clinical significance is the real issue.  Is a difference of 0.7 percentage points clinically meaningful?  Probably not.

Susan

(In reply to Art Kendall's message of Fri, 24 Oct 2014 07:41:11 -0700.)

Re: chi square but very disparate sample sizes

Rich Ulrich
In reply to this post by sgthomson99
Medicine regularly uses the Odds Ratio for describing the difference
between two outcomes, especially when the rates are low and the Ns
are large.  Your Odds Ratio of 1.2 might be interesting if it were an
expected or well-justified result in a controlled experiment.  But even
a ratio of 1.5 would be shaky for observational data, which is what your
data appear to be.  A value of 1.2 is about the weakest anyone would
consider as possibly-meaningful for a randomized study.
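
For readers who want to check the odds ratio quoted above, here is a minimal Python sketch. It is not from the thread: the helper name `odds` is made up for illustration, and it works directly from the two prevalences in the original post instead of from case counts.

```python
def odds(p):
    """Convert a proportion (risk) to odds."""
    return p / (1.0 - p)


p_a = 0.032  # Population A prevalence (n = 4000), from the original post
p_b = 0.039  # Population B prevalence (n = 53500), from the original post

# Odds ratio of B relative to A; with rates this low it is close to the risk ratio.
odds_ratio = odds(p_b) / odds(p_a)
risk_ratio = p_b / p_a

print(f"odds ratio = {odds_ratio:.2f}")
print(f"risk ratio = {risk_ratio:.2f}")
```

Both ratios come out a little above 1.2, which is the value discussed above. Neither depends on the sample sizes, which is one reason they are used to describe the size of an effect separately from its p-value.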

So.  Conclude that your outcome is apt to be the result of uncontrolled
factors that differ between the populations.  It can be worth mentioning
as a curiosity, but it is not sound to conclude that this is a meaningful
difference between the groups.

I think the magnitude is also described in some studies (mainly, European)
by using the "minimum effective N" that would result in a 5% test

--
Rich Ulrich


=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: chi square but very disparate sample sizes

David Greenberg
In reply to this post by sgthomson99
Your instincts are wrong. You can't change the data. Incidentally, you
can also use a z test for the difference of proportions. It should
give you results that are similar to the chi-square test.

David Greenberg, Sociology Department, New York University
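
A minimal Python sketch of the point above, not code from the thread. For a 2x2 table, the pooled z test and the Pearson chi-square without continuity correction are the same test (chi-square equals z squared), so switching tests does not change the answer; only replacing the real sample sizes with n = 100 does. The function names are made up for illustration, and the cell counts are taken as prevalence times n (which gives a non-integer 2086.5 for Population B), so treat the figures as approximate.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-sample z test for proportions; returns (z, two-sided p)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


def pearson_chi2_2x2(p1, n1, p2, n2):
    """Pearson chi-square (no continuity correction) for the implied 2x2 table."""
    observed = [[p1 * n1, (1 - p1) * n1],
                [p2 * n2, (1 - p2) * n2]]
    row_totals = [n1, n2]
    col_totals = [observed[0][0] + observed[1][0],
                  observed[0][1] + observed[1][1]]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Actual sample sizes from the original post.
z_real, p_real = two_proportion_z(0.032, 4000, 0.039, 53500)
chi2_real = pearson_chi2_2x2(0.032, 4000, 0.039, 53500)

# Same prevalences, but pretending n = 100 in each group.
z_100, p_100 = two_proportion_z(0.032, 100, 0.039, 100)

print(f"real n:  z = {z_real:.3f}, z^2 = {z_real ** 2:.3f}, "
      f"chi2 = {chi2_real:.3f}, p = {p_real:.4f}")
print(f"n = 100: z = {z_100:.3f}, p = {p_100:.4f}")
```

With these inputs the real sample sizes give z of roughly 2.2 (two-sided p a little under 0.03), while the n = 100 version gives z of roughly 0.27. The second result is "not significant" only because most of the data has been thrown away.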


RE: chi square but very disparate sample sizes

sgthomson99
Many thanks for your help, David.
 
Susan

(In reply to David Greenberg's message of Fri, 24 Oct 2014 12:34:22 -0700.)

RE: chi square but very disparate sample sizes

Art Kendall
In reply to this post by sgthomson99
The probability that the apparent difference is due to the vagaries of sampling is zero. IFF you do not want to act as if this were (note subjunctive-contrary-to-fact) a sample across time, the size of the difference is still tiny for most purposes.

What difference wrt policy/practice/theory/understanding does this difference make?  Are the groups dissimilar in any meaningful ways, such as climate, occupations subject to more frequent contact with strangers, etc.?
Art Kendall
Social Research Consultants

RE: chi square but very disparate sample sizes

sgthomson99
Thanks for the reply, Art. 
The groups don't come out significantly different in any meaningful ways that we are able to examine with the data we have available.  The discussion among the managers is centering exactly on your question -- what difference to policy and practice does this small difference make, is it clinically meaningful and so on.  The effect is very small.
 
Many thanks for your help.

Susan

(In reply to Art Kendall's message of Fri, 24 Oct 2014 13:51:31 -0700.)

Re: chi square but very disparate sample sizes

Richard Ristow
In reply to this post by sgthomson99
At 02:28 PM 10/22/2014, sgthomson99 wrote:

>If Population A is 4000 patients and there are 3.2% with the flu,
>and Population B is 53500 patients and 3.9% have the flu, is the
>difference in prevalence of the flu significant or not?

One more consideration: the chi-square is based on the assumption
that different individuals' contracting the flu are statistically
independent events.

I don't know details of influenza epidemiology, but it is a readily
transmissible infection, and a difference in rates could be due to
random events that affect a number of people together: for example, a
cluster around one case who'd been out and around and infected a good
many other people.
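
To get a rough feel for how much the independence assumption matters here, the Python sketch below (not from the thread) applies one common first-order correction: divide the naive chi-square by a design effect of 1 + (m - 1) * ICC, where m is an assumed cluster size and ICC an assumed intra-cluster correlation. The (m, ICC) pairs are hypothetical values chosen only for illustration; nothing in the thread says what they would be for these data, and the function names are made up.

```python
import math


def pooled_chi_square(p1, n1, p2, n2):
    """Uncorrected Pearson chi-square (1 df) for two proportions, as z squared."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return ((p2 - p1) / se) ** 2


def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


CRITICAL_5PCT = 3.841  # chi-square critical value, 1 df, alpha = 0.05

# Naive statistic from the prevalences and sample sizes in the original post.
chi2_naive = pooled_chi_square(0.032, 4000, 0.039, 53500)

# Hypothetical (cluster size, ICC) scenarios -- assumptions, not estimates.
scenarios = [(1, 0.0), (5, 0.05), (10, 0.05)]
adjusted = {}
for m, icc in scenarios:
    adjusted[(m, icc)] = chi2_naive / design_effect(m, icc)
    verdict = ("significant" if adjusted[(m, icc)] > CRITICAL_5PCT
               else "not significant")
    print(f"m = {m:2d}, ICC = {icc:.2f}: "
          f"adjusted chi2 = {adjusted[(m, icc)]:.2f} ({verdict} at 5%)")
```

Because the naive statistic is only modestly above the 5% critical value, even a small amount of clustering in these made-up scenarios is enough to pull the adjusted statistic below it.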


Re: chi square but very disparate sample sizes

Mike
Just to develop the point made below a little bit:  I noticed
that the larger sample had the larger percentage of flu cases.
If there were more "populations" or groups, one might want
to see if there is a correlation between flu rate and sample
size.  If such a result was obtained, it could mean that larger
groups, for whatever reason, would have higher flu rates
(perhaps because of increased opportunities for infection, etc.).
If there is additional data, this might be something to look at.

-Mike Palij
New York University
[hidden email]


----- Original Message -----
On Friday, October 24, 2014 6:43 PM, Richard Ristow wrote:

> At 02:28 PM 10/22/2014, sgthomson99 wrote:
>
>>If Population A is 4000 patients and there are 3.2% with the flu, and
>>Population B is 53500 patients and 3.9% have the flu, is the
>>difference in prevalence of the flu significant or not?
>
> One more consideration: the chi-square is based on the assumption that
> different individuals' contracting the flu are statistically
> independent events.
>
> I don't know details of influenza epidemiology, but it is a readily
> transmissible infection, and a difference in rates could be due to
> random events that affect a number of people together: for example, a
> cluster around one case who'd been out and around and infected a good
> many other people.


Re: chi square but very disparate sample sizes

Bruce Weaver
Administrator
In reply to this post by sgthomson99
Rather than looking at a p-value, which will be small when the samples are large, you could look at the confidence interval on the risk difference (or risk ratio).  Here is a good calculator for the CI on the risk difference (it uses a method recommended by Robert Newcombe, author of many articles on CIs for proportions and related measures, and more recently of a book).

http://vassarstats.net/prop2_ind.html

Risk Difference = 0.007 <-- Is this big enough to be clinically significant?

95% confidence interval: no continuity correction
Lower limit = 0.0009 Upper limit = 0.0123

95% confidence interval: including continuity correction
Lower limit = 0.0007 Upper limit = 0.0124
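
For anyone who wants to reproduce an interval like this without the web calculator, here is a minimal Python sketch. It assumes the method meant is Newcombe's hybrid score interval (built from Wilson score intervals for each proportion, without continuity correction); whether the calculator implements exactly this is an assumption, and the function names are made up for illustration. It works from the prevalences and sample sizes in the original post.

```python
import math


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for a single proportion."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def newcombe_diff_interval(p1, n1, p2, n2, z=1.959964):
    """CI for p2 - p1 by the hybrid score method (no continuity correction)."""
    l1, u1 = wilson_interval(p1, n1, z)
    l2, u2 = wilson_interval(p2, n2, z)
    d = p2 - p1
    lower = d - math.sqrt((p2 - l2) ** 2 + (u1 - p1) ** 2)
    upper = d + math.sqrt((u2 - p2) ** 2 + (p1 - l1) ** 2)
    return lower, upper


# Population A: 3.2% of 4000; Population B: 3.9% of 53500.
lower, upper = newcombe_diff_interval(0.032, 4000, 0.039, 53500)
print(f"risk difference = {0.039 - 0.032:.4f}")
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

With these inputs the limits land close to the no-continuity-correction values quoted above. The interval excludes zero but is narrow and sits entirely below 1.3 percentage points, which is the quantity to weigh against clinical relevance.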


Finally, the questions about independence of observations raised by Richard & Mike apply here too.

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

RE: chi square but very disparate sample sizes

sgthomson99
Many thanks for the suggestions -- you are a fabulous teacher, Bruce.

Susan

(In reply to Bruce Weaver's message of Fri, 24 Oct 2014 18:20:03 -0700.)

RE: chi square but very disparate sample sizes

sgthomson99
In reply to this post by Mike
Thanks for the suggestion, Mike.  We do have data for other zones -- I am looking at flu rate and sample size.  Thanks again -- I really appreciate it!

Susan

(In reply to Mike Palij's message of Fri, 24 Oct 2014 17:08:21 -0700.)

RE: chi square but very disparate sample sizes

sgthomson99
In reply to this post by Richard Ristow
Many thanks for the points to consider, Richard.  Lots of things to discuss with the managers tomorrow morning at work.  I really appreciate your suggestions and insight.

Susan

(In reply to Richard Ristow's message of Fri, 24 Oct 2014 16:02:18 -0700.)

RE: chi square but very disparate sample sizes

sgthomson99
In reply to this post by Rich Ulrich
Many thanks, Rich.  Your ideas are very helpful as always.  I hadn't heard of minimum effective N before -- am now reading up on that.  Many thanks!

Susan

(In reply to Rich Ulrich's message of Fri, 24 Oct 2014 18:52:17 -0700.)