
Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Posted by Mike on Oct 24, 2020; 9:02pm
URL: http://spssx-discussion.165.s1.nabble.com/Correlational-Example-Involving-COVID-19-Useful-for-Classes-tp5739871p5739885.html

Thanks for bringing up the correlation between chocolate consumption (CC) and the number
of Nobel Laureates (#NL); I remember when it first came out.  However, although the correlation
is between group/aggregate values, I think that this is a better example of spurious correlation
than an ecological correlation.  It can be argued that the correlation between CC and #NL
depends on a third variable Z, which might be national wealth/GDP, the number of
graduate-degree-granting institutions, and/or other variables (or systems of variables) that
are causally related to #NL (and that also covary with CC).
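
To make the third-variable argument concrete, here is a minimal sketch in Python.  The data
are a hypothetical, perfectly balanced toy set (not real country data), and the names
pearson, partial_corr, x, y, and z are illustrative assumptions only.  Z drives both X and Y,
so X and Y correlate, but the partial correlation controlling for Z vanishes.

```python
from itertools import product
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / sqrt(saa * sbb)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: every combination of Z and two independent
# noise terms e1, e2. None of these numbers come from real countries.
rows = list(product([-1, 1], repeat=3))
z = [zi for zi, _, _ in rows]               # stand-in for, say, national wealth
x = [zi + 0.5 * e1 for zi, e1, _ in rows]   # stand-in for CC
y = [zi + 0.5 * e2 for zi, _, e2 in rows]   # stand-in for #NL

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)
print(f"r(X, Y)     = {r_xy:.2f}")
print(f"r(X, Y | Z) = {r_xy_given_z:.2f}")
```

In this toy set r(X, Y) is 0.8, while the partial correlation given Z is zero up to
floating-point rounding, which is the pattern one would expect if the CC/#NL correlation
were entirely due to Z.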

My understanding of the problem with ecological correlation/inference (known as the ecological
fallacy) is that statistics and relationships based on aggregate/grouped data do not necessarily
reflect the statistics or relationships based on individual-level data (or whatever the lowest unit
of analysis is; in the social sciences, this would usually be the person level).  The Wikipedia entry on the
Ecological Fallacy (see: https://en.wikipedia.org/wiki/Ecological_fallacy ) is consistent with
this view, but I think Simpson's Paradox presents the fallacy most directly (see:
https://en.wikipedia.org/wiki/Ecological_fallacy#Simpson's_paradox ).
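
A small sketch in Python may help show how an aggregate-level correlation can differ from the
individual-level one.  The data are hypothetical (three groups of three persons each), and all
names here are illustrative assumptions.  Group means rise together, but within every group
the relationship runs the other way.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / sqrt(saa * sbb)


def mean(values):
    return sum(values) / len(values)


# Hypothetical data: (group, x, y) for 3 groups ("states") of 3 persons.
# Within a group, x goes up as y goes down; across groups, both go up.
groups = [0, 1, 2]
offsets = [-1, 0, 1]
people = [(g, g + 2 * d, g - 2 * d) for g in groups for d in offsets]

# Individual-level data, pooled over all persons.
x_ind = [x for _, x, _ in people]
y_ind = [y for _, _, y in people]

# Aggregate (ecological) data: one mean per group.
x_grp = [mean([x for g, x, _ in people if g == k]) for k in groups]
y_grp = [mean([y for g, _, y in people if g == k]) for k in groups]

r_ecological = pearson(x_grp, y_grp)
r_individual = pearson(x_ind, y_ind)
print(f"r on group means = {r_ecological:.2f}")   # prints 1.00
print(f"r on individuals = {r_individual:.2f}")   # prints -0.60
```

Here the ecological correlation is +1.00 while the pooled individual-level correlation is
-0.60, so inferring the individual relationship from the group means would get the sign wrong.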

However, to make my comments relevant to the WaPo/Carnegie-Mellon correlation
between percentage of mask wearers and percentage who knew a person with COVID-19
symptoms, I think a 1999 paper by David Freedman provides a good review of the
state of the art in ecological analysis back then (though, in part, it incorporates some
previous analysis and writing that is critical of Gary King's 1998 model for ecological analysis;
King has updated his model but I'll put that aside for now).  The reference for the
Freedman article is:
Freedman, D. A. (1999). Ecological inference and the ecological fallacy. International
Encyclopedia of the Social & Behavioral Sciences, 6, 4027-4030.
A PDF can be accessed at:
http://michaeljohnsonphilosophy.com/wp-content/uploads/2012/10/ecological-fallacy.pdf

Freedman reviews some of the procedures that have been developed over the decades
to perform ecological analysis and how to determine whether such analysis is valid.
Remember that the ecological fallacy does not always occur, that is, the results seen
with aggregate data may well be consistent with analysis based on individual cases/units.
A positive correlation between two variables computed on state-level data may
very well turn out to be positive and of similar magnitude when calculated on individual
persons/cases.  Thinking in terms of levels, and recognizing that the ecological fallacy
arises from some inconsistency or problem across levels, helps one understand when and
how the ecological fallacy occurs.
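
One way to see the role of levels is the standard decomposition of a total (individual-level)
correlation into between-group and within-group parts.  The notation below is my own
shorthand, not Freedman's: T = total, B = between groups, W = within groups, SP = sum of
cross-products of deviations, SS = sum of squared deviations, and r_B is the correlation of
group means weighted by group size.

```latex
\begin{aligned}
SP_T &= SP_B + SP_W, \\
r_T &= \frac{SP_T}{\sqrt{SS_T(x)\,SS_T(y)}}
     = \eta_x \eta_y \, r_B + \sqrt{(1-\eta_x^2)(1-\eta_y^2)} \; r_W, \\
\eta_x^2 &= \frac{SS_B(x)}{SS_T(x)}, \qquad \eta_y^2 = \frac{SS_B(y)}{SS_T(y)}.
\end{aligned}
```

When nearly all of the variation lies between groups (both eta values near 1), the
individual-level r_T is close to the ecological r_B.  When most of the variation lies within
groups, r_W dominates, and r_T can differ from r_B in magnitude and even in sign.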

This is one reason why current approaches to ecological analysis make use of
multilevel modeling.  One review of such work in the pharmaceutical area is the
following:
Greenland, S. (2018). Ecologic inference. In S. C. Chow (Ed.),
Encyclopedia of Biopharmaceutical Statistics (Four Volume Set). CRC Press.

This article is available on books.google.com though the first page is not
available for preview; see:
https://books.google.com/books?id=ERJqDwAAQBAJ&newbks=1&newbks_redir=0&printsec=frontcover&pg=PT1615&dq=multilevel+modeling+ecological+inference&hl=en#v=onepage&q=multilevel%20modeling%20ecological%20inference&f=false

There are many ways that ecological analysis can go wrong but knowing what
the specific problems are (e.g., cross-level interactions that are not modeled,
covariates in different measurement formats, etc.) can help a researcher
achieve more valid conclusions from the data that is available.  So, though
the WaPo/Carnegie-Mellon correlation is an ecological correlation, it doesn't necessarily
follow that one won't see a negative correlation between percent of 100% mask
use and knowing people with COVID-19 symptoms (and deaths) when
individual person data is used.  It is, after all, an empirical question.

-Mike Palij
New York University
[hidden email]


On Fri, Oct 23, 2020 at 3:50 PM Bruce Weaver <[hidden email]> wrote:
Thanks Mike.  It's a nice example of an ecological correlation.  For some
reason, it reminded me of this other well-known example:

https://i.insider.com/5353e29b6da8115322dd4816?width=1000&format=jpeg&auto=webp

Unfortunately, some readers didn't understand that the NEJM article
describing the link between chocolate consumption and Nobel laureates was
meant to be a joke.  :-) 

https://www.wbur.org/commonhealth/2012/10/15/nobel-chocolate-joke

I'm not suggesting the correlation in the Washington Post article is a joke,
by the way.  But I am suggesting that we must always be mindful of the
ecological and atomistic fallacies when examining associations at different
levels. 

Cheers,
Bruce



Mike wrote
> Note:  r = -0.85; the R^2 is provided in the article and I used a
> calculator to
> get the square root and forgot to include the negative sign (as the
> percentage
> of mask users in a state increases, the fewer one knows people with
> COVID-19
> symptoms).  Sorry about that.
>
> -Mike Palij
> New York University

> mp26@

>





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
=====================