SPSSX Discussion

Correlational Example Involving COVID-19 Useful for Classes

Mike

Correlational Example Involving COVID-19 Useful for Classes

This Washington Post article provides an interesting example of correlation

between the "percentage of people one knows with COVID-19 symptoms"

and "percent of people wearing masks in public" -- the unit of analysis is U.S.

states. There is a nice scatterplot and the data used is provided in a table

(comes from Delphi COVIDcast, Carnegie-Mellon U). The R^2 is 0.72 (r = 0.85).

I have not checked the Carnegie-Mellon source but there may be more

interesting data/analysis available there.

https://www.washingtonpost.com/business/2020/10/23/pandemic-data-chart-masks/?utm_campaign=wp_post_most&utm_medium=email&utm_source=newsletter&wpisrc=nl_most&carta-url=https%3A%2F%2Fs2.washingtonpost.com%2Fcar-ln-tr%2F2c4b72b%2F5f92fd6e9d2fda0efb521285%2F5ab230ae9bbc0f2b8372bb9f%2F8%2F71%2F6767f4e2dbc453527192dca03a2a33fc

-Mike Palij

New York University

[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Mike

Fwd: Correlational Example Involving COVID-19 Useful for Classes

Note: r = -0.85; the R^2 is provided in the article and I used a calculator to

get the square root and forgot to include the negative sign (as the percentage

of mask users in a state increases, the percentage of people one knows with COVID-19

symptoms decreases). Sorry about that.
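
The slip is easy to make because squaring discards the sign: in a simple regression R^2 = r^2, so the sign of r has to come from the slope of the fitted line. The following is a minimal sketch of that bookkeeping. The percentages are invented for illustration and are not the Washington Post/COVIDcast values; the helper names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_from_r_squared(r_squared, slope):
    """Recover r from R^2 in simple regression; the sign is taken from the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# Hypothetical state-level percentages (made up, for illustration only).
mask_pct = [70.0, 75.0, 80.0, 85.0, 90.0, 95.0]
symptom_pct = [40.0, 36.0, 30.0, 27.0, 20.0, 18.0]

r = pearson_r(mask_pct, symptom_pct)

# Squaring and re-rooting only gives |r|; a negative slope restores the sign.
recovered = r_from_r_squared(r * r, slope=-1.0)

# Applied to the R^2 quoted in the article, with a downward-sloping scatterplot.
reported = r_from_r_squared(0.72, slope=-1.0)
print(round(reported, 2))
```

With R^2 = 0.72 and a downward-sloping scatterplot, the recovered value rounds to the corrected figure given above.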

-Mike Palij

New York University

[hidden email]


Bruce Weaver

Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Administrator

Thanks Mike. It's a nice example of an ecological correlation. For some
reason, it reminded me of this other well-known example:

https://i.insider.com/5353e29b6da8115322dd4816?width=1000&format=jpeg&auto=webp

Unfortunately, some readers didn't understand that the NEJM article
describing the link between chocolate consumption and Nobel laureates was
meant to be a joke. :-)

https://www.wbur.org/commonhealth/2012/10/15/nobel-chocolate-joke

I'm not suggesting the correlation in the Washington Post article is a joke,
by the way. But I am suggesting that we must always be mindful of the
ecological and atomistic fallacies when examining associations at different
levels.

Cheers,
Bruce

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

spss.giesel@yahoo.de

Re: Correlational Example Involving COVID-19 Useful for Classes

"... we must always be mindful of the ecological and atomistic fallacies when examining associations at different levels."

Well spoken, Bruce.

Cheers,
Mario

On Friday, October 23, 2020, 21:50:02 CEST, Bruce Weaver <[hidden email]> wrote:


Mike

Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

In reply to this post by Bruce Weaver

Thanks for bringing up the correlation between chocolate consumption (CC) and the number

of Nobel Laureates (#NL); I remember when it first came out. However, although the correlation

is between group/aggregate values, I think that this is a better example of spurious correlation

than an ecological correlation. It can be argued that the correlation between CC and #NL is

dependent on a third variable Z which might be national wealth/GDP, number of graduate

degree granting institutions, and/or other variables (or systems of variables) that are causally

related to #NL.

My understanding of ecological correlation/inference (also known as the ecological fallacy) is

that statistics and relationships based on aggregate/grouped data do not necessarily reflect

the statistics or relationships based on individual level data (or whatever the lowest unit of analysis

is; in the social sciences, this would usually be the person level). The Wikipedia entry on the

Ecological Fallacy (see: https://en.wikipedia.org/wiki/Ecological_fallacy ) is consistent with

this view but I think Simpson's Paradox presents the fallacy most directly (see:

https://en.wikipedia.org/wiki/Ecological_fallacy#Simpson's_paradox ).

However, to make my comments relevant to the WaPo/Carnegie-Mellon correlation

between percentage of mask wearers and percentage who knew a person with COVID-19

symptoms, I think a 1999 paper by David Freedman provides a good review of the

state of the art in ecological analysis back then (though, in part, it incorporates some

previous analysis and writing that is critical of Gary King's 1998 model for ecological analysis;

King has updated his model but I'll put that aside for now). The reference for the

Freedman article is:

Freedman, D. A. (1999). Ecological inference and the ecological fallacy. International

Encyclopedia of the Social & Behavioral Sciences, 6(4027-4030), 1-7.

And a Pdf can be accessed at:

http://michaeljohnsonphilosophy.com/wp-content/uploads/2012/10/ecological-fallacy.pdf

Freedman reviews some of the procedures that have been developed over the decades

to perform ecological analysis and how to determine whether such analysis is valid.

Remember that the ecological fallacy does not always occur, that is, the results seen

with aggregate data may well be consistent with analysis based on individual cases/units.

A positive correlation between two variables when one is using state-level data may

very well turn out to be positive and of similar magnitude when calculated on individual

persons/cases. Thinking in terms of levels and that the ecological fallacy arises from

some inconsistency or problem across levels helps to better understand when and

how the ecological fallacy occurs.
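
To make the idea of levels concrete, here is a small sketch of a case where the levels disagree. The data are entirely hypothetical (three "states" with five "people" each, constructed by hand) and are not meant to resemble the mask data; the point is only that the same numbers yield a different correlation depending on the unit of analysis.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical construction: group centers fall on a descending line,
# while individuals inside each group fall on an ascending line.
group_centers = [(0.0, 10.0), (5.0, 5.0), (10.0, 0.0)]
deviations = [-4.0, -2.0, 0.0, 2.0, 4.0]
groups = [[(cx + d, cy + d) for d in deviations] for cx, cy in group_centers]

# Aggregate level: one (mean x, mean y) point per group.
mean_x = [sum(p[0] for p in g) / len(g) for g in groups]
mean_y = [sum(p[1] for p in g) / len(g) for g in groups]
r_aggregate = pearson_r(mean_x, mean_y)

# Individual level: all fifteen people pooled, ignoring group membership.
everyone = [p for g in groups for p in g]
r_individual = pearson_r([p[0] for p in everyone], [p[1] for p in everyone])

# Within-group level: one correlation per group.
r_within = [pearson_r([p[0] for p in g], [p[1] for p in g]) for g in groups]

print(r_aggregate, r_individual, r_within)
```

In this constructed example the group means are perfectly negatively correlated, the pooled individual correlation is much weaker, and the within-group correlations are positive. Whether real data behave this way or consistently across levels is, as noted, an empirical question.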

This is one reason why current approaches to ecological analysis make use of

multilevel modeling. One review of such work in the pharmaceutical area is the

following:

Greenland, Sander (2018) Ecologic Inference. in Chow, S. C. (Ed.). (2018).

Encyclopedia of Biopharmaceutical Statistics-Four Volume Set. CRC Press.

This article is available on books.google.com though the first page is not

available for preview; see:

https://books.google.com/books?id=ERJqDwAAQBAJ&newbks=1&newbks_redir=0&printsec=frontcover&pg=PT1615&dq=multilevel+modeling+ecological+inference&hl=en#v=onepage&q=multilevel%20modeling%20ecological%20inference&f=false

There are many ways that ecological analysis can go wrong but knowing what

the specific problems are (e.g., cross-level interactions that are not modeled,

covariates in different measurement formats, etc.) can help a researcher

achieve more valid conclusions from the data that is available. So, though

the WaPo/Carnegie-Mellon correlation is an ecological correlation, it doesn't necessarily

follow that one won't see a negative correlation between mask

use and knowing people with COVID-19 symptoms (and deaths) when

individual person data is used. It is, after all, an empirical question.

-Mike Palij

New York University

[hidden email]


Rich Ulrich

Re: Correlational Example Involving COVID-19 Useful for Classes

In reply to this post by Mike

Based strictly on the data presented, one could draw the

arrow of causation in either direction.

Fauci assumes that wearing a mask prevents disease.

Trump might argue that those people who have experience

(illness in friends) realize that mask-wearing is a result

of the panic caused by the medical profession. People "more

familiar" with the disease do not bother with masks.

For these data, interviews with persons who have switched status

(to or from mask-wearing) would be helpful for interpretation.

Herman Rubin in the stats groups offered the example of the

correlation between the number of trucks responding to a fire

alarm and the cost of the subsequent damage. More respondents

mean ("result in"?) more damage.
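
The fire-truck example is usually read as a common-cause story: the size of the fire drives both the number of trucks and the damage. A rough sketch of that reading, using invented numbers in arbitrary units and a first-order partial correlation as one possible way to "hold fire size constant" (this is an illustration, not something taken from Rubin's post):

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical fires: size determines trucks and damage, plus noise terms
# that are balanced so trucks and damage are unrelated at any given size.
sizes, trucks, damage = [], [], []
noise = [(0.5, 5.0), (0.5, -5.0), (-0.5, 5.0), (-0.5, -5.0)]
for size in [1, 2, 3, 4, 5]:
    for e_trucks, e_damage in noise:
        sizes.append(float(size))
        trucks.append(size + e_trucks)
        damage.append(10.0 * size + e_damage)

r_raw = pearson_r(trucks, damage)                # strong positive correlation
r_given_size = partial_r(trucks, damage, sizes)  # vanishes once size is controlled

print(round(r_raw, 3), round(r_given_size, 3))
```

In the mask example no such third variable is measured in the state-level table, which is why the direction of the arrow cannot be settled from those data alone.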

Rich Ulrich


Mike

Re: Correlational Example Involving COVID-19 Useful for Classes

Just remember Sir Ronald Fisher's argument against cigarette smoking causing

multiple health problems (e.g., lung cancer, heart disease, etc.) in humans was

that all researchers had was a correlation between amount of cigarette use

and illness condition at some age (usually middle aged and older). Fisher's

justification for his explanation was that people who smoked and went on to

develop, say, lung cancer may have been genetically predisposed to have

lung cancer and, perhaps, a tendency to (enjoy) smoking.

Remember that the evidence for a causal relationship of smoking leading

to lung cancer is based on observational research because it would be

unethical to do a randomized clinical trial (RCT) like the following:

(1) Take several thousand males and females who are randomly assigned

to one of two levels: (a) starting at age 18, equal numbers of males and

females are assigned to a smoking condition where everyone is required to smoke one

pack of cigarettes per day for an indefinite period of time but on the order

of decades, and (b) also starting at age 18, equal numbers of males and

females are required to be tobacco abstinent and to avoid situations where

one might be exposed to second-hand smoke.

(2) At 5 to 10 year intervals all participants are given medical examinations

to screen for the presence of major diseases/illnesses. 20 years of such

data collection would probably be a minimum but it might be best (if funding

can be obtained) to do up to 30 or 40 years of follow-up, at which point

mortality rates may become a more important dependent measure.

(3) Care should be taken to make sure that participants in the two groups

have similar representation of racial/ethnic groups, SES and education

level, live in similar neighborhoods/environments, etc. Identification of other

relevant variables that might be used as covariates should be an ongoing

process, especially to better understand people who have smoked their

entire life (some into their 90s) but have not developed any significant

health conditions. This group may have a genetically based protection

against the damaging effects of smoking, something comparable to people

who have had HIV infection for decades but do not develop AIDS -- the viral load

is kept very low by the person's immune system, suggesting that some

genetic condition bolsters the resistance to HIV developing into AIDS.

I'm sure that the above design has to be polished up before it would

be a viable undertaking but ethical considerations would probably not

permit such research to ever be done. Which is unfortunate

because this would be an experimental based procedure to establish

a causal link between cigarette smoking and health problems in HUMANS.

So, one can use observational research to address this situation (which

has been the traditional method of studying smoking in humans).

Unfortunately, there are a large number of problems with such research;

for more on this point, see:

Vandenbroucke, J. P., Von Elm, E., Altman, D. G., Gøtzsche, P. C., Mulrow, C. D.,

Pocock, S. J., ... & Strobe Initiative. (2007). Strengthening the Reporting of Observational

Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med, 4(10), e297.

The article can be accessed here:

https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0040297

A somewhat more cynical view of the medical research process is provided by

John P. A. Ioannidis in publications such as the following:

Ioannidis, J. P. (2005). Why most published research findings are false. PLoS medicine, 2(8), e124.

This article can be accessed at:

https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124&xid=17259,15700019,15700186,15700190,15700248#

So, the contemporary medical understanding and treatment of the effects of cigarette

smoking on human health is not based on RCTs showing a causal relationship between

smoking and human health. RCTs with animals (e.g., smoking vs non-smoking dogs),

appear to support the smoking causes illness proposition but effects shown in animals

don't always transfer to humans -- the animal effect is not replicated in humans where

the effect is weak, nonexistent, or is manifested in different ways. Consequently,

the warrant to claim that cigarette smoking produces serious medical problems has to

be based on (a) observational studies with humans (but with the problem of having to explain

why a number of humans do NOT develop illnesses), (b) RCT/experimental studies with

animals to show that they indeed develop serious illness as a function of time spent

smoking, and (c) bench research examining cellular and biochemical effects of the

chemicals (poisons) found in cigarette smoke, trying to determine why and which

physiological systems are being damaged.

To summarize an overlong yadda-yadda: correlations can provide information about

causation (or its absence) but one needs to know a large variety of evidence in order

to make the argument that a correlation means one thing (i.e., smoking is positively

related with the development of medical illness) and not another thing (e.g., genetic

factors that predispose one to develop illnesses [compare to the diathesis-stress

model] may also lead one to engage in smoking but play a lesser role in the

development of an illness). Experimental designs that can be used to determine

which interpretation of the correlation relationship should be accepted cannot be

done for ethical reasons, so using info from a variety of sources that converge on an

overwhelming conclusion (i.e., smoking causes illness) is the way that the argument

has to be established.

We can leave it as an exercise to interested parties to determine which experimental

design might provide evidence that wearing a face mask reduces the number of

illness/deaths, and the willingness to wear a mask may depend upon a number of

factors, including how many people one knows that had COVID-19 and how bad

a case it was.

On an aside, the argument that people have AGAINST wearing masks reminds me

of the early part of the movie "Aliens" when Ripley is trying to convince the corporate

suits to take seriously the threat that the nearly unstoppable xenomorphs are a danger

not only on LV-426 but to earth as well. And the suits, being suits, blow Ripley off. At least

until earth is not able to communicate with LV-426 and someone has to go there to find

out why. Ripley understands that if she goes there, she may be walking into a deathtrap

while the Marines and the suit Burke think they can handle anything there. Well, most

of us know how that turned out. People who don't wear masks are like the Marines and Burke

but have to be hit in the face with a 2x4 to realize how much danger they are in -- it's

not until their first encounter with the aliens that they realize how unprepared they are

to deal with them though Ripley's pleadings tried to get them to understand how bad

it is. Similar to the coronavirus and its resultant illness COVID-19 - until you see how

terrible it can be, one can make believe that the virus is no worse than the flu or that

it is a hoax or it's just an attempt to undermine the president. Sometimes one might

have to let a kid touch a hot pot on a stove to realize that they shouldn't touch hot

pots and pans on stoves. But some kids might need several such learning trials while

a few might turn out to be Darwin award winners. One can always give advice but

it is foolish to expect people to follow it unless they understand what is really going on.

tl;cw

-Mike Palij

New York University

[hidden email]

P.S. It's late. Sorry about the typos and sentences that appear to suggest that

I had a temporary psychotic break with reality. ;-).


Bruce Weaver

Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Administrator

In reply to this post by Mike

Mike wrote
> Thanks for bringing up the correlation between chocolate consumption (CC)
> and the number of Nobel Laureates (#NL); I remember when it first came out.
> However, although the correlation is between group/aggregate values, I think
> that this is a better example of spurious correlation than an ecological
> correlation.

Fair point.

> --- snip ---
> My understanding of ecological correlation/inference (also known as the
> ecological fallacy) is that statistics and relationships based on
> aggregate/grouped data do not necessarily reflect the statistics or
> relationships based on individual level data (or whatever the lowest unit
> of analysis is; in the social sciences, this would usually be the person
> level). The Wikipedia entry on the Ecological Fallacy (see:
> https://en.wikipedia.org/wiki/Ecological_fallacy ) is consistent with
> this view

Agreed. Ditto for the atomistic fallacy, except for the reversed direction
(i.e., associations at the level of individuals do not necessarily match
associations between the same variables at the aggregate level).

> but I think Simpson's Paradox presents the fallacy most directly (see:
> https://en.wikipedia.org/wiki/Ecological_fallacy#Simpson's_paradox ).

Hmm. You're going to have to explain this one to me. Simpson's Paradox is
often illustrated with examples where there appears to be no association
between X and Y, but when one "controls" for Z, the X-Y association becomes
apparent. As this article suggests, it is an example of suppression, or
negative confounding, as epidemiologists might call it:

https://link.springer.com/article/10.1186/s12982-019-0087-0

See the example in Table 1.

Perhaps what you're suggesting is that to get the correct estimate of the
X-Y association, one must compute estimates within each stratum of the
confounder, and then a pooled estimate of those within-stratum estimates
(rather than pooling the data across strata)? I don't see that as being the
same thing as computing the association between aggregate measures of X and
Y, though.
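
A small sketch may help separate the two ideas in this paragraph. Using invented data with a hypothetical binary confounder Z, it compares (a) the correlation from data pooled across strata with (b) a pooled within-stratum slope, computed by centering X and Y on their own stratum means before summing. This is only one simple way to combine within-stratum estimates, shown under those assumptions, and not a reconstruction of the analysis in the linked article.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def within_sums(points):
    """Cross-product and sum of squares of x, centered on the stratum's own means."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points)
    return sxy, sxx

# Hypothetical strata: Y rises with X inside each stratum, but the stratum
# with larger X values has lower Y overall (suppression).
strata = {
    "Z=0": [(float(x), float(x) + 3.0) for x in range(0, 5)],
    "Z=1": [(float(x), float(x) - 3.0) for x in range(4, 9)],
}

# (a) Pool the raw data across strata and ignore Z.
pooled = [p for pts in strata.values() for p in pts]
r_pooled = pearson_r([p[0] for p in pooled], [p[1] for p in pooled])

# (b) Estimate within each stratum, then pool the within-stratum pieces.
r_by_stratum = {
    z: pearson_r([p[0] for p in pts], [p[1] for p in pts])
    for z, pts in strata.items()
}
sums = [within_sums(pts) for pts in strata.values()]
slope_within = sum(s[0] for s in sums) / sum(s[1] for s in sums)

print(r_pooled, r_by_stratum, slope_within)
```

Here the pooled correlation is zero while every within-stratum estimate is positive. Neither calculation uses stratum means as the units of analysis, which is the distinction drawn above between this situation and an association computed on aggregate measures.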

--- snip the rest ---


Mike

Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

On Sun, Oct 25, 2020 at 10:55 AM Bruce Weaver <[hidden email]> wrote:

> --- snip ---
> but I think Simpson's Paradox presents the fallacy most directly (see:
> https://en.wikipedia.org/wiki/Ecological_fallacy#Simpson's_paradox ).

Hmm. You're going to have to explain this one to me. Simpson's Paradox is
often illustrated with examples where there appears to be no association
between X and Y, but when one "controls" for Z, the X-Y association becomes
apparent. As this article suggests, it is an example of suppression, or
negative confounding, as epidemiologists might call it:
https://link.springer.com/article/10.1186/s12982-019-0087-0

See the example in Table 1.

A few points:

(1) I think that the case you are referring to, i.e., no apparent association between X and Y

until Z is controlled for, is a special case of Simpson's paradox, that is,

sometimes suppression may give rise to Simpson's paradox but

Simpson's paradox can still occur without suppression. More on this

point shortly.

(2) Please see the following article:

Kievit, R., Frankenhuis W., Waldorp L., & Borsboom, D. (2013). Simpson's paradox in

psychological science: a practical guide. Frontiers in Psychology, 4, 513.

The article can be accessed at:

https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00513/full

The abstract to the article follows:

The direction of an association at the population-level may be reversed within the subgroups

comprising that population --- a striking observation called Simpson's paradox. When facing this

pattern, psychologists often view it as anomalous. Here, we argue that Simpson's paradox is

more common than conventionally thought, and typically results in incorrect interpretations --

potentially with harmful consequences. We support this claim by reviewing results from cognitive

neuroscience, behavior genetics, clinical psychology, personality psychology, educational psychology,

intelligence research, and simulation studies. We show that Simpson's paradox is most likely to

occur when inferences are drawn across different levels of explanation (e.g., from populations

to subgroups, or subgroups to individuals). We propose a set of statistical markers indicative

of the paradox, and offer psychometric solutions for dealing with the paradox when encountered --

including a toolbox in R for detecting Simpson's paradox. We show that explicit modeling of situations

in which the paradox might occur not only prevents incorrect interpretations of data, but also

results in a deeper understanding of what data tell us about the world.

NOTE: emphasis of the last sentence is added. Modeling the data pattern is important because

of the next point.

(3) On page 6 of the PDF for the article (scroll down on the webpage) the following quote

appears:

A Survival Guide to Simpson's Paradox
We have shown that SP may occur in a wide variety of research designs, methods, and questions.
As such, it would be useful to develop means to “control” or minimize the risk of SP occurring, much
like we wish to control instances of other statistical problems. Pearl (1999, 2000) has shown that
(unfortunately) there is no single mathematical property that all instances of SP have in common, and
therefore, there will not be a single, correct rule for analyzing data so as to prevent cases of SP.
Based on graphical models, Pearl (2000) shows that conditioning on subgroups may sometimes be
appropriate, but may sometimes increase spurious dependencies (see also Spellman et al., 2001).
It appears that some cases are observationally equivalent, and only when it can be assumed that the
cause of interest does not influence another variable associated with the effect, a test exists to determine
whether SP can arise (see Pearl, 2000, chapter 6 for details).

Note #1: Emphasis added to the sentence containing Judea Pearl's statement that there is no single math property

that underlies all instances of Simpson's Paradox. This implies that some cases of SP may be due

to suppression but other mechanisms are probably operating to produce the pattern, hence the need

for something like the authors' R toolkit to investigate an instance of SP in detail.

Note #2: I think that this article is helpful in thinking about Simpson's Paradox even though most of

the examples are from psychology because it shows how it can appear in a wide variety of situations

(sometimes unnoticed) as well as the difference between SP based on different groups of subjects

and SP based on repeated measurements of individuals in different groups.

Perhaps what you're suggesting is that to get the correct estimate of the
X-Y association, one must compute estimates within each stratum of the
confounder, and then a pooled estimate of those within-stratum estimates
(rather than pooling the data across strata)? I don't see that as being the
same thing as computing the association between aggregate measures of X and
Y, though.
--- snip the rest ---

No, I was trying to suggest that Simpson's paradox may reflect the operation of

different mechanisms, which is one reason why I pointed out that multilevel analysis

is one strategy that some researchers are using to understand SP.
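
To make the pooled-versus-stratified distinction discussed above concrete, here is a minimal sketch in Python. The numbers are made up purely for illustration (they are not from the Kievit et al. article or the Washington Post data), and the stratum labels "A" and "B" stand in for two levels of a hypothetical confounder Z. It compares the correlation in the pooled data, the correlation within each stratum, and a pooled within-stratum correlation obtained by centering X and Y on their stratum means. It is not the R toolbox mentioned in the abstract, and it is not a multilevel model, only a simple way to see the reversal.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def center(values):
    """Subtract the mean, i.e. express values as deviations within a stratum."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Hypothetical data: (X values, Y values) for each stratum of Z.
strata = {
    "A": ([1, 2, 3, 4], [6, 8, 7, 9]),
    "B": ([6, 7, 8, 9], [1, 3, 2, 4]),
}

# 1. Ignore Z and pool the raw data across strata.
all_x = [x for xs, _ in strata.values() for x in xs]
all_y = [y for _, ys in strata.values() for y in ys]
r_pooled = pearson_r(all_x, all_y)

# 2. Correlation computed separately within each stratum.
r_within = {name: pearson_r(xs, ys) for name, (xs, ys) in strata.items()}

# 3. Pooled within-stratum correlation: center on stratum means, then pool.
cx = [v for xs, _ in strata.values() for v in center(xs)]
cy = [v for _, ys in strata.values() for v in center(ys)]
r_centered = pearson_r(cx, cy)

print(f"pooled raw data:        r = {r_pooled:.2f}")
for name, r in r_within.items():
    print(f"within stratum {name}:       r = {r:.2f}")
print(f"pooled within-stratum:  r = {r_centered:.2f}")
```

With these numbers the pooled correlation is negative while the correlation inside each stratum is positive, which is the sign reversal that defines the paradox. Whether the pooled or the stratified estimate is the one to trust depends on the causal structure, which is Pearl's point in the passage quoted above.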

-Mike Palij

New York University

[hidden email]

Bruce Weaver

Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Thanks for the links, Mike. I see that I also have access to Simpson (1951)
via JSTOR, so when I have time to dig into this a bit more, I'll start with
that.

https://www.jstor.org/stable/2984065?seq=1#metadata_info_tab_contents

Bruce

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.


Mike

Re: Fwd: Correlational Example Involving COVID-19 Useful for Classes

Bruce, may I suggest that you also read the following by Judea Pearl:

Judea Pearl (2014) Comment: Understanding Simpson’s Paradox, The
American Statistician, 68:1, 8-13, DOI: 10.1080/00031305.2014.876829

I can provide a copy if you need one.

-Mike Palij

New York University

[hidden email]

On Sun, Oct 25, 2020 at 1:24 PM Bruce Weaver <[hidden email]> wrote:

Thanks for the links, Mike. I see that I also have access to Simpson (1951)
via JSTOR, so when I have time to dig into this a bit more, I'll start with
that.

https://www.jstor.org/stable/2984065?seq=1#metadata_info_tab_contents

Bruce

