This Washington Post article provides an interesting example of a correlation between the "percentage of people one knows with COVID-19 symptoms" and the "percentage of people wearing masks in public"; the unit of analysis is U.S. states. There is a nice scatterplot, and the data used are provided in a table (from Delphi COVIDcast, Carnegie Mellon University). The R^2 is 0.72 (r = 0.85). I have not checked the Carnegie Mellon source, but there may be more interesting data/analysis available there. -Mike Palij New York University
Note: r = -0.85. The R^2 is provided in the article; I used a calculator to get the square root and forgot to include the negative sign (as the percentage of mask users in a state increases, the percentage of people one knows with COVID-19 symptoms decreases). Sorry about that. -Mike Palij New York University
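For anyone reusing this as a class example, the slip above is worth showing explicitly: in a simple regression R^2 is just r squared, so its square root is always non-negative and the sign has to be recovered from the slope (or the scatterplot). A minimal Python sketch; `pearson_r` and `r_from_r_squared` are hypothetical helper names, and the toy data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_from_r_squared(r_squared, slope):
    """Recover r from a simple-regression R^2; the sign comes from the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# Figures from the thread: R^2 = 0.72 and the fitted line slopes downward.
print(round(math.sqrt(0.72), 2))               # 0.85 (sign lost)
print(round(r_from_r_squared(0.72, -1.0), 2))  # -0.85

# Made-up, perfectly decreasing data: r squared is 1, but r itself is -1.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
r = pearson_r(xs, ys)
print(round(r ** 2, 6), round(r, 6))
```

Note that `math.copysign` treats a slope of exactly zero as positive, which is harmless here because a zero slope implies r = 0.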
Thanks Mike. It's a nice example of an ecological correlation. For some reason, it reminded me of this other well-known example: https://i.insider.com/5353e29b6da8115322dd4816?width=1000&format=jpeg&auto=webp Unfortunately, some readers didn't understand that the NEJM article describing the link between chocolate consumption and Nobel laureates was meant to be a joke. :-) https://www.wbur.org/commonhealth/2012/10/15/nobel-chocolate-joke I'm not suggesting the correlation in the Washington Post article is a joke, by the way. But I am suggesting that we must always be mindful of the ecological and atomistic fallacies when examining associations at different levels. Cheers, Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
"... we must always be mindful of the ecological and atomistic fallacies when examining associations at different levels." Well spoken, Bruce. Cheers, Mario
(In reply to Bruce Weaver's message of Friday, 23 October 2020, 21:50:02 CEST.)
In reply to this post by Bruce Weaver
Thanks for bringing up the correlation between chocolate consumption (CC) and the number of Nobel Laureates (#NL); I remember when it first came out. However, although the correlation is between group/aggregate values, I think that this is a better example of a spurious correlation than of an ecological correlation. It can be argued that the correlation between CC and #NL depends on a third variable Z, which might be national wealth/GDP, the number of graduate-degree-granting institutions, and/or other variables (or systems of variables) that are causally related to #NL. My understanding of ecological correlation/inference (also known as the ecological fallacy) is that statistics and relationships based on aggregate/grouped data do not necessarily reflect the statistics or relationships based on individual-level data (or whatever the lowest unit of analysis is; in the social sciences, this would usually be the person level). The Wikipedia entry on the Ecological Fallacy (see: https://en.wikipedia.org/wiki/Ecological_fallacy ) is consistent with this view, but I think Simpson's Paradox presents the fallacy most directly (see: https://en.wikipedia.org/wiki/Ecological_fallacy#Simpson's_paradox ). However, to make my comments relevant to the WaPo/Carnegie Mellon correlation between the percentage of mask wearers and the percentage who knew a person with COVID-19 symptoms, I think a 1999 paper by David Freedman provides a good review of the state of the art in ecological analysis at that time (though, in part, it incorporates some previous analysis and writing that is critical of Gary King's 1998 model for ecological analysis; King has updated his model, but I'll put that aside for now). The reference for the Freedman article is: Freedman, D. A. (1999). Ecological inference and the ecological fallacy. International Encyclopedia of the Social & Behavioral Sciences, 6, 4027-4030.
And a PDF can be accessed at: Freedman reviews some of the procedures that have been developed over the decades to perform ecological analysis and how to determine whether such analysis is valid. Remember that the ecological fallacy does not always occur; that is, the results seen with aggregate data may well be consistent with analysis based on individual cases/units. A positive correlation between two variables computed from state-level data may very well turn out to be positive and of similar magnitude when calculated on individual persons/cases. Thinking in terms of levels, and recognizing that the ecological fallacy arises from some inconsistency or problem across levels, helps to clarify when and how the ecological fallacy occurs. This is one reason why current approaches to ecological analysis make use of multilevel modeling. One review of such work in the pharmaceutical area is the following: Greenland, S. (2018). Ecologic inference. In S. C. Chow (Ed.), Encyclopedia of Biopharmaceutical Statistics (four-volume set). CRC Press. This article is available on books.google.com, though the first page is not available for preview; see: There are many ways that ecological analysis can go wrong, but knowing what the specific problems are (e.g., cross-level interactions that are not modeled, covariates in different measurement formats, etc.) can help a researcher reach more valid conclusions from the data that are available. So, though the WaPo/Carnegie Mellon correlation is an ecological correlation, it doesn't necessarily follow that one won't see a negative correlation between 100% mask use and knowing people with COVID-19 symptoms (and deaths) when individual-person data are used. It is, after all, an empirical question. -Mike Palij New York University
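The point that a state-level correlation need not tell you the person-level one can be illustrated with a toy simulation. Everything here is hypothetical: 50 invented "states", 200 invented people per state, and the assumption that each person's mask use (0/1) and whether they know someone with symptoms (0/1) depend only on their state's mask rate. Within a state the two variables are independent, yet the state percentages correlate strongly, while the person-level correlation computed from the same simulated people is weak.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_people(n_states=50, n_people=200, seed=1):
    """Hypothetical individuals: (state, wears_mask, knows_symptomatic).
    Both 0/1 outcomes depend only on the state's mask rate."""
    rng = random.Random(seed)
    people = []
    for state in range(n_states):
        mask_rate = rng.uniform(0.2, 0.95)
        p_knows = 0.6 - 0.5 * mask_rate   # assumed state-level link
        for _ in range(n_people):
            wears = int(rng.random() < mask_rate)
            knows = int(rng.random() < p_knows)
            people.append((state, wears, knows))
    return people

people = simulate_people()

# Individual-level correlation: each simulated person is one case.
r_individual = pearson_r([w for _, w, _ in people], [k for _, _, k in people])

# Ecological correlation: each state's two percentages form one case.
totals = {}
for state, wears, knows in people:
    n, w_sum, k_sum = totals.get(state, (0, 0, 0))
    totals[state] = (n + 1, w_sum + wears, k_sum + knows)
pct_mask = [w / n for n, w, _ in totals.values()]
pct_knows = [k / n for n, _, k in totals.values()]
r_state = pearson_r(pct_mask, pct_knows)

print(f"individual-level r = {r_individual:.2f}")
print(f"state-level r      = {r_state:.2f}")
```

Aggregation averages away the person-level noise, so the state-level r comes out much larger in magnitude than the individual-level r even though both are negative here; with other assumed mechanisms the two could even differ in sign.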
In reply to this post by Mike
Based strictly on the data presented, one could draw the
arrow of causation in either direction.
Fauci assumes that wearing a mask prevents disease.
Trump might argue that those people who have experience
(illness in friends) realize that mask-wearing is a result
of the panic caused by the medical profession. People "more
familiar" with the disease do not bother with masks.
For these data, interviews with persons who have switched status
(to or from mask-wearing) would be helpful for interpretation.
Herman Rubin in the stats groups offered the example of the
correlation between the number of trucks responding to a fire
alarm and the cost of the subsequent damage. More respondents
mean ("result in"?) more damage.
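Herman Rubin's fire-truck example is the classic third-variable case: the size of the fire drives both the number of trucks sent and the damage. A minimal sketch with invented numbers, assuming fire size is the only common cause and that trucks and damage both rise linearly with it. The raw correlation is large, while the first-order partial correlation controlling for fire size is close to zero.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)
size, trucks, damage = [], [], []
for _ in range(2000):
    z = rng.uniform(1, 10)                     # hypothetical fire size
    size.append(z)
    trucks.append(1.0 * z + rng.gauss(0, 1))   # bigger fire, more trucks
    damage.append(5.0 * z + rng.gauss(0, 5))   # bigger fire, more damage

r_raw = pearson_r(trucks, damage)
r_partial = partial_r(r_raw, pearson_r(trucks, size), pearson_r(damage, size))
print(f"r(trucks, damage)        = {r_raw:.2f}")
print(f"r(trucks, damage | size) = {r_partial:.2f}")
```

The partial correlation only removes the linear part of the confounder's influence; if trucks or damage depended on fire size nonlinearly, some association would remain.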
--
Rich Ulrich
Just remember Sir Ronald Fisher's argument against cigarette smoking causing multiple health problems (e.g., lung cancer, heart disease, etc.) in humans: all researchers had was a correlation between the amount of cigarette use and illness at some age (usually middle age and older). Fisher's justification for his explanation was that people who smoked and went on to develop, say, lung cancer may have been genetically predisposed both to lung cancer and, perhaps, to a tendency to (enjoy) smoking. Remember that the evidence for a causal relationship of smoking leading to lung cancer is based on observational research, because it would be unethical to do a randomized clinical trial (RCT) like the following: (1) Take several thousand males and females and randomly assign them to one of two conditions: (a) starting at age 18, equal numbers of males and females are assigned to a smoking condition where everyone is required to smoke one pack of cigarettes per day for an indefinite period of time, on the order of decades; and (b) also starting at age 18, equal numbers of males and females are required to be tobacco abstinent and to avoid situations where they might be exposed to second-hand smoke. (2) At 5- to 10-year intervals, all participants are given medical examinations to screen for the presence of major diseases/illnesses. Twenty years of such data collection would probably be a minimum, but it might be best (if funding can be obtained) to do up to 30 or 40 years of follow-up, at which point mortality rates may become a more important dependent measure. (3) Care should be taken to make sure that participants in the two groups have similar representation of racial/ethnic groups, SES and education level, live in similar neighborhoods/environments, etc.
Identification of other relevant variables that might be used as covariates should be an ongoing process, especially to better understand people who have smoked their entire lives (some into their 90s) but have not developed any significant health conditions. This group may have a genetically based protection against the damaging effects of smoking, comparable to people who have had HIV infection for decades but do not develop AIDS: the viral load is kept very low by the person's immune system, suggesting that some genetic condition bolsters resistance to HIV developing into AIDS. I'm sure that the above design would have to be polished before it would be a viable undertaking, but ethical considerations would probably prevent such research from ever being done. This is unfortunate, because it would be an experimental procedure that could establish a causal link between cigarette smoking and health problems in HUMANS. So, one can use observational research to address this situation (which has been the traditional method of studying smoking in humans). Unfortunately, there are a large number of problems with such research; for more on this point, see: Vandenbroucke, J. P., von Elm, E., Altman, D. G., Gøtzsche, P. C., Mulrow, C. D., Pocock, S. J., ... & STROBE Initiative. (2007). Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and elaboration. PLoS Medicine, 4(10), e297. The article can be accessed here: A somewhat more cynical view of the medical research process is provided by John P. A. Ioannidis in publications such as the following: Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
This article can be accessed at: So, the contemporary medical understanding and treatment of the effects of cigarette smoking on human health is not based on RCTs showing a causal relationship between smoking and human health. RCTs with animals (e.g., smoking vs. non-smoking dogs) appear to support the proposition that smoking causes illness, but effects shown in animals don't always transfer to humans: the animal effect may not be replicated in humans, where the effect is weak, nonexistent, or manifested in different ways. Consequently, the warrant to claim that cigarette smoking produces serious medical problems has to be based on (a) observational studies with humans (with the problem of having to explain why a number of humans do NOT develop illnesses), (b) RCT/experimental studies with animals showing that they indeed develop serious illness as a function of time spent smoking, and (c) bench research examining the cellular and biochemical effects of the chemicals (poisons) found in cigarette smoke, trying to determine why and which physiological systems are being damaged. To summarize an overlong yadda-yadda: correlations can provide information about causation (or its absence), but one needs a large variety of evidence in order to argue that a correlation means one thing (i.e., smoking is positively related to the development of medical illness) and not another (e.g., genetic factors that predispose one to develop illnesses [compare the diathesis-stress model] may also lead one to engage in smoking, while smoking plays a lesser role in the development of an illness). Experimental designs that could determine which interpretation of the correlational relationship should be accepted cannot be carried out for ethical reasons, so using information from a variety of sources that converge on an overwhelming conclusion (i.e., smoking causes illness) is the way the argument has to be established.
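Fisher's "constitutional" hypothesis can be made concrete with a toy simulation, which also shows what the (unethical) randomized design described above would buy: random assignment cuts the link between a predisposition and smoking. All probabilities are invented, and in this simulated world smoking is deliberately given NO causal effect on disease; that is the hypothesis being illustrated, not what the real evidence shows. The hypothetical predisposition alone produces a clear observational association, which disappears under coin-flip assignment.

```python
import random

def cancer_rates(n=100_000, randomize=False, seed=7):
    """Return {smokes (0/1): disease rate} for a hypothetical population."""
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}  # smokes -> [people, cases]
    for _ in range(n):
        gene = rng.random() < 0.3                       # hypothetical predisposition
        if randomize:
            smokes = rng.random() < 0.5                 # coin-flip assignment
        else:
            smokes = rng.random() < (0.6 if gene else 0.2)  # predisposition -> smoking
        # Toy assumption: disease risk depends on the predisposition only.
        cancer = rng.random() < (0.10 if gene else 0.02)
        counts[int(smokes)][0] += 1
        counts[int(smokes)][1] += int(cancer)
    return {k: cases / total for k, (total, cases) in counts.items()}

obs = cancer_rates()
rct = cancer_rates(randomize=True)
print(f"observational: smokers {obs[1]:.3f}, non-smokers {obs[0]:.3f}")
print(f"randomized:    smokers {rct[1]:.3f}, non-smokers {rct[0]:.3f}")
```

In the observational run smokers show roughly twice the disease rate of non-smokers purely through the shared cause; this is exactly the pattern that observational data alone cannot distinguish from a causal effect.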
We can leave it as an exercise for interested parties to determine which experimental design might provide evidence that wearing a face mask reduces the number of illnesses/deaths; the willingness to wear a mask may depend upon a number of factors, including how many people one knows who had COVID-19 and how bad their cases were. As an aside, the arguments people make AGAINST wearing masks remind me of the early part of the movie "Aliens", when Ripley is trying to convince the corporate suits to take seriously the threat that the nearly unstoppable xenomorphs pose, not only on LV-426 but to Earth as well. And the suits, being suits, blow Ripley off, at least until Earth is not able to communicate with LV-426 and someone has to go there to find out why. Ripley understands that if she goes there, she may be walking into a deathtrap, while the Marines and the suit Burke think they can handle anything there. Well, most of us know how that turned out. People who don't wear masks are like the Marines and Burke, who have to be hit in the face with a 2x4 to realize how much danger they are in: it's not until their first encounter with the aliens that they realize how unprepared they are to deal with them, even though Ripley's pleadings tried to get them to understand how bad it is. Similarly with the coronavirus and the illness it causes, COVID-19: until you see how terrible it can be, one can make believe that the virus is no worse than the flu, or that it is a hoax, or that it's just an attempt to undermine the president. Sometimes one might have to let a kid touch a hot pot on a stove to realize that they shouldn't touch hot pots and pans on stoves. But some kids might need several such learning trials, while a few might turn out to be Darwin Award winners. One can always give advice, but it is foolish to expect people to follow it unless they understand what is really going on. tl;cw -Mike Palij New York University P.S. It's late. Sorry about the typos and sentences that appear to suggest that I had a temporary psychotic break with reality. ;-)
In reply to this post by Mike
Mike wrote
> Thanks for bringing up the correlation between chocolate consumption (CC) and the number of Nobel Laureates (#NL); I remember when it first came out. However, although the correlation is between group/aggregate values, I think that this is a better example of spurious correlation than an ecological correlation.
Fair point.
> --- snip ---
> My understanding of ecological correlation/inference (also known as the ecological fallacy) is that statistics and relationships based on aggregate/grouped data do not necessarily reflect the statistics or relationships based on individual level data (or whatever the lowest unit of analysis is; in the social sciences, this would usually be the person level). The Wikipedia entry on the Ecological Fallacy (see: https://en.wikipedia.org/wiki/Ecological_fallacy ) is consistent with this view
Agreed. Ditto for the atomistic fallacy, except for the reversed direction (i.e., associations at the level of individuals do not necessarily match associations between the same variables at the aggregate level).
> but I think Simpson's Paradox presents the fallacy most directly (see: https://en.wikipedia.org/wiki/Ecological_fallacy#Simpson's_paradox ).
Hmm. You're going to have to explain this one to me. Simpson's Paradox is often illustrated with examples where there appears to be no association between X and Y, but when one "controls" for Z, the X-Y association becomes apparent. As this article suggests, it is an example of suppression, or negative confounding, as epidemiologists might call it: https://link.springer.com/article/10.1186/s12982-019-0087-0 See the example in Table 1. Perhaps what you're suggesting is that to get the correct estimate of the X-Y association, one must compute estimates within each stratum of the confounder, and then a pooled estimate of those within-stratum estimates (rather than pooling the data across strata)? I don't see that as being the same thing as computing the association between aggregate measures of X and Y, though.
--- snip the rest ---
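The contrast Bruce describes (estimate within each stratum of the confounder and pool those estimates, versus pooling the data across strata) is what a stratified estimator such as the Mantel-Haenszel odds ratio does. The counts below are invented so that treatment beats control within both strata yet looks worse once the strata are collapsed, because treated cases are concentrated in the harder stratum. `odds_ratio` and `mantel_haenszel_or` are hypothetical helper names for this sketch.

```python
def odds_ratio(a, b, c, d):
    """a, b = treated successes/failures; c, d = control successes/failures."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

strata = [
    (18, 2, 160, 40),   # Z = 0, hypothetical "easy" cases: 90% vs 80% success
    (60, 140, 4, 16),   # Z = 1, hypothetical "hard" cases: 30% vs 20% success
]
for z, table in enumerate(strata):
    print(f"OR within Z = {z}: {odds_ratio(*table):.2f}")          # 2.25, 1.71

# Pooling the data: add the cells across strata, then compute one OR.
collapsed = tuple(sum(cells) for cells in zip(*strata))
print(f"OR after collapsing over Z: {odds_ratio(*collapsed):.2f}")  # 0.19

# Pooling the estimates: weight the stratum-specific information.
print(f"Mantel-Haenszel pooled OR: {mantel_haenszel_or(strata):.2f}")  # 1.91
```

The collapsed table reverses the direction of the association, while the stratified estimate keeps the within-stratum direction; neither computation involves correlating aggregate measures of X and Y.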
--
Bruce Weaver
On Sun, Oct 25, 2020 at 10:55 AM Bruce Weaver <[hidden email]> wrote: > --- snip --- A few points: (1) I think that the case you are referring to, i.e., no apparent association between X and Y until Z is controlled for, is a special case of Simpson's paradox; that is, suppression may sometimes give rise to Simpson's paradox, but Simpson's paradox can also occur without suppression. More on this point shortly. (2) Please see the following article: Kievit, R., Frankenhuis, W., Waldorp, L., & Borsboom, D. (2013). Simpson's paradox in psychological science: A practical guide. Frontiers in Psychology, 4, 513. The article can be accessed at: https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00513/full The abstract of the article follows: The direction of an association at the population-level may be reversed within the subgroups comprising that population -- a striking observation called Simpson's paradox. When facing this pattern, psychologists often view it as anomalous. Here, we argue that Simpson's paradox is more common than conventionally thought, and typically results in incorrect interpretations -- potentially with harmful consequences. We support this claim by reviewing results from cognitive neuroscience, behavior genetics, clinical psychology, personality psychology, educational psychology, intelligence research, and simulation studies. We show that Simpson's paradox is most likely to occur when inferences are drawn across different levels of explanation (e.g., from populations to subgroups, or subgroups to individuals). We propose a set of statistical markers indicative of the paradox, and offer psychometric solutions for dealing with the paradox when encountered -- including a toolbox in R for detecting Simpson's paradox. *We show that explicit modeling of situations in which the paradox might occur not only prevents incorrect interpretations of data, but also results in a deeper understanding of what data tell us about the world.* NOTE: emphasis of the last sentence is added.
Modeling the data pattern is important because of the next point. (3) On page 6 of the PDF for the article (scroll down on the webpage) the following quote appears:
*A Survival Guide to Simpson's Paradox* We have shown that SP may occur in a wide variety of research designs, methods, and questions. As such, it would be useful to develop means to “control” or minimize the risk of SP occurring, much like we wish to control instances of other statistical problems. *Pearl (1999, 2000) has shown that (unfortunately) there is no single mathematical property that all instances of SP have in common, and therefore, there will not be a single, correct rule for analyzing data so as to prevent cases of SP.* Based on graphical models, Pearl (2000) shows that conditioning on subgroups may sometimes be appropriate, but may sometimes increase spurious dependencies (see also Spellman et al., 2001). It appears that some cases are observationally equivalent, and only when it can be assumed that the cause of interest does not influence another variable associated with the effect, a test exists to determine whether SP can arise (see Pearl, 2000, chapter 6 for details). Note #1: Emphasis is added to the sentence containing Judea Pearl's statement that there is no single mathematical property that underlies all instances of Simpson's Paradox. This implies that some cases of SP may be due to suppression but that other mechanisms are probably operating to produce the pattern, hence the need for something like the authors' R toolbox to investigate an instance of SP in detail. Note #2: I think that this article is helpful in thinking about Simpson's Paradox, even though most of the examples are from psychology, because it shows how SP can appear in a wide variety of situations (sometimes unnoticed), as well as the difference between SP based on different groups of subjects and SP based on repeated measurements of individuals in different groups.
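> Perhaps what you're suggesting is that to get the correct estimate of the X-Y association, one must compute estimates within each stratum of the confounder, and then a pooled estimate of those within-stratum estimates (rather than pooling the data across strata)?
No, I was trying to suggest that Simpson's paradox may reflect the operation of different mechanisms, which is one reason why I pointed out that multilevel analysis is one strategy that some researchers are using to understand SP. -Mike Palij New York University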
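The cross-level reversal Kievit et al. describe (e.g., repeated measurements within individuals versus comparisons between individuals) can be sketched by splitting each variable into a between-person part (the person's mean) and a within-person part (the deviation from that mean), the same decomposition that underlies person-mean centering in multilevel models. The repeated-measures data and all parameters below are hypothetical: between persons, higher X goes with higher Y; within any one person, occasions with higher X have lower Y. A full multilevel model (e.g., with random slopes) goes further than this sketch.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_persons=30, n_occasions=20, seed=3):
    """Hypothetical repeated measures: rows of (person, x, y)."""
    rng = random.Random(seed)
    rows = []
    for person in range(n_persons):
        mu_x = rng.uniform(0, 10)
        mu_y = 2.0 * mu_x + rng.gauss(0, 1)         # between persons: positive
        for _ in range(n_occasions):
            dev = rng.gauss(0, 1)
            y = mu_y - 1.5 * dev + rng.gauss(0, 1)  # within a person: negative
            rows.append((person, mu_x + dev, y))
    return rows

rows = simulate()

# Person means: the between-person part of each variable.
sums = {}
for person, x, y in rows:
    n, sx, sy = sums.get(person, (0, 0.0, 0.0))
    sums[person] = (n + 1, sx + x, sy + y)
means = {p: (sx / n, sy / n) for p, (n, sx, sy) in sums.items()}

r_total = pearson_r([x for _, x, _ in rows], [y for _, _, y in rows])
r_between = pearson_r([mx for mx, _ in means.values()],
                      [my for _, my in means.values()])
# Person-mean-centered deviations: the within-person part, pooled over persons.
r_within = pearson_r([x - means[p][0] for p, x, _ in rows],
                     [y - means[p][1] for p, _, y in rows])

print(f"ignoring persons: r = {r_total:.2f}")
print(f"between persons:  r = {r_between:.2f}")
print(f"within persons:   r = {r_within:.2f}")
```

The pooled correlation that ignores persons follows the between-person sign here, so reading it as a statement about how Y changes within an individual would get the direction wrong.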
Thanks for the links, Mike. I see that I also have access to Simpson (1951) via JSTOR, so when I have time to dig into this a bit more, I'll start with that. https://www.jstor.org/stable/2984065?seq=1#metadata_info_tab_contents Bruce
--
Bruce Weaver
Bruce, may I suggest that you also read the following by Judea Pearl: Judea Pearl (2014) Comment: Understanding Simpson’s Paradox, The American Statistician, 68:1, 8-13, DOI: 10.1080/00031305.2014.876829 I can provide a copy if you need one. -Mike Palij New York University