Why Correlation Usually ≠ Causation

Why Correlation Usually ≠ Causation: Causal Nets Cause Common Confounding

Correlations are oft interpreted as evidence for causation; this is oft falsified; do causal graphs explain why this is so common?

topics: statistics, philosophy, survey, Bayes; created: 24 Jun 2014; modified: 18 Apr 2019; status: in progress; confidence: log; importance: 10

Contents:
- Confound it! Correlation is (usually) not causation! But why not?
- The Problem
- What a Tangled Net We Weave When First We Practice to Believe
- Comment
- Heuristics & Biases
- External links
- Appendix
- Everything correlates with everything

It is widely understood that statistical correlation between two variables ≠ causation. But despite this admonition, people are routinely overconfident in claiming correlations to support particular causal interpretations and are surprised by the results of randomized experiments, suggesting that they are biased & systematically underestimate the prevalence of confounds/common-causation. I speculate that in realistic causal networks or DAGs (directed acyclic graphs), the number of possible correlations grows faster than the number of possible causal relationships. So confounds really are that common, and since people do not think in DAGs, the imbalance also explains overconfidence.

I have noticed I seem to be unusually willing to bite the correlation≠causation bullet, and I think it’s due to an idea I had some time ago about the nature of reality.

Confound it! Correlation is (usually) not causation! But why not?

The Problem

“Hubris is the greatest danger that accompanies formal data analysis… Let me lay down a few basics, none of which is easy for all to accept…
1. The data may not contain the answer.
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”
John Tukey (pg. 74-75, “Sunset Salvo”, 1986)

Most scientifically-inclined people are reasonably aware that one of the major divides in research is that correlation≠causation: that, having discovered some relationship between various data X and Y (not necessarily Pearson’s r, but any sort of mathematical or statistical relationship, whether it be a humble r or an opaque deep neural network’s predictions), we do not know how Y would change if we manipulated X. Y might increase, decrease, do something complicated, or remain implacably the same. This point can be made by listing examples of correlations where we intuitively know changing X should have no effect on Y, and it’s a spurious relationship: the number of churches in a town may correlate with the number of bars, but we know that’s because both are related to how many people are in it; the number of pirates may inversely correlate with global temperatures (but we know pirates don’t control global warming and it’s more likely something like economic development leads to suppression of piracy but also CO2 emissions); sales of ice cream may correlate with snake bites or violent crime or death from heat-strokes (but of course snakes don’t care about sabotaging ice cream sales); thin people may have better posture than fat people, but sitting upright does not seem like a plausible weight loss plan[1]; wearing XXXL clothing clearly doesn’t cause heart attacks, although one might wonder if diet soda causes obesity; the more firemen are around, the worse fires are; judging by grades of tutored vs non-tutored students, tutors would seem to be harmful rather than helpful; black skin does not cause sickle-cell anemia nor, to borrow an example from Pearson[2], would black skin cause smallpox or malaria; more recently, part of the psychology behind linking vaccines with autism is that many vaccines are administered to children at the same time autism would start becoming apparent (or should we blame organic food sales?); height & vocabulary or foot size & math skills may correlate strongly (in children); national chocolate consumption correlates with Nobel prizes[3], as do borrowing from commercial banks & buying luxury cars & serial killers/mass-murderers/traffic fatalities[4]; moderate alcohol consumption predicts increased lifespan and earnings; the role of storks in delivering babies may have been underestimated; children and people with high self-esteem have higher grades & lower crime rates etc, so “we all know in our gut that it’s true” that raising people’s self-esteem “empowers us to live responsibly and that inoculates us against the lures of crime, violence, substance abuse, teen pregnancy, child abuse, chronic welfare dependency and educational failure”; unless perhaps high self-esteem is caused by high grades & success, boosting self-esteem has no experimental benefits, and may backfire?

Now, the correlation could be bogus in the sense that it would disappear if we gathered more data, being an illusory correlation due to biases; or it could be an artifact of our mathematical procedures, as in “spurious correlations”; or it could be a Type I error, a correlation thrown up by the standard statistical problems we all know about, such as too-small n, false positives from sampling error (A & B just happened to sync together due to randomness), data-mining/multiple testing, p-hacking, data snooping, selection bias, publication bias, misconduct, inappropriate statistical tests, etc. Those last can be generated ad nauseam: Shaun Gallagher’s Correlated (also a book) surveys users & compares the answers against all previous surveys, with 1k+ correlations. Tyler Vigen’s “spurious correlations” catalogues 35k+ correlations, many with r>0.9, based primarily on US Census & CDC data.
Google Correlate “finds Google search query patterns which correspond with real-world trends” based on geography or user-provided data, which offers endless fun (“Facebook”/“tapeworm in humans”, r=0.8721; “Superfreakonomic”/“Windows 7 advisor”, r=0.9751; Irish electricity prices/“Stanford webmail”, r=0.83; “heart attack”/“pink lace dress”, r=0.88; US states’ parasite loads/“booty models”, r=0.92; US states’ family ties/“how to swim”; metronidazole/“Is Lil’ Wayne gay?”, r=0.89; Clojure/“prnhub”, r=0.9784; “accident”/“itchy bumps”, r=0.87; “migraine headaches”/“sciences”, r=0.77; “Irritable Bowel Syndrome”/“font download”, r=0.94; interest-rate-index/“pill identification”, r=0.98; “advertising”/“medical research”, r=0.99; Barack Obama 2012 vote-share/“Top Chef”, r=0.88; “losing weight”/“houses for rent”, r=0.97; “Bieber”/tonsillitis, r=0.95; “paternity test”/“food for dogs”, r=0.83; “breast enlargement”/“reverse telephone search”, r=0.95; “theory of evolution”/“the Sumerians” or “Hector of Troy” or “Jim Crow laws”; “gwern”/“Danny Brown lyrics”, r=0.92; “weed”/“new Family Guy episodes”, r=0.8; a drawing of a bell curve matches “MySpace” while a penis matches “STD symptoms in men”, r=0.95; not to mention Kurt Vonnegut stories). (And on less secular themes, do churches cause obesity & do Welsh rugby victories predict papal deaths?) Financial data-mining offers some fun examples; there’s the Super Bowl/stock-market one which worked well for several decades; and it’s not very elegant, but a 3-variable model (Bangladeshi butter, American cheese, joint sheep population) reaches R²=0.99 on 20 years of the S&P 500.

I’ve read about those problems at length, and despite knowing about all that, there still seems to be a problem: I don’t think those issues explain away all the correlations which turn out to be confounds - correlation too often ≠ causation.
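One way to see how confounds could stay common even with perfect data (no sampling error, no p-hacking) is to count pairs in a causal graph, in the spirit of the speculation about DAGs at the top. The sketch below rests on my own toy assumptions, not a model taken from this essay: a random DAG in which each forward edge is present independently with probability p, and a faithful linear-Gaussian structure, so that two variables are marginally correlated exactly when some variable (possibly one of the pair) is an ancestor of both, while they are causally related only when a directed path joins them. The names `random_dag`, `ancestors_or_self` and `count_pairs` are hypothetical.

```python
import random
from itertools import combinations


def random_dag(n, p, rng):
    """Hypothetical toy generator: nodes 0..n-1 are already in causal
    (topological) order, and each forward edge i -> j with i < j is
    included independently with probability p."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def ancestors_or_self(n, edges):
    """Map each node v to {v} plus every node with a directed path into v.
    Assumes every edge (i, j) satisfies i < j, so one forward pass suffices."""
    parents = {v: [] for v in range(n)}
    for i, j in edges:
        parents[j].append(i)
    anc = {}
    for v in range(n):
        s = {v}
        for u in parents[v]:
            s |= anc[u]  # u < v, so anc[u] is already complete
        anc[v] = s
    return anc


def count_pairs(n, edges):
    """Return (causal, correlated) counts over unordered pairs of nodes.

    causal:     a directed path joins the pair, so intervening on one can move the other.
    correlated: some node (possibly one of the pair) is an ancestor of both, i.e. a
                collider-free path exists; under the assumed faithful linear-Gaussian
                model this is exactly when the pair is marginally correlated.
    """
    anc = ancestors_or_self(n, edges)
    causal = correlated = 0
    for a, b in combinations(range(n), 2):
        if a in anc[b] or b in anc[a]:
            causal += 1
        if anc[a] & anc[b]:
            correlated += 1
    return causal, correlated


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 20, 40, 80):
        causal, correlated = count_pairs(n, random_dag(n, 2 / n, rng))
        print(f"n={n:3d}  pairs={n * (n - 1) // 2:5d}  causal={causal:5d}  "
              f"correlated={correlated:5d}  confound-only={correlated - causal:5d}")
```

In the three-node fork 1 ← 0 → 2 (the churches/bars/town-population pattern), `count_pairs(3, {(0, 1), (0, 2)})` returns `(2, 3)`: nodes 1 and 2 correlate with no causal path between them. In the collider 0 → 2 ← 1 it returns `(2, 2)`, since a common effect alone induces no marginal correlation. How far the two counts diverge as n and p grow depends entirely on the assumed random-graph model; the sketch only shows one way to do the counting.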
One of the recurring problems I face in my reading is that I constantly want to know about causal relationships but I only have correlational data, and as we all know, that is an unreliable guide at best.

The unreliability is bad enough, but I’m also worried that the knowledge that correlation≠causation, one of the core ideas of the scientific method and fundamental to fields like modern medicine, is going underappreciated and is being abandoned by meta-contrarians, who call it “nothing helpful” or “meaningless” and treat justified skepticism as just “a dumb-ass thing to say”, a “statistical cliché that closes threads and ends debates, the freshman platitude turned final shutdown”, often used by “party poopers” and “Internet blowhards” to serve an “agenda”, and sometimes “a dog whistle”. In practice, such people seem to go well beyond the XKCD comic and proceed to take any correlations they like as strong evidence for causation, and any disagreement reveals one’s unsophisticated middlebrow thinking or denialism. So it’s unsurprising that one so often runs into researchers for whom indeed correlation=causation; it is common for papers to use causal language and make recommendations (Prasad et al 2013), but even if they don’t, you can be sure to see the researchers confidently talking causally to other researchers or journalists or officials. (I’ve noticed this sort of constant slide from correlational findings to causal claims is particularly common in medicine, sociology, and education.)

Bandying phrases with meta-contrarians won’t help much here; I agree with them that correlation ought to be some evidence for causation. For example, if I suspect that A→B, and I collect data and establish beyond doubt that A & B correlate at r=0.7, surely this observation, which is consistent with my theory, should boost my confidence in my theory, just as an observation like r=0.0001 would trouble me greatly.
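The intuition that a correlation consistent with A→B should raise confidence, and a near-zero one lower it, is Bayes’ rule in odds form: posterior odds = prior odds × likelihood ratio, where the likelihood ratio compares how probable the observed correlation is if A→B versus if not. A minimal sketch; the function name `posterior` is hypothetical, and the likelihood ratios used (3 for r=0.7, 0.2 for r=0.0001) are made-up placeholders, not estimates of how diagnostic such correlations actually are.

```python
def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    prior:            P(A->B) before seeing the correlation, strictly between 0 and 1.
    likelihood_ratio: P(observed r | A->B) / P(observed r | not A->B); values above 1
                      favour A->B, values below 1 count against it.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    post_odds = prior / (1 - prior) * likelihood_ratio
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    # Hypothetical likelihood ratios: 3 for a solid r = 0.7, 0.2 for r = 0.0001.
    for prior in (0.5, 0.1):
        print(f"prior={prior:.2f}  after r=0.7: {posterior(prior, 3):.3f}  "
              f"after r=0.0001: {posterior(prior, 0.2):.3f}")
```

With the same hypothetical likelihood ratio of 3, a 50% prior rises to 75% but a 10% prior only to 25%, so the size of the update hinges on the prior as much as on the correlation itself.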
But how much…?

To measure this directly, you need a clear set of correlations which are proposed to be causal and randomized experiments establishing the true causal relationship in each case, with both categories sharply delineated in advance to avoid cherrypicking and retroactively confirming a correlation. Then you’d be able to say something like ‘11 out of the 100 proposed A→B causal relationships panned out’, and start with a prior of 11% that in your case, A→B. This sort of dataset is pretty rare, although the few examples I’ve