## Hilarious Graphs (and Pirates) Prove That Correlation Is Not Causation

When it comes to storytelling, we have a problem.

It’s not our fault though – as human beings we are hard-wired from birth to look for patterns and explain why they happen. This problem doesn’t go away when we grow up; it becomes worse the more intelligent we think we are. We convince ourselves that now we are older, wiser and smarter, our conclusions are closer to the mark than when we were younger (the faster the wind blows the faster the windmill blades turn, not the other way around).

Even really smart people see a pattern and insist on putting an explanation to it, even when they don’t have enough information to reach such a conclusion. They can’t help it.

This is the thing about being human. We seek explanation for the events that happen around us. If something defies logic, we try to find a reason why it might make sense. If something doesn’t add up, we make it up.

The Post Hoc Fallacy
Ever heard the Latin expression Post Hoc, Ergo Propter Hoc, meaning ‘After this, therefore because of this’? Of course you have – it is closely related to the saying ‘Correlation Does Not Imply Causation’. It is also known in statistics as the Post Hoc Fallacy, and is a very familiar trap that we all fall into from time to time. This is the idea that when things are observed to happen in sequence, we infer that the thing that happened first must have caused the thing that happened next.

The Post Hoc Fallacy is what causes a football manager to only wear purple socks on match days. He once wore them at a match and his team won. Obviously, it was the socks that did it. Now he fears that if he doesn’t wear them to a match the team might lose. Damn those stinky purple socks (he also daren’t wash them for fear of the magic pixie dust washing out).

Post Hoc is also what made rain men indispensable to the tribe – they believed that their rain man could make it rain. Spotting the clouds brewing in the distance, the rain man dances until it pours it down. It doesn’t usually take more than three or four days of dancing until the inevitable happens. “Rain man dance, water fall from sky”. It’s just a good job for the rain man that the Indians couldn’t speak Latin, otherwise he’d have been in real trouble…

The Post Hoc Piracy Postulation
For a humorous view of the Post Hoc Fallacy, let’s take a look at Pastafarianism. It’s all the rage these days. Not heard of it? It’s one of the newest and fastest growing religions on the block. Pastafarian Sparrowism, to give it its full title, is a ‘vibrant religion that seeks to bring the Flying Spaghetti Monster’s fleeting affection to all of us, through the life of His Prophet, Captain Jack Sparrow’. Seriously, they’re not joking. Well, actually, they are. They promote a light-hearted view of religion and oppose the teaching of intelligent design and creationism in public schools. They also maintain that pirates are the original Pastafarians.

In an effort to illustrate that correlation does not imply causation, the founder, Bobby Henderson, presented the argument that global warming is a direct effect of the shrinking number of pirates since the 1800s, and accompanied it with this graph:

Pirates Caused Global Warming. Honest…

Wow, look at that straight line, I hear you all say – there’s clearly a correlation between the decline in the numbers of pirates and the rise in global temperatures, so there just must be a causal connection here, mustn’t there? Yup, you’ve all just fallen for the Post Hoc Fallacy (I just knew you would).

Just because there is a straight line on the graph it doesn’t necessarily follow that one thing caused the other, particularly when you’ve grabbed two seemingly unconnected variables at random and stuck them together to see whether there might be some sort of tenuous correlation between them.

In the case of pirates and global warming, take a closer look at the labels on the x-axis. Notice something strange? Apart from the fact that the proportions of neighbouring data points are all out of whack, there is also the issue that a couple of them have been humorously disordered to deliberately deceive.

I don’t know about you, but I’m a believer! As soon as I’ve hit the ‘Publish’ button I’m giving up stats for a life as a pirate on the open seas. I’ll stop global warming if it’s the last thing I do.

It probably will be…

This blog post is an extract from the witty new book Truth, Lies and Statistics, FREE at Amazon

Here’s the blurb:

Pirates, cats, Mexican lemons and North Carolina lawyers.
Cheese consumption, margarine and drowning by falling out of fishing boats.

This book has got it all.

In this eye-opening book, award-winning statistician and author Lee Baker uncovers the key tricks used by statistical hustlers to deceive, hoodwink and dupe the unwary.

Written as a layman’s guide to lying, cheating and deceiving with data and statistics, there’s not a dull page in sight!

A roller coaster of a book in 8 witty chapters, this might just be the most entertaining statistics book you’ll read this year.

Discover the exciting world of statistical cheating and persuasive misdirection.

The Organic Autism Correlation Conundrum
If you look online there are all sorts of humorous graphs that illustrate the Post Hoc Fallacy. Over the past 20 years or so, the anti-vaccine movement has grown hugely, particularly in the US, and all sorts of spurious correlations have been ‘discovered’ that ‘prove’ that there is a causal link between vaccination programmes and autism. At the same time, to debunk the most crackpot of the theories, other – equally ridiculous – correlations have popped up too.

There was one that was published that showed the correlation between sales of organic food in the US and diagnosis of autism:

Organic Food Causes Autism. Oh My…

There is a very close correlation between the pair of plot lines, even accompanied by a very large r-value (close to 1) and a very small p-value (close to 0). The suggestion is that – if we trust that correlation does imply causation – organic food correlates with autism far more closely than anything proposed by any other theory that currently exists, so it must be the cause.

Except that correlation does not necessarily imply causation, and organic food does not cause autism. That would be ridiculous. And that is the whole point of these graphs.

All you need to do is find any pair of variables that increase over the same time period, plot them on a graph with the same x-axis and different y-axes, adjust the y-axis scales until the plot lines coalesce, and – BOOM – correlation! If, by some magic of coincidence and fate, there is a statistical correlation, then publish the p-value that goes along with it as additional proof. What this does is prove that the correlation exists, but it does not prove that one thing causes the other. It might, but then again it might not…
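The recipe is easy to try for yourself. Here is a minimal Python sketch using two made-up series that both simply rise over ten years. The numbers, the names `series_a` and `series_b`, and the helpers `pearson_r` and `yearly_changes` are all invented for illustration; none of this is the organic food or autism data. As one possible sanity check (an assumption of this sketch, not something shown on the graph), it also correlates the year-on-year changes instead of the raw levels.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def yearly_changes(values):
    """Year-on-year differences: each value minus the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Two invented series over ten "years". Each is just an upward trend plus
# its own small wobble; neither one is computed from the other.
series_a = [100 + 12 * i + (5 if i % 2 else -5) for i in range(10)]
series_b = [3.0 + 0.4 * i + (0.3 if i % 3 == 0 else -0.1) for i in range(10)]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(yearly_changes(series_a), yearly_changes(series_b))

print(f"r for raw levels:          {r_levels:.2f}")
print(f"r for year-on-year change: {r_changes:.2f}")
```

With these toy numbers the raw levels correlate at well above 0.9, while the year-on-year changes show essentially no correlation at all. The shared upward trend is doing all the work.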

The Lemon Fatality Correlation Convergence
I also quite enjoyed the correlation that proved that Mexican lemons are a major cause of deaths on US roads. Wait, what? I must have missed the news that day – Mexican lemons are killing Americans? You bet!

Take a look at a plot of the number of fresh lemons imported into the USA from Mexico versus the total fatality rate on US highways between 1996 and 2000:

Mexican Lemons Kill Americans!

My, my, just look at the R-squared value – it really must be true. Although the graph seems to be telling us that the more Mexican lemons there are in the US the fewer road deaths there are, the inescapable conclusion is that MEXICAN LEMONS KILL AMERICANS! What should we do about it? Should we import more Mexican lemons (the correlation tells us that this is what we should do)? Or should we ban Mexican lemons altogether? After all, if there are no Mexican lemons on the streets then they can’t kill any more Americans.

What utter tosh! I don’t care if there is a correlation, there is nothing to suggest that lemons cause accidents. If there was, don’t you think that lemons would be causing accidents on Mexican roads before the trucks crossed into the US? What about Sicilian lemons? Do they cause road deaths in Italy and across Europe?
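It is also worth seeing what an R-squared value can and cannot tell you. The sketch below uses made-up placeholder numbers (not the real lemon import or road fatality figures) and a hypothetical helper called `pearson_r`. It shows that squaring r throws away the sign, so a near-perfect R-squared sits just as happily on a falling line as on a rising one, and says nothing about cause either way.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder values for five consecutive years, invented for illustration:
# one series goes up while the other goes down.
lemons = [1, 2, 3, 4, 5]
road_deaths = [10.0, 9.1, 8.2, 6.9, 6.0]

r = pearson_r(lemons, road_deaths)
r_squared = r ** 2

print(f"r         = {r:.3f}")          # negative: the line slopes downward
print(f"R-squared = {r_squared:.3f}")  # the sign is gone after squaring
```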

Oh, the power of correlations. As long as your audience doesn’t understand that correlation is not causation you can make them believe pretty much anything.

Lee Baker is an award-winning software creator who lives behind a keyboard in a darkened room. Illuminated only by the light from his monitor, he aspires to finding the light switch.

With decades of experience in science, statistics and artificial intelligence, he has a passion for telling stories with data. Despite explaining it a dozen times, his mother still doesn’t understand what he does for a living.

Insisting that data analysis is much simpler than we think it is, he authors friendly, easy-to-understand blogs and books that teach the fundamentals of data analysis and statistics.

His mission is to unleash your inner data ninja!

As the CEO of Chi-Squared Innovations, one day he’d like to retire to do something simpler, like crocodile wrestling.

PS – Don’t forget to connect with me on Twitter: @eelrekab

## 5 Free Data Science Books for the New Year

Now that Christmas and the New Year are behind us the nights are becoming a little shorter with each passing day. Nevertheless, there are still loads of cold winter nights left to endure (unless you’re in the Southern Hemisphere, in which case – throw me a shrimp on the barbie!).

It’s time to dust off your New Year resolutions from last year (remember those?) and get ready for a new start, a new you and some new data skills.

I’ve thrown together a collection of five excellent (and free!) Data Science eBooks for your Kindle to sharpen up your ninja skills while you’re on the long commute to work. Just try not to read them while driving!

I hope that you find something in here that will get your mental juices flowing with ideas about how to tackle your data.

All these books are free, so dive in and enjoy!

## 1. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data

by Hadley Wickham and Garrett Grolemund

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.

Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way.

You’ll learn how to:

• Wrangle – transform your datasets into a form convenient for analysis
• Program – learn powerful R tools for solving data problems with greater clarity and ease
• Explore – examine your data, generate hypotheses, and quickly test them
• Model – provide a low-dimensional summary that captures true “signals” in your dataset
• Communicate – learn R Markdown for integrating prose, code, and results

## 2. D3 Tips and Tricks v4.x

by Malcolm MacLean

https://leanpub.com/d3-t-and-t-v4

D3 Tips and Tricks is a book written to help those who may be unfamiliar with JavaScript or web page creation get started turning information into visualization.

Data is the new medium of choice for telling a story or presenting compelling information on the Internet and d3.js is an extraordinary framework for presentation of data on a web page.

This book is not for experts. It’s put together as a guide to get you started if you’re unsure what d3.js can do. It reads more like a story as it leads the reader through the basics of line graphs and on to discover animation, tooltips, tables, interfacing with MySQL databases via PHP, sankey diagrams, force diagrams, maps and more…

## 3. Data Mining And Analysis: Fundamental Concepts and Algorithms

by Mohammed J. Zaki and Wagner Meira, Jr.

The fundamental algorithms in data mining and analysis are the basis for business intelligence and analytics, as well as automated methods to analyze patterns and models for all kinds of data. This textbook for senior undergraduate and graduate data mining courses provides a comprehensive overview from an algorithmic perspective, integrating concepts from machine learning and statistics, with plenty of examples and exercises.

“This book by Mohammed Zaki and Wagner Meira Jr is a great option for teaching a course in data mining or data science. It covers both fundamental and advanced data mining topics, explains the mathematical foundations and the algorithms of data science, includes exercises for each chapter, and provides data, slides and other supplementary material on the companion website.”

Gregory Piatetsky-Shapiro, Founder, ACM SIGKDD, the leading professional organization for Knowledge Discovery and Data Mining.

## 4. Building Machine Learning Systems with Python

by Willi Richert and Luis Pedro Coelho

https://www.packtpub.com/packt/free-ebook/python-machine-learning-algorithms/

As the Big Data explosion continues at an almost incomprehensible rate, being able to understand and process it becomes even more challenging. With Building Machine Learning Systems with Python, you’ll learn everything you need to tackle the modern data deluge – by harnessing the unique capabilities of Python and its extensive range of numerical and scientific libraries, you will be able to create complex algorithms that can ‘learn’ from data, allowing you to uncover patterns, make predictions, and gain a more in-depth understanding of your data.

Featuring a wealth of real-world examples, this book gives you an accessible route into Python machine learning. Learn the Iris dataset, find out how to build complex classifiers, and get to grips with clustering through practical examples that deliver complex ideas with clarity. Dig deeper into machine learning, and discover guidance on classification and regression, with practical machine learning projects outlining effective strategies for sentiment analysis and basket analysis. The book also takes you through the latest in computer vision, demonstrating how image processing can be used for pattern recognition, as well as showing you how to get a clearer picture of your data and trends by using dimensionality reduction.

Keep up to speed with one of the most exciting trends to emerge from the world of data science and dig deeper into your data with Python with this unique data science tutorial.

## 5. Graphs Don’t Lie

by Lee Baker

https://chi2innovations.lpages.co/book-graphs-don-t-lie/

Did you know that between them, Sarah Palin, Mike Huckabee and Mitt Romney enjoyed a total of 193% support from Republican candidates in the 2012 US primaries? It must be true – it was on a pie chart broadcast on Fox News. Did you also know that the number 34 is smaller than 14, and zero is much bigger than 22? Honest, it’s true, it was published in a respectable national newspaper after the 2017 UK General Election. There can’t have been any kind of misdirection here because they were all shown on a pie chart.

In this astonishing book, award-winning statistician and author Lee Baker uncovers how politicians, the press, corporations and other statistical conmen use graphs and charts to deceive their unwitting audience. Like how such a shocking, and yet seemingly innocuous, statement as “Every year since 1950, the number of children gunned down has doubled” meant that there should have been at least 35 trillion gun deaths in 1995 alone, the year the quote was printed in a reputable journal. Or how an anti-abortion group made their point by trying to convince us all that 327,000 is actually a larger number than 935,573. Nice try, but no cigar – we weren’t born yesterday.
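The 35 trillion figure is easy to check. Here is a rough sketch, assuming the most generous possible starting point of a single death in 1950. That starting count and the function name `implied_count` are assumptions made for illustration, not taken from the book.

```python
def implied_count(start_count, start_year, end_year):
    """Count in end_year if the count doubles every year from start_year."""
    return start_count * 2 ** (end_year - start_year)


# 1950 to 1995 is 45 doublings, starting from an assumed single death.
deaths_1995 = implied_count(start_count=1, start_year=1950, end_year=1995)

print(f"{deaths_1995:,}")  # 35,184,372,088,832
```

Any larger starting count in 1950 only makes the 1995 figure more absurd.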

In his trademark sardonic style, the author reveals the secrets of how the statistical hustlers use graphs and charts to manipulate and misrepresent for political or commercial gain – and often get away with it.

Written as a layman’s guide to lying, cheating and deceiving with graphs, there’s not a dull page in sight!

And it’s got elephants in it too…

## Summary

So there you have it – 5 free Data Science eBooks to get your back-to-work-after-the-holidays head back on and into the swing of things.

I hope you enjoy them, and it would be great if you would leave brief reviews of these books in the comments below – I’m sure all the authors would appreciate your comments and shares.
