At The Intersection

You’ll remember Venn diagrams from school. They’re essentially a mathematical tool for laying out the information in partially overlapping sets. In statistics they are often used in the same way, to show the possible outcomes of events which might overlap.

For example, here’s a Venn diagram showing the relationship between whales and fish:

Whales and fish have some properties that are unique, but they also have some features in common. These are all shown in the appropriate parts of the diagram, with the common elements falling in the part of the sets that overlap – the so-called intersection.
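If you prefer code to circles, the same idea can be sketched with Python's built-in sets. The properties listed here are my own illustrative choices and are not copied from the diagram, so treat them as assumptions; the point is just that the overlap is what the `&` operator returns.

```python
# Illustrative properties only (assumed, not taken from the diagram above).
whales = {"live in water", "have fins", "breathe air", "are mammals"}
fish = {"live in water", "have fins", "have gills", "have scales"}

# The intersection: properties that appear in both sets.
in_common = whales & fish

# The non-overlapping parts of each circle.
only_whales = whales - fish
only_fish = fish - whales

print("Intersection:", sorted(in_common))
print("Whales only:", sorted(only_whales))
print("Fish only:", sorted(only_fish))
```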

With this in mind, I recently came across the following Venn poem titled ‘At the Intersection’ written by Brian Bilston:

You can probably work it out. There are three poems in total:  separate ones for ‘him’ and ‘her’ and their intersection. Life seen from two different perspectives, the result of which is contained in the intersection.



Not so clever

You remember that thing about well-produced statistical diagrams telling their own story without the need for additional words?

Well, the same thing goes for badly produced statistical diagrams:

Thanks to for giving me this idea for a post.

Weapons of math destruction

I haven’t read it, but Cathy O’Neil’s ‘Weapons of Math Destruction’ is a great title for a book. Here’s what one reviewer wrote:

Cathy O’Neil an experienced data scientist and mathematics professor illustrates the pitfalls of allowing data scientists to operate in a moral and ethical vacuum including how the poor and disadvantaged are targeted for payday loans, high cost insurance and political messaging on the basis of their zipcodes and other harvested data.

So, WOMD shows how the data-based algorithms that increasingly form the fabric of our lives – from Google to Facebook to banks to shopping to politics – and the statistical methodology behind them are actually pushing societies in the direction of greater inequality and reduced democracy.

At the time of writing WOMD these arguments were still in their infancy; but now that we are starting to live the repercussions of the success of the campaign to remove Britain from the EU – which was largely driven by a highly professional exercise in Data Science – they seem much more relevant and urgent.

Anyway, Cathy O’Neil herself recently gave an interview to Bloomberg. Unfortunately, you now have to subscribe to read the whole article, so you won’t see much if you follow the link. But it was an interesting interview for various reasons. In particular, she discussed the trigger which led her to a love of data and mathematics. She wrote that when she was nine her father showed her a mathematics puzzle. And solving that problem led Cathy to a lifelong appreciation of the power of mathematical thinking. She wrote:

… I’ve never felt more empowered by anything since.

It’s more of a mathematical than a statistical puzzle, but maybe you’d like to think about it for yourself anyway…

Consider this diagram:

It’s a chessboard with 2 of the corner squares removed. Now, suppose you had a set of 31 dominoes, with each domino being able to cover 2 adjacent horizontal or vertical squares. Your aim is to find a way of covering the 62 squares of the mutilated board with the 31 dominoes. If you’d like to try it, mail me with either a diagram or photo of your solution; or, if you think it can’t be done, mail me an explanation. I’ll discuss the solution in a future post.
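If you want to check a candidate solution before mailing it, here is a minimal sketch of a checker. It doesn't solve the puzzle or tell you whether a solution exists; it only tests whether a given list of dominoes covers a given board exactly. The function names are my own, and I've assumed the two removed corners are diagonally opposite, as in the classic version of the puzzle.

```python
from typing import Iterable, Set, Tuple

Square = Tuple[int, int]          # (row, column)
Domino = Tuple[Square, Square]    # the two squares a domino covers


def mutilated_board(size: int = 8) -> Set[Square]:
    """Squares of a size x size board with two corners removed.

    Assumption: the removed corners are diagonally opposite.
    """
    board = {(row, col) for row in range(size) for col in range(size)}
    return board - {(0, 0), (size - 1, size - 1)}


def is_valid_cover(board: Set[Square], dominoes: Iterable[Domino]) -> bool:
    """Return True if the dominoes cover every board square exactly once."""
    covered: Set[Square] = set()
    for first, second in dominoes:
        # A domino must sit on two horizontally or vertically adjacent squares.
        distance = abs(first[0] - second[0]) + abs(first[1] - second[1])
        if distance != 1:
            return False
        for square in (first, second):
            # Reject squares that are off the board or already covered.
            if square not in board or square in covered:
                return False
            covered.add(square)
    return covered == board
```

To use it, build the board with `mutilated_board()` and pass your 31 dominoes as pairs of `(row, column)` squares to `is_valid_cover`.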



Fringe benefits

The Edinburgh Fringe Festival is the largest arts festival in the world. The 2019 version has just finished, but Wikipedia lists some of the statistics for the 2018 edition:

  1. the festival lasted 25 days;
  2. it included more than 55,000 performances;
  3. that comprised 3548 different shows.

The shows themselves are of many different types, including theatre, dance, circus and music. But the largest section of the festival is comedy, and performers compete for the Edinburgh Comedy Awards – formerly known as the Perrier Award – which is given to the best comedy show on the fringe.

I mention all this because the TV Channel Dave also publishes what it regards to be the best 10 jokes of the festival. And number 4 this year was a statistical joke.


A cowboy asked me if I could help him round up 18 cows. I said, “Yes, of course. That’s 20 cows.”

Confession: the joke is really based on arithmetic rather than Statistics.

Woodland creatures

The hedgehog and the fox is an essay by philosopher Isaiah Berlin. Though published in 1953, the title is a reference to a fragment of a poem by the ancient Greek poet Archilochus. The relevant passage translates as:

… a fox knows many things, but a hedgehog one important thing.

Isaiah Berlin used this concept to classify famous thinkers: those whose ideas could be summarised by a single principle are hedgehogs; those whose ideas are more pragmatic, multi-faceted and evolving are foxes.

This dichotomy of approaches to thinking has more recently been applied in the context of prediction, and is the basis of the following short (less than 5-minute) video, kindly suggested to me by

Watch and enjoy…

So, remarkably, in a study of the accuracy of individuals when making predictions, nothing made a difference: age, sex, political outlook… Except, ‘foxes’ are better predictors than ‘hedgehogs’: being well-versed in a single consistent philosophy is inferior to an adaptive and evolving approach to knowledge and its application.

The narrator, David Spiegelhalter, also summarises the strengths of a good forecaster as:

  1. Aggregation. They use multiple sources of information, are open to new knowledge and are happy to work in teams.
  2. Metacognition. They have an insight into how they think and the biases they might have, such as seeking evidence that simply confirms pre-set ideas.
  3. Humility. They have a willingness to acknowledge uncertainty, admit errors and change their minds. Rather than saying categorically what is going to happen, they are only prepared to give probabilities of future events.

(Could almost be a bible for a sports modelling company.)

These principles are taken from the book Future Babble by Dan Gardner, which looks like it’s a great read. The tagline for the book is ‘how to stop worrying and love the unpredictable’, which on its own is worth the cost of the book.

Incidentally, I could just as easily have written a blog entry with David Spiegelhalter as part of my series of famous statisticians. Until recently he was the president of the Royal Statistical Society. He was also knighted in 2014 for his services to Statistics, and has numerous awards and honorary degrees.

His contributions to statistics are many, especially in the field of Medical Statistics.  Equally though, as you can tell from the above video, he is a fantastic communicator of statistical ideas. He also has a recent book out: The art of statistics: learning from data. I’d guess that if anyone wants to learn something about Statistics from a single book, this would be the place to go. I’ve just bought it, but haven’t read it yet. Once I do, if it seems appropriate, I’ll post a review to the blog.

Statty night

Apologies for the terrible pun in the title.

When I used to teach Statistics I tried to emphasise to students that Statistics is as much an art as a science. Statisticians are generally trying to make sense of some aspect of the world, and they usually have just some noisy data with which to try to do it. Sure, there are algorithms and computer packages they can chuck data into and get simple answers out of. But usually those answers are meaningless unless the algorithm/package is properly tailored to the needs of the specific problem. And there are no rules as to how that is best done: it needs a good understanding of the problem itself, an awareness of the data that are available and the creative skill to be able to mesh those things with appropriate statistical tools. And these are skills that are closer to the mindset of an artist than of a scientist.

But anyway… I recently came across the following picture which turns the tables, and uses Statistics to make art. (Or to destroy art, depending on your point of view.) You probably recognise the picture at the head of this post as Van Gogh’s Starry Night, which is displayed at MoMA in New York.

By contrast, the picture below is a statistical reinterpretation of the original version of Starry Night, created by photographer Mario Klingemann through a combination of data visualisation and statistical summarisation techniques.

The Starry Night Pie Packed

As you can see, the original painting has been replaced by a collage of coloured circles, which are roughly the same colour as the original painting. But in closer detail, the circles have an interesting structure. Each is actually a pie chart whose slices correspond, in size and colour, to the proportions of colours in that region of the original picture.
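To make the idea concrete, here is a rough sketch of how the slice sizes for one circle might be computed. This is not Klingemann's actual method, which I don't know; it simply assumes a region is a list of RGB pixels, groups similar colours into coarse bins, and reports the share of pixels in each bin. The function names and the choice of 4 bins per channel are mine.

```python
from collections import Counter
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def quantise(pixel: RGB, levels: int = 4) -> RGB:
    """Snap each channel to the midpoint of one of `levels` equal-width bins."""
    step = 256 // levels
    return tuple((channel // step) * step + step // 2 for channel in pixel)


def colour_proportions(region: List[RGB], levels: int = 4) -> Dict[RGB, float]:
    """Share of pixels in each coarse colour bin: the pie chart's slice sizes."""
    counts = Counter(quantise(pixel, levels) for pixel in region)
    total = sum(counts.values())
    return {colour: count / total for colour, count in counts.items()}


# A made-up region: three bluish pixels and one yellowish pixel.
example_region = [(10, 20, 200), (12, 25, 210), (5, 30, 220), (250, 240, 10)]
example_slices = colour_proportions(example_region)
for colour, share in example_slices.items():
    print(colour, f"{share:.0%}")
```

Each key in the result is the colour of a slice and each value is the fraction of the circle it would occupy.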

Yes, pointless, but kind of fun nonetheless. You can find more examples of Klingemann’s statistically distorted classical artworks here.

In a similar vein… the diagram below, produced by artist Arthur Buxton, is actually a quiz. Each of the pie charts represents the proportions of the main colours in one of Van Gogh’s paintings. In other words, these pie charts represent the colour distributions over a whole Van Gogh painting, rather than just a small region of a picture, as in the painting above. The quiz is to identify which Van Gogh painting each of the pie charts refers to.

You can find a short description of Arthur Buxton’s process in developing this picture here.

There’s just a small snag: I haven’t been able to locate the answers. My guess is that the pie chart in column 2 of row 2 corresponds to Starry Night. And the one immediately to the left of that is from the Sunflower series. But that’s pretty much exhausted my knowledge of the works of Van Gogh. Let me know if you can identify any of the others and I’ll add them to a list below.

On the basis of experience with jigsaw puzzles – hey, we’re all on a learning curve and you never know when acquired knowledge will be useful – reliably informs me that the third pie chart from the left on the bottom row will correspond to one of the paintings from Van Gogh’s series of Irises. Looking at this link which Nity gave me it seems entirely plausible.

Dance, dance, dance…

Ever thought: ‘I’m pretty sure I would fully understand Statistics, if only a modern dance company would illustrate the techniques for me’?

I hope you get the idea of what I’m trying to do with this blog by now. Fundamentally, Statistics is a very intuitive subject, but that intuition is often masked by technicalities, so that from the outside the subject can seem both boring and impenetrable. The aim of all of my posts is to try to show that neither of those things is true: Statistics is both fascinating and easily understandable. And in this way, whatever your connection to Smartodds, you’ll be better equipped to understand the statistical side of the company’s operations.

Of course, I’m not the only person to try to de-mystify Statistics, and there are many books, blogs and other learning aids with similar aims.

With this in mind, I recently came across a rather unusual set of resources for learning Statistics: a series of dance videos whose aim is to explain statistical concepts through movement. Probably my ‘favourite’ is this one, which deals with the notions of sampling and standard error. You might like to take a look…

I think it fair to say that the comments on these videos on YouTube are mixed. One person wrote:

This way it makes complicated things look simpler. Very informative and useful. Loved it. 🙂

While another said:

this makes simple things look complicated but thanks anyway

So, I guess it depends on your perspective. I think I’m on the side of the latter commenter though: I’m pretty sure that in 5 minutes I could give a much clearer and more entertaining explanation of the issues this film is trying to address than the film does itself. But maybe that’s not the point. Perhaps the point is that different things hook different people in, and while personally I can’t think of a much more complicated way of thinking about issues of sampling and measuring accuracy, the dance perspective seems to work for some people.
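For anyone who would rather see sampling and standard error in code than in choreography, here is a minimal simulation sketch. It is my own illustration and has nothing to do with how the video presents things: it draws many samples (with replacement) from a made-up population, records each sample mean, and compares the spread of those means with the textbook value of the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics


def simulate_standard_error(population, sample_size, n_samples=2000, seed=0):
    """Compare the empirical spread of sample means with sigma / sqrt(n)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]
    empirical = statistics.stdev(means)
    theoretical = statistics.pstdev(population) / math.sqrt(sample_size)
    return empirical, theoretical


# A made-up population: the whole numbers from 1 to 100.
population = list(range(1, 101))
for n in (5, 25, 100):
    empirical, theoretical = simulate_standard_error(population, n)
    print(f"n={n:3d}  empirical={empirical:.2f}  theoretical={theoretical:.2f}")
```

The two columns should be close to each other, and both should shrink as the sample size grows, which is the whole point: bigger samples give more accurate estimates of the mean.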

Anyway, if you think this might be the key to help you unlock some of the mysteries of Statistics, you can find the full series of four videos here, covering topics like correlation and standard deviation. Enjoy.


Statistics by pictures

Generally speaking there are three main phases to any statistical analysis:

  1. Design;
  2. Execution;
  3. Presentation.

Graphical techniques play an important part in both the second and third phases, but the emphasis is different in each. In the second phase the aim is usually exploratory, using graphical representations of data summaries to hunt for structure and relationships that might subsequently be exploited in a formal statistical model. The graphs here tend to be quick but rough, and are intended more for the statistician than the client.

In the presentation phase the emphasis is a bit different, since the analysis has already been completed, usually involving some sort of statistical model and inference. In this case diagrams are used to highlight the results to clients or a wider audience in a way that illustrates most effectively the salient features of the analysis. Very often the strength of message from a statistical analysis is much more striking when presented graphically rather than in the form of numbers. Moreover, some statisticians have also developed the procedure into something of an art form, using graphical techniques not just to convey the results of the analysis, but also to put them back in the context from where the data derive.

One of my favourite exponents of this technique is Mona Chalabi, who has regular columns in the Guardian, among other places.

Here are a few of her examples:

Most Popular Dog Names in New York


A Complete History of the Legislation of Same-Sex Marriage 


The Most Pirated Christmas Movies


And last and almost certainly least…



Tell you what though… that looks a bit more than 16% to me, suggesting a rather excessive use of artistic licence in this particular case.