Nul Points

No doubt you’re already well aware of, and eagerly anticipating, this year’s Eurovision Song Contest, to be held in Tel Aviv between the 14th and 18th May. But just in case you don’t know, the Eurovision Song Contest is an annual competition to choose the ‘best’ song entered by the various participating European countries. And Australia!

Quite possibly the world would never have heard of Abba if they hadn’t won Eurovision. Nor Conchita Wurst.

The voting rules have changed over the years, but the structure has remained pretty much the same. Judges from each participating country rank their favourite 10 songs – excluding that of their own country, which they cannot vote for – and points are awarded on the basis of preference. In the current scheme, the first choice gets 12 points, the second choice 10 points, the third choice 8 points, then down to the tenth choice which gets a single point.

A country’s total score is the sum awarded by each of the other countries, and the country with the highest score wins the competition. In most years the scoring system has made it possible for a song to receive zero points – nul points – as a total, and there’s a kind of anti-roll-of-honour dedicated to countries that have accomplished this feat. Special congratulations to Austria and Norway who, despite their deep contemporary musical roots, have each scored nul points on four occasions.
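Just to make the arithmetic concrete, here’s a minimal sketch (in Python) of how the scoring works; the jury rankings below are invented purely for illustration.

```python
# Sketch of Eurovision-style scoring: each country's jury ranks its top 10
# (excluding its own song) and points are awarded 12, 10, 8, 7, ..., 1.
POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]

def award_points(ranking):
    """Map a jury's ordered list of favourite countries to points."""
    return {country: POINTS[i] for i, country in enumerate(ranking[:10])}

# Invented jury rankings, purely for illustration.
juries = {
    "Sweden": ["Italy", "Norway", "UK"],
    "France": ["Norway", "Italy", "UK"],
}

totals = {}
for ranking in juries.values():
    for country, points in award_points(ranking).items():
        totals[country] = totals.get(country, 0) + points

print(totals)  # {'Italy': 22, 'Norway': 22, 'UK': 16}
```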

Anyway, here’s the thing. The UK gave the world The Beatles, The Rolling Stones, Pink Floyd, Led Zeppelin, David Bowie, Joy Division and Radiohead. And Adele. But it hasn’t done very well in recent years in the Eurovision Song Contest. It’s true that by 1997 the UK had won the competition a respectable 5 times – admittedly with a bit of gratuitous sexism involving the removal of women’s clothing to distract judges from the paucity of the music. But since then, nothing. Indeed, since 2000 the UK has finished in last place on 3 occasions, and has only twice been in the top 10.

Now, there are two possible explanations for this.

  1. Our songs have been terrible. (Well, even more terrible than the others).
  2. There’s a stitch-up in the voting process, with countries penalising the UK for reasons that have nothing to do with the quality of the songs.

But how can we objectively distinguish between these two possibilities? The poor results for the UK will be the same in either case, so we can’t use the UK’s data alone to unravel things.

Well, one way is to hypothesise a system by which votes are cast that is independent of song quality, and to see if the data support that hypothesis. One such hypothesis is a kind of ‘bloc’ voting system, where countries tend to award higher votes for countries of a similar geographical or political background to their own.

This article carries out an informal statistical analysis of exactly this type. Though the explanations in the article are sketchy, a summary of the results is given in the following figure. Rather than pre-defining the blocs, the authors use the voting data themselves to identify 3 blocs of countries with similar voting patterns. They are colour-coded in the figure, which shows (in some vague, undefined sense) the tendency for countries on the left to favour countries on the right in voting. Broadly speaking there’s a northern Europe group in blue, which includes the UK, an ex-Yugoslavian bloc in green and a rest-of-Europe bloc in red. But whereas the fair-minded north Europeans tend to spread their votes evenly across all countries, the other two blocs tend to give their highest votes to other member countries within the same bloc.
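The article doesn’t spell out its method, but to give a rough flavour of this kind of analysis, here’s a minimal sketch of one way you might hunt for blocs yourself: build a matrix whose (i, j) entry is the average points country i has given country j over the years, then cluster the rows so that countries with similar giving patterns end up in the same group. The voting matrix below is just random placeholder data, and k-means is only one of several clustering methods that could be used.

```python
# Rough sketch of a bloc-voting analysis (not necessarily the article's method):
# cluster countries by the pattern of points they give to everyone else.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
countries = ["UK", "Sweden", "Norway", "Serbia", "Croatia", "Greece", "Cyprus", "France"]

# Placeholder voting matrix: votes[i, j] = average points country i gave country j.
# In a real analysis this would come from the historical results database.
votes = rng.uniform(0, 12, size=(len(countries), len(countries)))
np.fill_diagonal(votes, 0)  # countries can't vote for themselves

# Group countries whose rows (giving patterns) look alike into 3 blocs.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(votes)

for bloc in range(3):
    members = [c for c, label in zip(countries, labels) if label == bloc]
    print(f"Bloc {bloc}: {members}")
```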

But does this mean the votes are based on non-musical criteria? Well, not necessarily. It’s quite likely that cultural differences – including musical ones – are also smaller within geographically homogeneous blocs than across them. In other words, Romania and Moldova might vote for each other at a much higher than average rate, but this could just as easily be because they have similar musical roots and tastes as because they are friends scratching each other’s backs.

Another study of geo-political bloc voting is contained in this Telegraph article, which reaches similar findings, but concludes:

Comforting as it might be to blame bloc voting for the UK’s endless poor record, it’s not the only reason we don’t do well.

In other words, in a more detailed analysis which models performance after allowing for bloc-voting effects, the UK is still doing badly.

This whole issue has also been studied in much greater detail in the academic literature using complex statistical models, and the conclusions are similar, though the authors report language and cultural similarities as being more important than geographical factors.

The techniques used in these various different studies are actually extremely important in other areas of application. In genetic studies, for example, they are used to identify groups of markers for certain disease types. And even in sports modelling they can be relevant for identifying teams or players that have similar styles of play.

But if Eurovision floats your boat, you can carry out your own analysis of the data based on the complete database of results available here.


Update: Thanks to Susie.Bruck@smartodds.co.uk for pointing me to this. So not only did the UK finish last this year, they also had their points score reduced retrospectively. If ever you needed evidence of an anti-UK conspiracy… 😉

More or Less

In a recent post I included a link to an article that showed how Statistics can be used to disseminate bullshit. That article was written by Tim Harford, who describes himself as ‘The Undercover Economist’, which is also the title of his blog. Besides the blog, Tim has written several books, one of which is also called ‘The Undercover Economist‘.

As you can probably guess from all of this, Tim is an economist who, through his writing and broadcasting, aims to bring the issues of economics to as wide an audience as possible. But there’s often a very thin boundary between what’s economics and what’s Statistics, and a lot of Tim’s work can equally be viewed from a statistical perspective.

The reason I mention all this is that Tim is also the presenter of a Radio 4 programme ‘More or Less’, whose aim is to…

…try to make sense of the statistics which surround us.

‘More or Less’ is a weekly half-hour show, which covers 3 or 4 topics each week. You can find a list of, and link to, recent episodes here.

As an example, at the time of writing this post the latest episode includes the following items:

  • An investigation of a recent research paper’s claim that floods had worsened by a factor of 15 since 2005;
  • An investigation into a claim by the Labour Party that a recent resurgence in the number of cases of Victorian diseases is due to government austerity policy;
  • An interview with Matt Parker, who was referenced in this blog here, about his new book ‘Humble Pi’;
  • An investigation into a claim in The Sunday Times that drinking a bottle of wine per week is equivalent to losing £2,400 per year in terms of reduction in happiness.

Ok, now, admittedly, the whole tone of the programme is about as ‘Radio 4’ as you could possibly get. But still, as a means for learning more about the way Statistics is used – and more often than not, mis-used – by politicians, salespeople, journalists and so on, it’s a great listen and I highly recommend it.

If Smartodds loves Statistics were a radio show, this is what it would be like (but less posh).

Groundhog day

Fed up of the cold, snow and rain? Don’t worry, spring is forecast to be here earlier than usual. Two caveats though:

  1. ‘Here’ is some unspecified region of the United States, and might not extend as far as the UK;
  2. This prediction was made by a rodent.

Yes, Saturday (February 2nd) was Groundhog Day in the US. And since Punxsutawney Phil failed to see his shadow, spring is forecast to arrive early.

You probably know about Groundhog Day from the Bill Murray movie

… but it’s actually a real event. It’s celebrated in many locations of the US and Canada, though it’s the event in Punxsutawney, Pennsylvania, which has become the most famous, and around which the movie was based. As Wikipedia says:

The Groundhog Day ceremony held at Punxsutawney in western Pennsylvania, centering around a semi-mythical groundhog named Punxsutawney Phil, has become the most attended.

Semi-mythical, no less. If you’d like to know more about Punxsutawney Phil, there’s plenty of information at The Punxsutawney Groundhog Club website, including a dataset of his predictions. These include the entry from 1937 when Phil had an ‘unfortunate meeting with a skunk’. (And whoever said data analysis was boring?)

Anyway, the theory is that if, at 7.30 a.m. on the second of February, Phil the groundhog sees his shadow, there will be six more weeks of winter; if not, spring will arrive early. Now, it seems a little unlikely that a groundhog will have powers of meteorological prediction, but since the legend has persisted, and there is other evidence of animal behaviour serving as a weather predictor, it seems reasonable to assess the evidence.

Disappointingly, Phil’s success rate is rather low. This article gives it as 39%. I’m not sure if it’s obvious or not, but the article also states (correctly) that if you were to guess randomly, by tossing a coin, say, then your expected chance of guessing correctly is 50%. The reason I say it might not be obvious is that the chance of spring arriving early is unlikely to be 50%. It might be 40%, say. Yet randomly guessing with a coin will still have a 50% expected success rate, since the coin agrees with whatever actually happens half the time. As such, Phil is doing worse than someone who guesses at random, or via coin tossing.

However, if Phil’s 39% success rate is a genuine measure of his predictive powers – rather than a reflection of the fact that his guesses are also random, and he’s just been a bit unlucky over the years – then he’s still a very useful companion for predictive purposes. You just need to take his predictions, and predict the opposite. That way you’ll have a 61% success rate – rather better than random guessing. Unfortunately, this means you will have to put up with another 6 weeks of winter.
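If it’s not obvious why the arithmetic works out that way, here’s a quick simulation sketch; the 40% chance of an early spring is an assumption made purely for illustration.

```python
# Quick check of the reasoning above: a coin toss agrees with the outcome
# half the time whatever the true chance of an early spring, and taking the
# opposite of a 39% predictor succeeds 61% of the time.
import random

random.seed(1)
n_years = 100_000
p_early_spring = 0.4    # assumed chance of an early spring (illustrative only)
p_phil_correct = 0.39   # Phil's reported success rate

coin, phil, anti_phil = 0, 0, 0
for _ in range(n_years):
    early = random.random() < p_early_spring
    coin += (random.random() < 0.5) == early      # coin agrees ~50% of the time
    phil_right = random.random() < p_phil_correct
    phil += phil_right                            # ~39%
    anti_phil += not phil_right                   # ~61%

print(coin / n_years, phil / n_years, anti_phil / n_years)
```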

Meantime, if you simply want more Groundhog Day statistics, you can fill your boots here.

And finally, if you think I’m wasting my time on this stuff, check out the Washington Post who have done a geo-spatial analysis of the whole of the United States to colour-map the regions in which Phil has been respectively more and less successful with his predictions over the years.


🤣

Who wants to win £194,375?

In an earlier post I included a link to Oscar predictions by film critic Mark Kermode over the years, which included a 100% success rate across all of the main categories in a couple of years. I also recounted his story of how he failed to make a fortune in 1992 by not knowing about accumulator bets.
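In case the distinction isn’t familiar: with singles the returns on each selection simply add up, whereas an accumulator rolls the winnings from each leg onto the next, so the odds multiply – but a single losing leg wipes out the whole bet. Here’s a minimal sketch with invented odds.

```python
# Invented example: £10 on five selections, either as singles or as an accumulator.
from math import prod

stake = 10.0
odds = [1.5, 2.0, 3.0, 1.8, 2.5]   # hypothetical decimal odds

# Singles: a separate £10 bet on each selection, so £50 staked in total;
# if every selection wins, the returns simply add up.
singles_return = sum(stake * o for o in odds)

# Accumulator: one £10 bet whose winnings roll onto each successive leg,
# so the odds multiply -- but a single losing leg loses the lot.
accumulator_return = stake * prod(odds)

print(f"Singles (all win):     £{singles_return:.2f}")      # £108.00
print(f"Accumulator (all win): £{accumulator_return:.2f}")  # £405.00
```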

Well, it’s almost Oscar season, and fabien.mauroy@smartodds.co.uk pointed me to this article, which includes Mark’s personal shortlist for the coming awards. Now, these aren’t the same as predictions: in some years, Mark has listed his own personal favourites as well as what he believes to be the likely winners, and there’s often very little in common. On the other hand, these lists have been produced prior to the nominations, so you’re likely to get better prices on bets now, rather than later. You’ll have to be quick though, as the nominations are announced in a couple of hours.

Anyway, maybe you’d like to sift through Mark’s recommendations, look for hints as to who he thinks the winner is likely to be, and make a bet accordingly. But if you do make a bet based on these lists, here are a few things to take into account:

  1. Please remember the difference between an accumulator bet and single bets;
  2. Please gamble responsibly;
  3. Please don’t blame me if you lose.

If Mark subsequently publishes actual predictions for the Oscars, I’ll include a link to those as well.


Update: the nominations have now been announced and are listed here. Comparing the nominations with Mark Kermode’s own list, the number of nominations appearing in Mark’s personal list for each category is as follows:

  • Best Picture: 1
  • Best Director: 2
  • Best Actor: 1
  • Best Actress: 2
  • Best Supporting Actor: 3
  • Best Supporting Actress: 1
  • Best Score: 2

In each case except Best Picture, there are 5 nominations and Mark’s list also comprised 5 contenders. For Best Picture, there are 8 nominations, though Mark only provided 5 suggestions.

So, not much overlap. But again, these weren’t intended to be Mark’s predictions. They were his own choices. I’ll aim to update with Mark’s actual predictions if he publishes them.

Statistics by pictures

Generally speaking there are three main phases to any statistical analysis:

  1. Design;
  2. Execution;
  3. Presentation.

Graphical techniques play an important part in both the second and third phases, but the emphasis is different in each. In the second phase the aim is usually exploratory, using graphical representations of data summaries to hunt for structure and relationships that might subsequently be exploited in a formal statistical model. The graphs here tend to be quick but rough, and are intended more for the statistician than the client.

In the presentation phase the emphasis is a bit different, since the analysis has already been completed, usually involving some sort of statistical model and inference. In this case diagrams are used to highlight the results to clients or a wider audience in a way that illustrates most effectively the salient features of the analysis. Very often the strength of message from a statistical analysis is much more striking when presented graphically rather than in the form of numbers. Moreover, some statisticians have also developed the procedure into something of an art form, using graphical techniques not just to convey the results of the analysis, but also to put them back in the context from where the data derive.

One of my favourite exponents of this technique is Mona Chalabi, who has regular columns in the Guardian, among other places.

Here are a few of her examples:

Most Popular Dog Names in New York

A Complete History of the Legislation of Same-Sex Marriage 

The Most Pirated Christmas Movies

And last and almost certainly least…

Untitled

Tell you what though… that looks a bit more than 16% to me, suggesting a rather excessive use of artistic licence in this particular case.

How to be wrong

When I’m not feeling too fragile to be able to handle it, I sometimes listen to James O’Brien on LBC. As you probably know, he hosts a talk show in which he invites listeners to discuss their views on a wide range of topics, that often begin and end with Brexit. His usual approach is simply to ask people who call in to defend or support their views with hard facts – as opposed to opinion or hearsay – and inevitably they can’t. James himself is well-armed with facts and knowledge, and is consequently able to forensically dissect arguments that are dressed up as factual, but turn out to be anything but. It’s simultaneously inspiring and incredibly depressing.

He’s also just published a book, which is a great read:

 

This is the description on Amazon:

Every day, James O’Brien listens to people blaming benefits scroungers, the EU, Muslims, feminists and immigrants. But what makes James’s daily LBC show such essential listening – and has made James a standout social media star – is the careful way he punctures their assumptions and dismantles their arguments live on air, every single morning.

In the bestselling How To Be Right, James provides a hilarious and invigorating guide to talking to people with faulty opinions. With chapters on every lightning-rod issue, James shows how people have been fooled into thinking the way they do, and in each case outlines the key questions to ask to reveal fallacies, inconsistencies and double standards.

If you ever get cornered by ardent Brexiteers, Daily Mail disciples or little England patriots, this book is your conversation survival guide.

And this is the Sun review on the cover:

James O’Brien is the epitome of a smug, sanctimonious, condescending, obsessively politically-correct, champagne-socialist public schoolboy Remoaner.

Obviously, both these opinions should give you the encouragement you need to read the book. Admittedly, it’s only tenuously related to Statistics, but the emphasis on the importance of fact and evidence is a common theme.

But I don’t want to talk about being right. I want to talk about being wrong.

One of my first tasks when I joined Smartodds around 14 years ago was to develop an alternative model to the standard goals model for football. I made a fairly simple suggestion, and we coded it up to run live in parallel to the goals model. We kept it going for a year or so, but rather than being an improvement on the goals model, it tended to give poorer results. This was disappointing, so I looked into things and came up with a ‘proof’ that, in idealised circumstances, it was impossible for the new model to improve on the goals model. Admittedly, our goals model didn’t quite have the idealised form, so it wasn’t a complete surprise that the numbers were a bit different. But the argument still seemed to suggest that we shouldn’t really expect any improvement, and since we weren’t getting very good results, we were happy to bury the new model on the strength of this slightly idealised theoretical argument.

Fast-forward 14 years… Some bright sparks in the R&D team have been experimenting with models that have a similar structure to the one which I’d ‘proved’ couldn’t really work and which we’d previously abandoned. And they’ve been getting quite good results, which seem to be an improvement on the performance of the original goals model. At first I thought it might just be that the new models were so different to the one I’d previously suggested that my arguments about the model not being able to improve on the goals model might not be valid. But when I looked at things more closely, I realised that there was a flaw in my original argument. It wasn’t wrong exactly, but it didn’t apply to the versions of the model we were likely to use in practice.

Of course, this is good and bad news. It’s good news that there’s no reason why the new versions of the model shouldn’t improve on the goals model. It’s bad news that if we’d understood that 14 years ago, we might have explored this avenue of research sooner. I should emphasise, it might be that this type of model still ends up not improving on our original goals model; it’s just that whereas I thought there was a theoretical argument which suggested that was unlikely, this argument actually doesn’t hold true.

So what’s the point of this post?

Well, all of us are wrong sometimes. And in the world of Statistics, we’re probably wrong more often than most people, and sometimes for good reasons. It might be:

  • We were unlucky in the data we used. They suggested something, but it turned out to be just due to chance.
  • Something changed. We correctly spotted something in some data, but subsequent to that things changed, and what we’d previously spotted no longer applies.
  • The data themselves were incomplete or unreliable.

Or it might be for not-such-good reasons:

  • We made a mistake in the modelling.
  • We made a mistake in the programming.

Or, just maybe, someone was careless when applying a simple mathematical identity in a situation for which it wasn’t really appropriate. Anyway, mistakes are inevitable, so here’s a handy guide about how to be wrong:

  1. Try very hard not to be wrong.
  2. Realise that, despite trying very hard, you might be wrong in any situation, so be constantly aware as new evidence becomes available that you may need to modify what you believed to be true.
  3. Once you realise you are wrong, let others know what was wrong and why you made the mistake you did. Humility and honesty are way more useful than evasiveness.
  4. Be aware that other people may be wrong too. Always use other people’s work with an element of caution, and if something seems wrong, politely discuss the possibility with them. (But remember also: you may be wrong about them being wrong).

Hmmm, hope that’s right.


I was encouraged to write a post along these lines by Luigi.Colombo@smartodds.co.uk following a recent chat where we were discussing the mistake I’d made as explained above. To help me not feel quite so bad about it, he mentioned a recent blog post where some of the research described in Daniel Kahneman’s book, ‘Thinking, Fast and Slow’, is also shown to be unreliable. You might remember I discussed this book briefly in a previous post.

Anyway, the essence of that blog post is that the sample sizes used in much of the reported research are too small for the statistical conclusions reached to be valid. As such, some chapters from Kahneman’s book have to be considered unreliable. Actually, Kahneman himself seems to have been aware of the problem some years ago, writing an open letter to relevant researchers, setting out a possible protocol that would avoid the sorts of problems that occurred in the research on which his book chapters were based.

However, while Kahneman himself can’t be blamed for the original failures in the research that he reported on, it’s argued in the blog post that his own earlier research might well have led him to foresee these types of problems. Seen in that light, the rather aggressive tone of his letter seems to me like an attempt at ring-fencing himself from any particular blame for the errors in his book. In other words, this episode seems like a slightly different approach to ‘how to be wrong’ compared with my handy guide above.
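As a rough illustration of the small-sample problem in general – nothing to do with the specific studies behind Kahneman’s book – here’s a minimal simulation sketch: a modest true effect is estimated from samples of various sizes, and with small samples the estimate bounces around so much that any individual finding is unreliable.

```python
# Toy illustration: the same small true effect estimated from samples of
# different sizes. With small n the estimates are all over the place.
import numpy as np

rng = np.random.default_rng(42)
true_effect = 0.2
n_repeats = 1000

for n in (10, 100, 10_000):
    estimates = []
    for _ in range(n_repeats):
        treatment = rng.normal(true_effect, 1.0, size=n)
        control = rng.normal(0.0, 1.0, size=n)
        estimates.append(treatment.mean() - control.mean())
    estimates = np.array(estimates)
    print(f"n = {n:>6}: estimated effect ranges roughly "
          f"from {estimates.min():+.2f} to {estimates.max():+.2f}")
```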

I just made up this one

I saw this the other day…

And the same day I saw this…

One of these items is a cartoon character inventing a statistic just to support an argument that he can’t justify by logic or other means.

The other one is Dilbert.

I don’t go for stats

I’ve mentioned in previous posts that an analysis of the detailed statistics from a game can provide a deeper understanding of team performance than just the final result. This point of view is increasingly shared and understood in the football world, but there are some areas of resistance. Here’s Mourinho after Man United’s 3-1 defeat to Man City yesterday:

The way people who don’t understand football analyse it with stats. I don’t go for stats. I go for what I felt in the game and it was there until minute 80-something. I consider the performance of my team one with mistakes. It is different from a bad performance.

And here are the stats that he doesn’t go for:

Of course, there’s a fair point to be made: statistics don’t tell the whole story, and it’s always important, wherever possible, to balance the information that they provide with the kind of information you get from an expert watching a game. Equally though, it has to be a missed opportunity not to take any account of the information that is contained in statistics. Or maybe Mourinho is such a total expert that statistics are completely irrelevant compared to his ‘feel for the game’.

Except, oh, wait a minute: ‘Jose Mourinho brings statistics to press conference to silence Marcus Rashford claims‘. Hmmm.

So sad about the leopards

At the recent offsite, Nity.Raj@smartodds.co.uk suggested I do a post on the statistics of climate change. I will do that properly at some point, but there’s such an enormous amount of material to choose from, that I don’t really know where to start or how best to turn it into the “snappy and informative, but fun and light-hearted” type of post that you’ve come to expect from Smartodds loves Statistics.

So, in the meantime, I’ll just drop the following cartoon, made by First Dog on the Moon, who has a regular series in the Guardian. It’s not exactly about climate science, but it’s similar in that it points at humanity’s failure to face up to the effects we are having on our planet, despite the overwhelming statistical and scientific evidence of both the effects and their consequences. It specifically refers to the recent WWF report which confirms, amongst other things, that humanity has wiped out 60% of the world’s animal populations since 1970.

Responding to the report, the Guardian quotes Prof Johan Rockström, a global sustainability expert at the Potsdam Institute for Climate Impact Research in Germany, as follows:

We are rapidly running out of time. Only by addressing both ecosystems and climate do we stand a chance of safeguarding a stable planet for humanity’s future on Earth.

Remember, kids: “Listen to the scientists and not the Nazis”.