You looking at me?

Statistics: helping you solve life’s more difficult problems…

You might have read recently – since it was in every news outlet here, here, here, here, here, here, and here, for example – that research has shown that staring at seagulls inhibits them from stealing your food. This article even shows a couple of videos of how the experiment was conducted. The researcher placed a package of food some metres in front of her in the vicinity of a seagull. In one experiment she watched the bird and timed how long it took before it snatched the food. She then repeated the experiment with the same seagull, but this time facing away from it. Finally, she repeated this exercise with a number of different seagulls in different locations.

At the heart of the study is a statistical analysis, and there are several points about both the analysis itself and the way it was reported that are interesting from a wider statistical perspective:

  1. The experiment is a good example of a designed paired experiment. Some seagulls are more likely to take food than others regardless of whether they are being looked at or not. The experiment aims to control for this effect by using pairs of results from each seagull: one in which the seagull was stared at, the other where it was not. By using knowledge that the data are in pairs this way, the accuracy of the analysis is improved considerably. This makes it much more likely to identify a possible effect within the noisy data.
  2. To avoid the possibility that, for example, a seagull is more likely to take food quickly the second time, the order in which the pairs of experiments are applied is randomised for each seagull.
  3. Other factors are also controlled for in the analysis: the presence of other birds, the distance of the food, the presence of other people and so on.
  4. The original experiment involved 74 birds, but many were uncooperative and refused the food in one or other of the trials. In the end the analysis is based on just 19 birds that took the food both when being stared at and when not. So even though the results proved to be significant, it's worth remembering that the sample on which they are based is very small.
  5. It used to be very difficult to verify the accuracy of a published statistical analysis. These days it’s almost standard for data and code to be published alongside the manuscript itself. This enables readers to both check the results and carry out their own alternative analyses. For this paper, which you can find in full here, the data and code are available here.
  6. If you look at the code, it's just a few lines of R. It's notable that such a sophisticated analysis can be carried out with such simple code – a rough sketch of the kind of paired comparison involved is given below.
  7. At the risk of being pedantic, although most newspapers went with headlines like 'Staring at seagulls is best way to stop them stealing your chips', that's not really an accurate summary of the research at all. Clearly, a much better way to stop seagulls eating your food is not to eat in the vicinity of seagulls. (Doh!) But even aside from this nit-picking point, the research didn't show that staring at seagulls stopped them 'stealing your chips'. It showed that, on average, the seagulls that do bother to steal your chips do so more quickly when you are looking away. In other words, the headline should be:

If you insist on eating chips in the vicinity of seagulls, you’ll lose them quicker if you’re not looking at them

Guess that’s why I’m a statistician and not a journalist.
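Point 6 above mentions that the published analysis is just a few lines of R. As a rough illustration of why the pairing in point 1 matters, here's a minimal sketch of a paired comparison on made-up latency data; the paper's own analysis is more sophisticated and controls for the extra factors listed in point 3, so treat this purely as a toy.

    # Made-up latencies (in seconds) for 19 gulls, each measured twice:
    # once while being stared at and once with the experimenter facing away.
    set.seed(42)
    gull_boldness <- runif(19, 20, 120)                       # some gulls approach much faster than others
    looking_at    <- gull_boldness + 20 + rnorm(19, sd = 5)   # staring adds roughly 20 seconds on average
    looking_away  <- gull_boldness + rnorm(19, sd = 5)

    # Ignoring the pairing treats the gull-to-gull variation as noise...
    t.test(looking_at, looking_away)$p.value

    # ...whereas the paired analysis uses the within-gull differences,
    # stripping that variation out and giving a far more sensitive comparison.
    t.test(looking_at, looking_away, paired = TRUE)$p.value

Comparing the two p-values shows what the pairing buys: the within-gull differences remove the bird-to-bird variation that otherwise inflates the noise.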

The issue of designed statistical experiments was something I also discussed in an earlier post. As I mentioned then, it’s an aspect of Statistics that, so far, hasn’t much been exploited in the context of sports modelling, where analyses tend to be based on historically collected data. But in the context of gambling, where different strategies for betting might be compared and contrasted, it’s likely to be a powerful approach. In that case, the issues of controlling for other variables – like the identity of the gambler or the stake size – and randomising to avoid biases will be equally important.

 

Off script


So, how did your team get on in the first round of Premier League fixtures for the 2019-20 season? My team, Sheffield United, were back in the top flight after a 13-year absence. It didn’t go too well though. Here’s the report:

EFL goal machine Billy Sharp’s long wait for a top-flight strike ends on the opening day. Ravel Morrison with the assist. But Bournemouth run out 4-1 winners.

And as if that’s not bad enough, we finished the season in bottom place:

[Image: the Script's simulated final Premier League table]

Disappointing, but maybe not unexpected.

Arsenal also had a classic Arsenal season. Here’s the story of their run-in:

It seems only the Europa League can save them. They draw Man United. Arsenal abandon all hope and crash out 3-2. Just as they feared. Fans are more sad than angry. Once again they rally. Aubameyang and Alexandre Lacazette lead a demolition of high flying Liverpool. But they drop too many points and end up trophyless with another fifth-place finish.

Oh, Arsenal!

But what is this stuff? The Premier League doesn’t kick off for another week, yet here we have complete details of the entire season, match-by-match, right up to the final league table.

Welcome to The Script, produced by BT Sport. As they themselves explain:

Big data takes on the beautiful game.

And in slightly more detail…

BT has brought together the biggest brains in sports data, analysis and machine learning to write the world’s first artificial intelligence-driven script for a future premier league season.

Essentially, BT Sport have devised a model for match outcomes based on measures of team abilities in attack and defence. So far, so standard. After which…

We then simulate the random events that could occur during a season – such as injuries and player transfers – to give us even more accurate predictions.

But this is novel. How do you assign probabilities to player injuries or transfers? Are all players equally susceptible to injury? Do the terms of a player's contract affect their chances of being sold? And who they are sold to? And what is the effect on a team's performance of losing a player?

So, this level of modelling is difficult. But let’s just suppose for a minute you can do it. You have a model for what players will be available for a team in any of their fixtures. And you then have a model that, given the 2 sets of players that are available to teams for any fixture, spits out the probabilities of the various possible scores. Provided the model’s not too complicated, you can probably first simulate the respective lineups in a match, and then the scores given the team lineups. And that’s why Sheffield United lost 4-1 on the opening day to Bournemouth. And that’s why Arsenal did an Arsenal at the end of the season. And that’s why the league table ended up like it did above.
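To make that two-stage structure concrete, here's a rough sketch of how a single fixture might be simulated. The injury rates, squad sizes and attack rates are all invented, and BT's actual model is certainly far more elaborate.

    # Stage 1: simulate which players are available; Stage 2: simulate the score
    # given availability. All numbers below are purely illustrative.
    set.seed(1)
    simulate_match <- function(home_attack, away_attack,
                               injury_prob = 0.05, squad_size = 25) {
      # Stage 1: how many of each squad are fit to play (a crude injury model)
      home_available <- rbinom(1, squad_size, 1 - injury_prob)
      away_available <- rbinom(1, squad_size, 1 - injury_prob)

      # Stage 2: goals given availability; missing players weaken the attack rate
      home_rate <- home_attack * home_available / squad_size
      away_rate <- away_attack * away_available / squad_size
      c(home = rpois(1, home_rate), away = rpois(1, away_rate))
    }

    simulate_match(home_attack = 1.1, away_attack = 1.9)   # e.g. Sheffield United v Bournemouth

Each call to a function like this is one possible outcome of one match; stringing 380 of them together gives one possible season, which is essentially what the Script is.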

But is this a useful resource for predicting the Premier League?

Have a think about this before scrolling down. Imagine you’re a gambler, looking to bet on the outcome of the Premier League season. Perhaps betting on who the champions will be, or the top three, or who will be relegated, or whether Arsenal will finish fifth. Assuming BT’s model is reasonable, would you find the Script that they’ve provided helpful in deciding what bets to make?

|
|
|
|
|
|
|
|
|
|
|
|
|
|

Personally, I think the answer is ‘no’, not very helpful. What BT seem to have done is run A SINGLE SIMULATION of their model, for every game over the entire season, accumulating the simulated points of each team per match to calculate their final league position.

A SINGLE SIMULATION!

Imagine having a dice that you suspected of being biased, and you tried to understand its properties with a single roll. It’s almost pointless. Admittedly, with the Script, each team has 38 simulated matches, so the final league table is likely to be more representative of genuine team ability than the outcome of a single throw of a dice. But still, it’s the simulation of just a single season.

What would be much more useful would be to simulate many seasons and count, for example, in how many of those seasons Sheffield United were relegated. This way the model would be providing an estimate of the probability that Sheffield United gets relegated, and we could compare that against market prices to see if it’s a worthwhile bet.
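Here's a sketch of that idea with a deliberately crude stand-in for BT's model: a toy round-robin in which each team's goals are Poisson with a made-up attack rate, simulated over many seasons so that relegation frequencies become probability estimates.

    # Simulate many complete seasons of a 20-team round-robin and estimate,
    # for each team, the probability of finishing in the bottom three.
    set.seed(1)
    teams <- paste("Team", 1:20)
    attack <- seq(1.8, 0.8, length.out = 20)   # made-up strengths, best to worst

    simulate_season <- function() {
      points <- numeric(20)
      for (i in 1:19) for (j in (i + 1):20) {
        for (leg in 1:2) {                     # the two fixtures between each pair
          gi <- rpois(1, attack[i]); gj <- rpois(1, attack[j])
          if (gi > gj)      points[i] <- points[i] + 3
          else if (gj > gi) points[j] <- points[j] + 3
          else { points[i] <- points[i] + 1; points[j] <- points[j] + 1 }
        }
      }
      points
    }

    n_seasons <- 1000
    relegated <- replicate(n_seasons, rank(simulate_season(), ties.method = "random") <= 3)
    setNames(round(rowMeans(relegated), 2), teams)   # estimated relegation probabilities

Obviously this toy ignores defence, home advantage and everything else the Script's model includes; the point is simply that it's the repetition, not any single run, that turns a simulation into usable probabilities.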

In summary, we’ve seen in earlier posts (here and here, for example) contenders for the most pointless simulation in a sporting context, but the Script is lowering the bar to unforeseen levels. Despite this, if the blog is still going at the end of the season, I’ll do an assessment of how accurate the Script’s estimates turned out to be.

 

Word rank

I recently came across a large database of American-English word usage. It aims to provide a representative sample of American English by including words extracted from a large number of English-language texts of different types – books, newspaper articles, magazines and so on. In total it includes around 560 million words collected over the years 1990-2017.

The word ‘football’ occurs in the database 25,271 times and has rank 1543. In principle, this means that ‘football’ was the 1543rd most frequent word in the database, though the method used for ranking the database elements is a little more complicated than that, since it attempts to combine a measure of both the number of times the word appears and the number of texts it appears in. Let’s leave that subtlety aside though and assume that ‘football’, with a frequency of 25,271, is the 1543rd most common word in the database.

The word ‘baseball’ occurs in the same database 28,851 times. With just this information, what would you predict the rank of the word ‘baseball’ to be? For example, if you think ‘baseball’ is the most common word, it would have rank 1. (It isn’t: ‘the’ is the most common word). If you think ‘baseball’ would be the 1000th most common word, your answer would be 1000.

Give it a little thought, but don’t waste time on it. I really just want to use the problem as an introduction to an issue that I’ll discuss in a future post. I’d be happy to receive your answer though, together with an explanation if you like, by mail. Or if you’d just like to fire an answer anonymously at me, without explanation, you can do so using this survey form.

 

First pick

[Image: Zion Williamson]

If you follow basketball you're likely to know that the NBA draft was held this weekend, resulting in wonderkid Zion Williamson being selected by the New Orleans Pelicans. The draft system is a procedure by which newly available players are distributed among the various NBA teams.

In contrast to most professional team sports in Europe, the draft system is a partial attempt to balance out teams in terms of the quality of their players. Specifically, teams that do worse one season are given preference when choosing players for the next season. It's a slightly archaic and complicated procedure – which is shorthand for saying I couldn't understand all the details from Wikipedia – but the principles are simple enough.

There are 3 stages to the procedure:

  1. A draft lottery schedule, in which teams are given a probability of having first pick, second pick and so on, based on their league position in the previous season. Only teams below a certain level in the league are permitted to have the first pick, and the probabilities allocated to each team are inversely related to their league position. In particular, the lowest-placed teams have the highest probability of getting first pick.
  2. The draft lottery itself, held towards the end of May, where the order of picks is assigned randomly to the teams according to the probabilities in the schedule.
  3. The draft selection, held in June, where teams make their picks in the order that they’ve been allocated in the lottery procedure.

In the 2019 draft lottery, the first pick probabilities were assigned as follows:

[Image: table of first-pick probabilities for the 2019 NBA draft lottery]

So, the lowest-placed teams, New York, Cleveland and Phoenix, were all given a 14% chance, down to Charlotte, Miami and Sacramento who were given a 1% chance. The stars and other indicators in the table are an additional complication arising from the fact that teams can trade their place in the draw from one season to another.

In the event, following the lottery based on these probabilities, the first three picks were given to New Orleans, Memphis and New York respectively. The final stage in the process was then carried out this weekend, resulting in the anticipated selection of Zion Williamson by the New Orleans Pelicans.

There are several interesting aspects to this whole process from a statistical point of view.

The first concerns the physical aspects of the draft lottery. Here’s an extract from the NBA’s own description of the procedure:

Fourteen ping-pong balls numbered 1 through 14 will be placed in a lottery machine. There are 1,001 possible combinations when four balls are drawn out of 14, without regard to their order of selection. Before the lottery, 1,000 of those 1,001 combinations will be assigned to the 14 participating lottery teams. The lottery machine is manufactured by the Smart Play Company, a leading manufacturer of state lottery machines throughout the United States. Smart Play also weighs, measures and certifies the ping-pong balls before the drawing.

The drawing process occurs in the following manner: All 14 balls are placed in the lottery machine and they are mixed for 20 seconds, and then the first ball is removed. The remaining balls are mixed in the lottery machine for another 10 seconds, and then the second ball is drawn. There is a 10-second mix, and then the third ball is drawn. There is a 10-second mix, and then the fourth ball is drawn. The team that has been assigned that combination will receive the No. 1 pick. The same process is repeated with the same ping-pong balls and lottery machine for the second through fourth picks.

If the same team comes up more than once, the result is discarded and another four-ball combination is selected. Also, if the one unassigned combination is drawn, the result is discarded and the balls are drawn again. The length of time the balls are mixed is monitored by a timekeeper who faces away from the machine and signals the machine operator after the appropriate amount of time has elapsed.

You probably don't need me to explain how complicated this all is, compared to the two lines of code it would take to carry out the same procedure electronically. Arguably, perhaps, seeing the lottery carried out with the physical presence of ping-pong balls might stop people thinking the results had been fixed. Except it doesn't. So, it's all just for show. Why do things efficiently and electronically when you can add razzmatazz and generate high TV ratings? Watching a statistician generate the same results on a laptop in a couple of minutes maybe just wouldn't have the same appeal.
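For what it's worth, the electronic version really is about two lines: a single weighted draw. The probabilities below are placeholders standing in for the schedule shown in the table above.

    # One weighted draw replaces the entire ping-pong ball ceremony.
    teams <- paste("Team", 1:14)                              # the 14 lottery teams
    pick_probs <- c(0.14, 0.14, 0.14, 0.125, 0.105, 0.09, 0.075,
                    0.06, 0.045, 0.03, 0.02, 0.015, 0.01, 0.005)   # placeholder schedule
    sample(teams, size = 1, prob = pick_probs)                # team awarded the first pick

Repeating that draw a huge number of times and tabulating the winners would, of course, just reproduce the schedule you fed in – which is worth bearing in mind for what follows.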

Anyway, my real reason for including this topic in the blog is the following. In several previous posts I’ve mentioned the use of simulation as a statistical technique. Applications are varied, but in most cases simulation is used to generate many realisations from a probability model in order to get a picture of what real data are likely to look like if their random characteristics are somehow linked to that probability model. 

For example, in this post I simulated how many packs of Panini stickers would be needed to fill an album. Calculating the probabilities of the number of packs needed to complete an album is difficult, but the simulation of the process of completing an album is easy.
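For anyone who missed that post, the sticker problem is a version of the classic coupon-collector problem, and the simulation really is just a few lines. The album and pack sizes below are chosen for illustration rather than taken from the original post.

    # Keep buying packs of distinct random stickers until every slot in the
    # album is filled, and record how many packs that took.
    set.seed(1)
    simulate_album <- function(n_stickers = 600, pack_size = 5) {
      got <- rep(FALSE, n_stickers)
      packs <- 0
      while (!all(got)) {
        got[sample(n_stickers, pack_size)] <- TRUE   # one pack of distinct stickers
        packs <- packs + 1
      }
      packs
    }

    mean(replicate(200, simulate_album()))   # average number of packs needed

The exact probability calculation is fiddly; a couple of hundred repetitions of this loop gives a perfectly serviceable estimate.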

And in a couple of recent posts (here and here) we used simulation techniques to verify what seemed like an easy intuitive result. As it turned out, the simulated results were different from what the theory suggested, and a slightly deeper study of the problem showed that some care was needed in the way the data were simulated. But nonetheless, the principle of using simulations to investigate the expected outcomes of a random experiment was sound. In each case simulations were used to generate data from a process whose probabilities would have been practically impossible to calculate by other means.

Which brings me to this article, sent to me by Oliver.Cobb@smartodds.co.uk. On the day of the draft lottery, the masterminds at USA Today decided to run 100 simulations of the draft lottery to see which team would get the first pick. It’s mind-numbingly pointless. As Ollie brilliantly put it:

You have to admire the way they’ve based an article on taking a known chance of something happening and using just 100 simulations to generate a less reliable figure than the one they started with.

In case you're interested, and can't be bothered with the article, Chicago got selected for first pick most often – 19 times – in the 100 USA Today simulations, and were therefore 'predicted' to win the lottery. But had they run many more simulations, it's all but guaranteed that Chicago wouldn't have come out on top; they would instead have been allocated first pick on close to 12.5% of occasions, corresponding to their probability in the table above. With enough simulations, the simulated lottery would almost always be 'won' by one of New York, Cleveland or Phoenix, whose proportions would be separated only by small amounts of random variation.

The only positive thing you can say about the USA Today article is that at least they had the good sense not to do the simulation with 14 actual ping-pong balls. As they say themselves:

So to celebrate one of the most cruel and unusual days in sports, we ran tankathon.com’s NBA draft lottery simulator 100 times to predict how tonight will play out. There’s no science behind this. We literally hit “sim lottery” 100 times and wrote down the results.

I especially like the “there’s no science behind this” bit.  Meantime, if you want to create your own approximation to a known set of probabilities, you too can hit the “sim lottery” button 100 times here.


Update: Benoit.Jottreau@Smartodds.co.uk pointed me at this article, which is relevant for two reasons. First, in terms of content. In previous versions of the lottery system, there was a stronger incentive in terms of probability assignments for teams to do badly in the league. This led to teams ‘tanking’: deliberately throwing games towards the end of a season when they knew they were unlikely to reach the playoffs, thereby improving their chances of getting a better player in the draft for the following season. The 2019 version of the lottery aims to reduce this effect, by giving teams less of an incentive to be particularly poor. For example, the lowest three teams in the league now share the highest probability of first pick in the draft, whereas previously the lowest team had a higher probability than all others. But the article Benoit sent me suggests that the changes are unlikely to have much of an impact. It concludes:

…it seems that teams that want to tank still have strong incentives to tank, even if the restructured NBA draft lottery makes it less likely for them to receive the best picks.

The other reason why this article is relevant is that it makes much more intelligent use of simulation as a technique than the USA Today article referred to above.

Nul Points

No doubt you're already well aware of, and eagerly anticipating, this year's Eurovision Song Contest, to be held in Tel Aviv between the 14th and 18th May, with the final on the 18th. But just in case you don't know, the Eurovision Song Contest is an annual competition to choose the 'best' song from those entered by the various participating European countries. And Australia!

Quite possibly the world would never have heard of Abba if they hadn’t won Eurovision. Nor Conchita Wurst.

The voting rules have changed over the years, but the structure has remained pretty much the same. Judges from each participating country rank their favourite 10  songs – excluding that of their own country, which they cannot vote for – and points are awarded on the basis of preference. In the current scheme, the first choice gets 12 points, the second choice 10 points, the third choice 8 points, then down to the tenth choice which gets a single point.

A country’s total score is the sum awarded by each of the other countries, and the country with the highest score wins the competition. In most years the scoring system has made it possible for a song to receive zero points – nul points – as a total, and there’s a kind of anti-roll-of-honour dedicated to countries that have accomplished this feat. Special congratulations to Austria and Norway who, despite their deep contemporary musical roots, have each scored nul points on four occasions.
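For concreteness, here's a tiny sketch of how a total is assembled under the standard 12-10-8-down-to-1 scale described above; the example rankings are invented.

    # Points awarded for being ranked 1st, 2nd, ..., 10th by a judging country.
    points_for_rank <- c(12, 10, 8, 7, 6, 5, 4, 3, 2, 1)

    # e.g. a song ranked 1st by three countries, 3rd by one, 10th by two
    # and outside the top 10 everywhere else:
    ranks_received <- c(1, 1, 1, 3, 10, 10)
    sum(points_for_rank[ranks_received])   # total score: 12+12+12+8+1+1 = 46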

Anyway, here’s the thing. Although the UK gave the world The Beatles, The Rolling Stones, Pink Floyd, Led Zeppelin, David Bowie, Joy Division and Radiohead. And Adele. It hasn’t done very well in recent years in the Eurovision Song Contest.  It’s true that by 1997 the UK had won the competition a respectable 5 times – admittedly with a bit of gratuitous sexism involving the removal of women’s clothing to distract judges from the paucity of the music. But since then, nothing. Indeed, since 2000 the UK has finished in last place on 3 occasions, and has only twice been in the top 10.

Now, there are two possible explanations for this.

  1. Our songs have been terrible. (Well, even more terrible than the others).
  2. There's a stitch-up in the voting process, with countries penalising the UK for reasons that have nothing to do with the quality of the songs.

But how can we objectively distinguish between these two possibilities? The poor results for the UK will be the same in either case, so we can’t use the UK’s data alone to unravel things.

Well, one way is to hypothesise a system by which votes are cast that is independent of song quality, and to see if the data support that hypothesis. One such hypothesis is a kind of ‘bloc’ voting system, where countries tend to award higher votes for countries of a similar geographical or political background to their own.

This article carries out an informal statistical analysis of exactly this type. Though the explanations in the article are sketchy, a summary of the results is given in the following figure. Rather than pre-defining the blocs, the authors use the voting data themselves to identify 3 blocs of countries whose voting patterns are similar. They are colour-coded in the figure, which shows (in some vague, undefined sense) the tendency for countries on the left to favour countries on the right in voting. Broadly speaking there's a northern European group in blue, which includes the UK, an ex-Yugoslavian bloc in green and a rest-of-Europe bloc in red. But whereas the fair-minded northern Europeans tend to spread their votes fairly evenly across all countries, the other two blocs tend to give their highest votes to other countries within the same bloc.
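The article doesn't spell out its method, but the general approach can be sketched as follows: summarise each country by the points it tends to award to every other country, then cluster those voting profiles to look for blocs. The data below are random stand-ins; with the real voting history (the database linked at the end of this post, say) the groups that emerge could be compared against geography, language and political ties.

    # Toy stand-in for a matrix of average points awarded: one row per voting
    # country, one column per receiving country.
    set.seed(1)
    n <- 26
    countries <- paste0("Country", 1:n)
    votes <- matrix(rpois(n * n, 4), n, n, dimnames = list(countries, countries))
    diag(votes) <- 0                  # countries cannot vote for themselves

    hc <- hclust(dist(votes))         # cluster countries by their voting profiles
    blocs <- cutree(hc, k = 3)        # cut into three 'blocs', as in the figure
    split(countries, blocs)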

But does this mean the votes are based on non-musical criteria? Well, not necessarily. It's quite likely that cultural differences – including musical ones – are also smaller within geographically homogeneous blocs than across them. In other words, Romania and Moldova might vote for each other at a much higher than average rate, but this could just as easily be because they have similar musical roots and tastes as because they are friends scratching each other's backs.

Another study reaching similar conclusions about geo-political bloc voting is described in this Telegraph article, which nonetheless concludes:

Comforting as it might be to blame bloc voting for the UK’s endless poor record, it’s not the only reason we don’t do well.

In other words, in a more detailed analysis which models performance after allowing for bloc-voting effects, the UK is still doing badly.

This whole issue has also been studied in much greater detail in the academic literature using complex statistical models, and the conclusions are similar, though the authors report language and cultural similarities as being more important than geographical factors.

The techniques used in these various different studies are actually extremely important in other areas of application. In genetic studies, for example, they are used to identify groups of markers for certain disease types. And even in sports modelling they can be relevant for identifying teams or players that have similar styles of play.

But if Eurovision floats your boat, you can carry out your own analysis of the data based on the complete database of results available here.


Update: Thanks to Susie.Bruck@smartodds.co.uk for pointing me to this. So not only did the UK finish last this year, they also had their points score reduced retrospectively. If ever you needed evidence of an anti-UK conspiracy… 😉

More or Less

In a recent post I included a link to an article that showed how Statistics can be used to disseminate bullshit. That article was written by Tim Harford, who describes himself as ‘The Undercover Economist’, which is also the title of his blog. Besides the blog, Tim has written several books, one of which is also called ‘The Undercover Economist‘.

As you can probably guess from all of this, Tim is an economist who, through his writing and broadcasting, aims to bring the issues of economics to as wide an audience as possible. But there’s often a very thin boundary between what’s economics and what’s Statistics, and a lot of Tim’s work can equally be viewed from a statistical perspective.

The reason I mention all this is that Tim is also the presenter of a Radio 4 programme ‘More or Less’, whose aim is to…

…try to make sense of the statistics which surround us.

‘More or Less’ is a weekly half-hour show, which covers 3 or 4 topics each week. You can find a list of, and link to, recent episodes here.

As an example, at the time of writing this post the latest episode includes the following items:

  • An investigation of a claim in a recent research paper that flooding had worsened by a factor of 15 since 2005;
  • An investigation into the Labour Party's claim that a recent resurgence in the number of cases of Victorian diseases is due to government austerity policy;
  • An interview with Matt Parker, who was referenced in this blog here, about his new book 'Humble Pi';
  • An investigation into a claim in The Sunday Times that drinking a bottle of wine per week is equivalent to losing £2,400 per year in terms of reduced happiness.

Ok, now, admittedly, the whole tone of the programme is about as ‘Radio 4’ as you could possibly get. But still, as a means for learning more about the way Statistics is used – and more often than not, mis-used – by politicians, salespeople, journalists and so on, it’s a great listen and I highly recommend it.

If Smartodds loves Statistics were a radio show, this is what it would be like (but less posh).

Groundhog day

Fed up of the cold, snow and rain? Don’t worry, spring is forecast to be here earlier than usual. Two caveats though:

  1. ‘Here’ is some unspecified region of the United States, and might not extend as far as the UK;
  2. This prediction was made by a rodent.

Yes, Saturday (February 2nd) was Groundhog Day in the US. And since Punxsutawney Phil failed to see his shadow, spring is forecast to arrive early.

You probably know about Groundhog Day from the Bill Murray movie

… but it’s actually a real event. It’s celebrated in many locations of the US and Canada, though it’s the event in Punxsutawney, Pennsylvania, which has become the most famous, and around which the movie was based. As Wikipedia says:

The Groundhog Day ceremony held at Punxsutawney in western Pennsylvania, centering around a semi-mythical groundhog named Punxsutawney Phil, has become the most attended.

Semi-mythical, no less. If you’d like to know more about Punxsutawney Phil, there’s plenty of information at The Punxsutawney Groundhog Club website, including a dataset of his predictions. These include the entry from 1937 when Phil had an ‘unfortunate meeting with a skunk’. (And whoever said data analysis was boring?)

Anyway, the theory is that if, at 7.30 a.m. on the second of February, Phil the groundhog sees his shadow, there will be six more weeks of winter; if not, spring will arrive early. Now, it seems a little unlikely that a groundhog will have powers of meteorological prediction, but since the legend has persisted, and there is other evidence of animal behaviour serving as a weather predictor,  it seems reasonable to assess the evidence.

Disappointingly, Phil's success rate is rather low: this article gives it as 39%. I'm not sure whether it's obvious or not, but the article also states (correctly) that if you were to guess randomly – by tossing a coin, say – then your expected chance of guessing correctly is 50%. The reason I say it might not be obvious is that the chance of spring arriving early is unlikely to be 50%. It might be 40%, say. Yet randomly guessing with a coin will still have a 50% expected success rate. As such, Phil is doing worse than someone who simply guesses at random.

However, if Phil’s 39% success rate is a genuine measure of his predictive powers – rather than a reflection of the fact that his guesses are also random, and he’s just been a bit unlucky over the years – then he’s still a very useful companion for predictive purposes. You just need to take his predictions, and predict the opposite. That way you’ll have a 61% success rate – rather better than random guessing. Unfortunately, this means you will have to put up with another 6 weeks of winter.
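Here's the arithmetic behind both of those claims, with the true chance of an early spring set to an arbitrary value just to show that it doesn't matter.

    # A coin-toss prediction is right 50% of the time whatever the true chance of
    # an early spring, while always predicting the opposite of Phil scores 61%.
    p_early <- 0.4                                         # illustrative; the true value is unknown
    coin_accuracy <- 0.5 * p_early + 0.5 * (1 - p_early)   # = 0.5 for any p_early
    phil_accuracy <- 0.39
    anti_phil_accuracy <- 1 - phil_accuracy                # = 0.61
    c(coin = coin_accuracy, phil = phil_accuracy, anti_phil = anti_phil_accuracy)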

Meantime, if you simply want more Groundhog Day statistics, you can fill your boots here.

And finally, if you think I’m wasting my time on this stuff, check out the Washington Post who have done a geo-spatial analysis of the whole of the United States to colour-map the regions in which Phil has been respectively more and less successful with his predictions over the years.


🤣


Who wants to win £194,375?

In an earlier post I included a link to Oscar predictions made by the film critic Mark Kermode over the years, which included a 100% success rate across all of the main categories in a couple of years. I also recounted his story of how he failed to make a fortune in 1992 by not knowing about accumulator bets.

Well, it's almost Oscar season, and fabien.mauroy@smartodds.co.uk pointed me to this article, which includes Mark's personal shortlist for the coming awards. Now, these aren't the same as predictions: in some years, Mark has listed his own personal favourites as well as what he believes to be the likely winners, and there's often very little in common. On the other hand, these lists have been produced prior to the nominations, so you're likely to get better prices on bets now rather than later. You'll have to be quick though, as the nominations are announced in a couple of hours.

Anyway, maybe you’d like to sift through Mark’s recommendations, look for hints as to who he thinks the winner is likely to be, and make a bet accordingly. But if you do make a bet based on these lists, here are a few things to take into account:

  1. Please remember the difference between an accumulator bet and single bets;
  2. Please gamble responsibly;
  3. Please don’t blame me if you lose.

If Mark subsequently publishes actual predictions for the Oscars, I’ll include a link to those as well.


Update: the nominations have now been announced and are listed here. Comparing them with Mark Kermode's own choices, the number of nominations in each category that also appear in Mark's personal list is as follows:

Best Picture: 1

Best Director: 2

Best Actor: 1

Best Actress: 2

Best Supporting Actor: 3

Best Supporting Actress: 1

Best Score: 2

In each case except Best Picture, there are 5 nominations and Mark’s list also comprised 5 contenders. For Best Picture, there are 8 nominations, though Mark only provided 5 suggestions.

So, not much overlap. But again, these weren’t intended to be Mark’s predictions. They were his own choices. I’ll aim to update with Mark’s actual predictions if he publishes them.

Statistics by pictures

Generally speaking there are three main phases to any statistical analysis:

  1. Design;
  2. Execution;
  3. Presentation.

Graphical techniques play an important part in both the second and third phases, but the emphasis is different in each. In the second phase the aim is usually exploratory, using graphical representations of data summaries to hunt for structure and relationships that might subsequently be exploited in a formal statistical model. The graphs here tend to be quick but rough, and are intended more for the statistician than the client.

In the presentation phase the emphasis is a bit different, since the analysis has already been completed, usually involving some sort of statistical model and inference. In this case diagrams are used to highlight the results to clients or a wider audience in a way that illustrates most effectively the salient features of the analysis. Very often the strength of message from a statistical analysis is much more striking when presented graphically rather than in the form of numbers. Moreover, some statisticians have also developed the procedure into something of an art form, using graphical techniques not just to convey the results of the analysis, but also to put them back in the context from where the data derive.

One of my favourite exponents of this technique is Mona Chalabi, who has regular columns in the Guardian, among other places.

Here are a few of her examples:

Most Popular Dog Names in New York

[Image]

A Complete History of the Legislation of Same-Sex Marriage 

[Image]

The Most Pirated Christmas Movies

[Image]

And last and almost certainly least…

Untitled

[Image]

Tell you what though… that looks a bit more than 16% to me, suggesting a rather excessive use of artistic licence in this particular case.