Rare stickers

In an earlier post we looked at the number of packets of Panini stickers you’d be likely to need in order to complete an album. In that post I made the throwaway and obvious remark that if some stickers were rarer than others, you’d be likely to need even more packs than the standard calculations suggest. But is there any evidence that some stickers are rarer than others?

The official Panini response is given here. In reply to the question “Does Panini deliberately print limited edition or rare stickers?”, the company answer is:

No. Every collection consists of a number of stickers printed on one or two print sheets, which in turn contain one of each sticker and are printed in the same quantity as the albums.

So that seems clear enough. But the experience of collectors is rather different, with many suggesting that some stickers are much harder to find than others. So, what’s the data and what type of statistical methods can be used to judge the evidence?

The first thing to say is that, as we saw in the previous post, collecting in this way can be a frustrating experience. Even if the average number of packs needed is less than 1000, some collectors are likely to need more than 2000, even assuming all stickers are equally frequent. To put this into perspective, suppose you’ve already collected 662 of the 682 stickers, and need just another 20 stickers to complete the album. By the same type of calculation that we made before, the expected number of further packs needed to complete the album is 491, which is more than half the expected total number of packs needed (which was 969). You might like to try that calculation yourself, using the same method as in the previous post.
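If you'd rather let a computer do the sums, here is a minimal Python sketch of that calculation. It assumes, as in the previous post, that every sticker is equally likely and (for simplicity) that duplicates can occur within a pack. With j stickers still missing, the expected number of stickers until a new one arrives is 682/j, and dividing by 5 converts stickers into packs. The function name is just for illustration.

```python
def expected_packs_remaining(album_size: int, missing: int, per_pack: int = 5) -> float:
    """Expected extra packs needed when `missing` of `album_size` stickers are still absent.

    Assumes every sticker is equally likely, and ignores the fact that
    a single pack never contains duplicates (a small simplification).
    """
    # With j stickers still missing, a new sticker turns up with probability
    # j / album_size, so on average album_size / j stickers are needed.
    stickers = sum(album_size / j for j in range(1, missing + 1))
    return stickers / per_pack

remaining = expected_packs_remaining(682, 20)
total = expected_packs_remaining(682, 682)
print(f"Packs still needed with 20 missing: {remaining:.0f}")
print(f"Packs needed for the whole album:  {total:.0f}")
print(f"Share of the total still to come:  {remaining / total:.0%}")
```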

In other words, once you reach the point of needing just 20 more stickers, you’re not even half-way through your collection task in terms of the number of packs you’re likely to need. This can make it seem like certain stickers – the 20 you are missing – are rarer than others, even if they’re not.

But that’s not to say that rare stickers don’t exist – it’s just that the fact you’re likely to have to wait so long to get the remaining few stickers might make you feel like they are rare. So again: do rare stickers really exist?

Well, here’s a site that tried to conduct the experiment by buying 1000 packs from the 2018 world cup sticker series, listing every single sticker they got and the number of times they got it. Since each pack contains 5 stickers, this means that they collected a total of 5000 stickers. They then used the proportion of times they got each sticker as an estimate of the probability of getting each sticker. For example, they were fortunate enough to get the Belgium team photo sticker 18 times, so they estimated the probability of getting such a sticker as

18/5000 \approx 1/278

Using this method, they were able to calculate which teams’ stickers were, collectively, the easiest and the most difficult to complete. The results are summarised in the following figure:

On this basis, stickers of players from Senegal and Colombia were easy to obtain, while those of players from Belgium and Saudi Arabia were much harder. So although the Belgium team photo was one of the most frequent stickers in their collection, the individual players from Belgium were among the least frequent.

Now, you won’t need me to tell you that this is pretty much a waste of time. With just 5000 stickers collected, it’s bound to be that some stickers occur more often than others, and we don’t learn anything this way about whether stickers of certain teams or players are genuinely more difficult to find than others. One could try to ascertain whether the pattern of results here is consistent with an equal distribution of stickers, but there’s no mention of such an analysis being done, and a sample of just 5000 stickers would probably be too small to reach definitive conclusions anyway.
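To see why 5000 stickers say so little, one can simulate what a perfectly fair print run would produce. The Python sketch below assumes 682 equally likely stickers and packs of 5 different stickers; the seed and names are arbitrary choices of mine. It buys 1000 packs, reports the least and most frequent sticker counts, and computes Pearson's chi-square statistic, one standard way of checking counts against an equal distribution.

```python
import random
from collections import Counter

ALBUM_SIZE = 682
PACK_SIZE = 5
N_PACKS = 1000

def simulate_counts(seed: int = 2018) -> Counter:
    """Count how often each sticker appears in N_PACKS packs, all stickers equally likely."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(N_PACKS):
        # Each pack holds PACK_SIZE different stickers.
        counts.update(rng.sample(range(ALBUM_SIZE), PACK_SIZE))
    return counts

counts = simulate_counts()
observed = [counts[s] for s in range(ALBUM_SIZE)]  # stickers never seen count as 0
expected = N_PACKS * PACK_SIZE / ALBUM_SIZE         # about 7.3 per sticker

# Pearson's chi-square statistic: large values suggest the counts are not uniform.
chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = ALBUM_SIZE - 1

print(f"least / most frequent sticker: {min(observed)} / {max(observed)}")
print(f"chi-square = {chi_sq:.0f} on {df} degrees of freedom")
```

Under equal frequencies the statistic should land within a few multiples of sqrt(2 × 681) ≈ 37 of 681, and even then the most frequent sticker typically turns up more than twice as often as the average, purely by chance.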

One fun thing though: with the 5000 stickers collected, these guys managed to complete the album except for one missing sticker: Radja Nainggolan of Belgium. But he ended up not being selected for the World Cup anyway 😀.

This doesn’t really bring us any closer to the question of whether rare stickers exist or not. One interesting suggestion is to look at frequencies of requests in sites that handle markets for second-hand stickers. And indeed, this site finds jumps in frequencies of requests for certain types of stickers, suggesting such types are rarer than others. However, it’s not necessarily the case that the rare stickers are the ones with the most requests: Lionel Messi might be a popular trade, because… Messi… and not because his sticker is rare. Still, the post is a fun read, with complete details about how you might approach this type of analysis.

Finally, far be it from me to promote the exploits of one of our competitors, but some of the guys at ATASS pooled their world cup collections to address exactly this issue. A complete description of their findings can be found here. In summary, from an analysis of nearly 11,000 stickers, they found evidence that shiny stickers – of which there were 50 in the 2018 World Cup series – are much rarer than standard stickers.

Moreover, the strength of the evidence is completely overwhelming. This is partly because the number of stickers collected is large – 11,000 rather than 5,000 in the study I mentioned above – but also because there are 50 shiny stickers, and all of them occurred with a lower frequency than an average sticker. In fact, overall, shiny stickers occurred at a rate that’s around half that of a normal sticker. Now, if a single sticker isn’t found, it’s reasonable to put that down to chance; but it’s beyond the realms of plausibility that 50 stickers of a certain kind all occurred at below the average rate.
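Here is a back-of-envelope version of that argument as a Python sketch. The assumptions are mine, not ATASS's: 11,000 stickers in total, all 682 stickers equally likely, each sticker's count roughly Poisson, and the 50 shiny counts treated as independent. Under those assumptions a single sticker falls below the average with probability a little over a half, and the chance of all 50 doing so is astronomically small.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i     # P(X = i) from P(X = i - 1)
        total += term
    return total

n_stickers_seen = 11_000   # assumption: "nearly 11,000" rounded up
album_size = 682
n_shiny = 50

mean_count = n_stickers_seen / album_size                      # about 16.1 per sticker
p_below = poisson_cdf(math.ceil(mean_count) - 1, mean_count)   # P(count < mean)
p_all_below = p_below ** n_shiny                               # all 50 below, if independent

print(f"mean count per sticker: {mean_count:.1f}")
print(f"P(one sticker below average): {p_below:.3f}")
print(f"P(all 50 below average): {p_all_below:.1e}")
```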

On this basis, Panini’s claim that all stickers are equally likely is completely implausible. The only way it could really hold true, assuming the ATASS analysis to be correct, is if there were variations in the physical distribution of stickers, either geographically or temporally. So, although Panini might produce all stickers in equal numbers, variations in distribution might mean that some stickers were harder to get at certain times in certain places.

That seems unlikely though, and the evidence does seem to point to the fact that shiny stickers in the World Cup 2018 series were indeed harder to find than the others.

In summary: yes, rare stickers exist.

Here’s to all the money at the end of the world

I made the point in last week’s Valentine’s Day post that, although the emphasis of this blog is on the methodology of using Statistics to understand the world through the analysis of data, statistics in themselves often tell their own story. In this way we learnt that a good proportion of the population of the UK buy their pets presents for Valentine’s Day.

As if that wasn’t bad enough, I now have to report to you the statistical evidence for the fact that nature itself is dying. Or as the Guardian puts it:

Plummeting insect numbers ‘threaten collapse of nature’

The statistical and scientific evidence now points to the fact that, at current rates of decline, all insects could be extinct by the end of the century. Admittedly, it’s probably not great science or statistics to extrapolate the current annual loss of 2.5% in that way, but nevertheless it gives you a picture of the way things are going. This projected elimination of insects would be, by some definitions, the sixth mass extinction event on earth. (Earlier versions wiped out dinosaurs and so on).
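To see why that extrapolation is delicate, here is a small Python calculation contrasting two ways of projecting a 2.5% annual loss. The 81-year horizon (roughly 2019 to 2100) is my assumption, and neither model is claimed to be the one the researchers used. The point is only that the answer depends heavily on the model.

```python
ANNUAL_LOSS = 0.025   # 2.5% per year, as quoted
YEARS = 81            # assumption: roughly 2019 to 2100

# Model 1: lose 2.5% of the *original* population each year (a straight line).
linear_remaining = max(0.0, 1 - ANNUAL_LOSS * YEARS)
years_to_zero_linear = 1 / ANNUAL_LOSS

# Model 2: lose 2.5% of whatever is *left* each year (exponential decay).
compound_remaining = (1 - ANNUAL_LOSS) ** YEARS

print(f"straight line: extinct after {years_to_zero_linear:.0f} years")
print(f"compounding:   {compound_remaining:.0%} left after {YEARS} years")
```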

And before you go all Donald Trump, and say ‘bring it on: mosquito-free holidays’, you need to remember that life on earth is a complex ecological system in which the big things (including humans) are indirectly dependent on the little things (including insects) via intricate bio-mechanisms for mutual survival. So if all the insects go, all the humans go too. And this is by the end of the century, remember.

Here’s First Dog on the Moon’s take on it:

So, yeah, let’s do our best to make money for our clients. But let’s also not forget that money only has value if we have a world to spend it in, and use Statistics and all other means at our disposal to fight for the survival of our planet and all the species that live on it.

Happy Valentine’s Day

Happy Valentine’s Day. In case you didn’t get any cards or gifts today, please know that Smartodds loves Statistics loves you.

Anyway, I thought it might be interesting to research some statistics about Valentine’s Day, and found this article, from which I learned much more about the population of Britain than I was expecting to.

Here are some of the highlights:

  1. A significant number of people spend money on their pets for Valentine’s Day. This number varies by generation, and is as high as 8.7% for millennials.
  2. A slightly smaller, but still significant, number of people spend money on themselves for Valentine’s. Again, this trend is most prevalent among millennials, and also more common for women than men.
  3. 36.2% of people get unwanted gifts most years.
  4. 19% of people plan to celebrate Valentine’s late in order to save money by buying cards and gifts once the prices have dropped.

I’m not sure which of these statistics I found the most shocking.

Most of the posts in this blog are about the way Statistics as a science can be used to investigate problems and interpret data. But sometimes, the statistics are fascinating in themselves, and don’t require any kind of mathematical sophistication to reveal the secrets that they contain.

Anyway, I have to run now to buy myself my girlfriend a gift

Happy Valentine’s…

The bean machine

Take a look at the following video…

It shows the operation of a mechanical device that is variously known as a bean machine, a quincunx or a Galton board. When the machine is flipped, a large number of small balls or beans fall through a funnel at the top of the device. Below the funnel is a layered grid of pegs. As each bean hits a peg it can fall left or right – with equal probability if the board is carefully made – down to the next layer, where it hits another peg and can again go left or right. This repeats for a number of layers, and the beans are then collected in groups, according to the position they fall in the final layer. At the end you get a kind of physical histogram, where the height of the column of beans corresponds to the frequency with which the beans have fallen in that slot.

Remarkably, every time this experiment is repeated, the pattern of beans at the bottom is pretty much the same: it’s symmetric, high in the middle, low at the edges and has a kind of general bell-shape. In fact, the shape of this histogram will be a good approximation to the well-known normal distribution curve:

As you probably know, it turns out that the relative frequencies of many naturally occurring phenomena look very much like this normal curve: heights of plants, people’s IQ, brightness of stars… and indeed (with some slight imperfections) the differences in team points in sports like basketball.

Anyway, if you look at the bottom of the bean machine at the end of the video, you’ll see that the heights of the columns of beans – which represent the frequency of beans falling in each position – resemble this same bell-shaped curve. And this will happen – with different small irregularities – every time the bean machine is re-started.

Obviously, just replaying the video will always lead to identical results, so you’ll have to take my word for it that the results are similar every time the machine is operated. There are some simulators available, but my feeling is you lose something by not seeing the actual physics of real-world beans falling into place. Take a look here if you’re interested, though I suggest you crank the size and speed buttons up to their maximum values first.

But why should it be that the bean machine, like many naturally occurring phenomena, leads to frequencies that closely match the normal curve?

Well, the final position of each bean is the result of several random steps in which the bean could go left or right. If we count +1 every time the bean goes right and -1 every time the bean goes left, then the final position is the sum of these random +/-1 outcomes. And it turns out that, under fairly general conditions, whenever you have a process that is the sum of many random experiments, the final distribution is bound to look like this bell-shaped normal curve.
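The bean machine is easy to mimic on a computer. Here is a minimal Python sketch. It assumes each peg sends a bean right with probability `p_right` (0.5 for a well-made board) and records the final position as the sum of the +1/-1 steps just described. The numbers of beans and layers are arbitrary choices.

```python
import random
from collections import Counter

def bean_machine(n_beans: int, n_layers: int, p_right: float = 0.5, seed: int = 0) -> Counter:
    """Final positions of beans, each the sum of n_layers random +1 (right) / -1 (left) steps."""
    rng = random.Random(seed)
    positions = Counter()
    for _ in range(n_beans):
        position = sum(1 if rng.random() < p_right else -1 for _ in range(n_layers))
        positions[position] += 1
    return positions

# A crude text histogram of 3000 beans falling through 12 layers of pegs.
counts = bean_machine(n_beans=3000, n_layers=12)
for position in range(-12, 13, 2):   # with 12 steps the final position is always even
    print(f"{position:+3d} {'#' * (counts[position] // 20)}")
```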

This is a remarkable phenomenon. The trajectory of any individual bean is unpredictable. It could go way to the left, or way to the right, though it’s more likely that it will stay fairly central. Anything is possible, though some outcomes are more likely than others. However, while the trajectory of individual beans is unpredictable, the collective behaviour of several thousand beans is predictable to a very high degree of accuracy: the frequencies within any individual range will match very closely the values predicted by the normal distribution curve. This is really what makes statistics tick. We can predict very well how a population will behave, even if we can’t predict how individuals will behave.

Even more remarkably, if the bean machine has enough layers of pegs, the eventual physical histogram of beans will still look like the normal distribution curve, even if the machine has some sort of bias. For example, suppose the beans were released, but that the machine wasn’t quite vertical, so that the beans had a higher tendency to go left, rather than right, when they hit a peg. In this case, as long as there were sufficiently many layers of pegs, the final spread of beans would still resemble the normal curve, albeit no longer centred at the middle of the board. You can try this in the simulator by moving the left/right button away from 50%.
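The same idea can be checked numerically for a tilted board. In this Python sketch, `P_RIGHT = 0.4` and the 50 layers are arbitrary illustrative values. Each step has mean 2p - 1 and variance 4p(1 - p), so the centre of the pile should sit at n(2p - 1) and its spread should be sqrt(4np(1 - p)). The simulation compares both against those values.

```python
import random
import statistics

def final_positions(n_beans: int, n_layers: int, p_right: float, seed: int = 1) -> list[int]:
    """Final position of each bean: the sum of n_layers steps of +1 (prob p_right) or -1."""
    rng = random.Random(seed)
    return [sum(1 if rng.random() < p_right else -1 for _ in range(n_layers))
            for _ in range(n_beans)]

N_LAYERS = 50
P_RIGHT = 0.4    # a tilted board: beans favour the left

positions = final_positions(n_beans=5000, n_layers=N_LAYERS, p_right=P_RIGHT)

# Each step has mean 2p - 1 and variance 4p(1 - p); a sum of n independent
# steps multiplies both by n.
theory_mean = N_LAYERS * (2 * P_RIGHT - 1)
theory_sd = (4 * N_LAYERS * P_RIGHT * (1 - P_RIGHT)) ** 0.5

print(f"simulated mean {statistics.mean(positions):.2f}, theory {theory_mean:.2f}")
print(f"simulated sd   {statistics.pstdev(positions):.2f}, theory {theory_sd:.2f}")
```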

Technically, the bean machine is a physical illustration of a mathematical result generally termed the Central Limit Theorem. This states that in situations like those illustrated by the bean machine, where a phenomenon can be regarded as a sum of many random experiments, the distribution of final results will, under general conditions, look very much like the well-known bell-shaped normal curve.
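In symbols, and specialised to the bean machine, one standard form of the result reads as follows. Here X_i = +1 when the bean goes right at layer i (probability p), X_i = -1 when it goes left, and n is the number of layers:

```latex
S_n = \sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[X_i] = 2p - 1, \qquad
\operatorname{Var}(X_i) = 4p(1-p),

\frac{S_n - n(2p-1)}{\sqrt{4np(1-p)}} \;\xrightarrow{\;d\;}\; N(0, 1)
\quad \text{as } n \to \infty .
```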

It’s difficult to overstate the importance of this result – which is fundamental to almost all areas of statistical theory and practice – since it lets us handle probabilities in populations, even when we don’t know how individuals behave. And the beauty of the bean machine is that it demonstrates that the Central Limit Theorem is meaningful in the real physical world, and not just a mathematical artefact.


Can’t live without your own desktop bean machine? I have good news for you…

 

Groundhog day

Fed up of the cold, snow and rain? Don’t worry, spring is forecast to be here earlier than usual. Two caveats though:

  1. ‘Here’ is some unspecified region of the United States, and might not extend as far as the UK;
  2. This prediction was made by a rodent.

Yes, Saturday (February 2nd) was Groundhog Day in the US. And since Punxsutawney Phil failed to see his shadow, spring is forecast to arrive early.

You probably know about Groundhog Day from the Bill Murray movie

… but it’s actually a real event. It’s celebrated in many locations of the US and Canada, though it’s the event in Punxsutawney, Pennsylvania, which has become the most famous, and on which the movie was based. As Wikipedia says:

The Groundhog Day ceremony held at Punxsutawney in western Pennsylvania, centering around a semi-mythical groundhog named Punxsutawney Phil, has become the most attended.

Semi-mythical, no less. If you’d like to know more about Punxsutawney Phil, there’s plenty of information at The Punxsutawney Groundhog Club website, including a dataset of his predictions. These include the entry from 1937 when Phil had an ‘unfortunate meeting with a skunk’. (And whoever said data analysis was boring?)

Anyway, the theory is that if, at 7.30 a.m. on the second of February, Phil the groundhog sees his shadow, there will be six more weeks of winter; if not, spring will arrive early. Now, it seems a little unlikely that a groundhog will have powers of meteorological prediction, but since the legend has persisted, and there is other evidence of animal behaviour serving as a weather predictor, it seems reasonable to assess the evidence.

Disappointingly, Phil’s success rate is rather low. This article gives it as 39%. I’m not sure if it’s obvious or not, but the article also states (correctly) that if you were to guess randomly, by tossing a coin, say, then your expected chance of guessing correctly is 50%. The reason I say it might not be obvious, is because the chance of spring arriving early is unlikely to be 50%. It might be 40%, say. Yet, randomly guessing with a coin will still have a 50% expected success rate. As such, Phil is doing worse than someone who guesses at random, or via coin tossing.
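Here's why the coin scores 50% whatever the climate. If spring comes early with probability p, a fair coin is right with probability 0.5p + 0.5(1 - p) = 0.5. The Python sketch below simply checks that by simulation. The values of p are arbitrary and the function name is illustrative.

```python
import random

def coin_guess_success(p_early_spring: float, n_years: int = 100_000, seed: int = 2) -> float:
    """Fraction of years a fair-coin guesser gets right when spring is early with probability p."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_years):
        early_spring = rng.random() < p_early_spring
        guess_early = rng.random() < 0.5          # heads: predict early spring
        correct += (guess_early == early_spring)
    return correct / n_years

# Exact version: 0.5 * p + 0.5 * (1 - p) = 0.5, whatever p is.
for p in (0.2, 0.4, 0.7):
    print(f"P(early spring) = {p}: coin is right {coin_guess_success(p):.3f} of the time")
```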

However, if Phil’s 39% success rate is a genuine measure of his predictive powers – rather than a reflection of the fact that his guesses are also random, and he’s just been a bit unlucky over the years – then he’s still a very useful companion for predictive purposes. You just need to take his predictions, and predict the opposite. That way you’ll have a 61% success rate – rather better than random guessing. Unfortunately, this means you will have to put up with another 6 weeks of winter.
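Whether 39% reflects genuine (anti-)skill or just bad luck is a classic binomial question: how likely is a coin-tosser to score this badly? The Python sketch below needs the number of years in Phil's record, which isn't given here. `N_YEARS = 120` is a hypothetical placeholder, and a different count will give a different answer.

```python
from math import comb

def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

N_YEARS = 120                    # hypothetical: replace with the count in Phil's dataset
correct = round(0.39 * N_YEARS)  # a 39% success rate

# How often would a pure coin-tosser do this badly (or worse) over N_YEARS?
p_unlucky = prob_at_most(correct, N_YEARS)
print(f"{correct}/{N_YEARS} correct; chance a coin does this badly: {p_unlucky:.3f}")

# Predicting the opposite of Phil turns every hit into a miss and vice versa.
print(f"reversed-Phil success rate: {1 - correct / N_YEARS:.0%}")
```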

Meantime, if you simply want more Groundhog Day statistics, you can fill your boots here.

And finally, if you think I’m wasting my time on this stuff, check out the Washington Post who have done a geo-spatial analysis of the whole of the United States to colour-map the regions in which Phil has been respectively more and less successful with his predictions over the years.


🤣

Stickers

Last year’s FIFA© World Cup Panini sticker album had spaces for 682 stickers. Stickers were sold in packs of 5, at a cost of 80 pence per pack. How much was it likely to cost to fill the whole album? Maybe have a guess at this before moving on.

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|


Well, to get 682 stickers you need 137 packs, so the obvious (but wrong) answer is 137 times 80 pence, which is £109.60. It’s wrong, of course, because it doesn’t take into account duplicate stickers: as the album fills up, when you buy a new pack, it’s likely that at least some of the new stickers will be stickers that you’ve already collected. And the more stickers you’ve already collected, the more likely it is that a new pack will contain stickers that you’ve already got. So, you’re likely to need many more than 137 packs and spend much more than £109.60. But how much more?

It turns out (see below) that on average the number of packs needed can be calculated as

(682/682 + 682/681 + 682/680 + \ldots + 682/1) /5 \approx 969

where the “…” means “plus all the terms in-between”. So the next term in the sequence you have to add is 682/679 and then 682/678 and so on, all the way down to the final term in the sequence which is given as 682/1.

So the average cost of filling the album is around 969 \times 80 pence, or £775. You can probably also guess how this calculation changes if the number of spaces in the album were different from 682 or if the number of stickers per pack were different from 5.
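The formula is easy to evaluate and to adapt to other album or pack sizes. Here is a minimal Python sketch of the simplified calculation above, which still ignores the fact that a pack contains no duplicates. The function name and the alternative sizes are illustrative.

```python
def expected_packs(album_size: int, per_pack: int) -> float:
    """Average number of packs to fill an album, all stickers equally likely.

    Uses the simplified formula from the text:
    (n/n + n/(n-1) + ... + n/1) / per_pack, with n = album_size.
    """
    return sum(album_size / j for j in range(1, album_size + 1)) / per_pack

PRICE_PER_PACK = 0.80  # pounds

packs = expected_packs(682, 5)
print(f"{packs:.0f} packs, costing about £{packs * PRICE_PER_PACK:.0f}")

# How the answer scales with album size and pack size.
for n, k in [(682, 10), (500, 5), (682, 1)]:
    print(f"album of {n}, packs of {k}: {expected_packs(n, k):.0f} packs")
```

Because the pack size only divides the total, doubling the number of stickers per pack halves the expected number of packs under this simplified model.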

Well, actually, there’s a small mistake in this calculation. Strictly speaking, when you buy packs of 5 stickers, none of the stickers in a pack will be duplicates among themselves. The above calculation ignores this fact, and assumes that duplicates could occur within packs. However, it turns out that doing the mathematics more carefully – which is quite a bit more complicated – leads to a not-very-different answer of £773. So, we might have simplified things in our calculation of £775, but we didn’t lose much in terms of accuracy.

Anyway, a question that’s just as interesting as the accuracy of the answer is what the value of £775 means in practice. Though it’s the average value that would be spent by many collectors in filling the album, the actual experience of any individual collector might be quite different from this. The mathematics is more complicated again in this case, but we can avoid the complexity by simulating the process. The figure below shows a histogram of the number of packs needed to fill the album in a simulation of 10,000 albums.
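The simulation itself takes only a few lines. Here is a minimal Python sketch. It assumes every sticker is equally likely and each pack contains 5 different stickers, which is the more careful model behind the £773 figure. The seed and the smaller number of simulated albums are my choices, so the summary figures will differ a little from the histogram in the post.

```python
import random
import statistics

ALBUM_SIZE = 682
PACK_SIZE = 5
PRICE_PER_PACK = 0.80

def packs_to_complete(rng: random.Random) -> int:
    """Buy packs of PACK_SIZE distinct stickers until all ALBUM_SIZE are collected."""
    collected = set()
    packs = 0
    while len(collected) < ALBUM_SIZE:
        collected.update(rng.sample(range(ALBUM_SIZE), PACK_SIZE))
        packs += 1
    return packs

rng = random.Random(682)
N_ALBUMS = 500   # the post used 10,000; fewer here to keep the run short
results = [packs_to_complete(rng) for _ in range(N_ALBUMS)]

print(f"mean packs:  {statistics.mean(results):.0f} "
      f"(about £{statistics.mean(results) * PRICE_PER_PACK:.0f})")
print(f"fewest/most: {min(results)} / {max(results)}")
print(f"share needing more than 1000 packs: {sum(r > 1000 for r in results) / N_ALBUMS:.0%}")
```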


So, for example, roughly 800 packs were needed to complete the album in around 1500 of the simulated albums. Of course, the average number of packs needed turns out to be close to the theoretical average of 969. But although sometimes fewer than this number were needed, the asymmetry of the histogram means that on many occasions far more than the average number was needed. For example, on a significant number of occasions more than 1000 packs were needed; on several occasions more than 1500 packs were needed; and on a few occasions more than 2000 packs were needed (at a cost of over £1600!). By contrast, there were no occasions on which 500 packs were sufficient to complete the album. So, even though an average spend of £775 probably sounded like a lot of money to fill the album, any individual collector might need to spend as much as £2000 or more, while all collectors would have needed to spend at least £400.

This illustrates an important point about Statistics in general – an average is exactly that: an average. And individual experiences might differ considerably from that average value. Moreover, asymmetry in the underlying probability distribution – as seen in the histogram above – implies that variations from the average are likely to be bigger in one direction than the other. In the case of Panini sticker albums, you might end up paying a lot more than the average of £775, but are unlikely to spend very much less.


To be fair to Panini, it’s common for collectors to swap duplicate stickers with those of other collectors. This obviously has the effect of reducing the number of packs needed to complete the album. Furthermore, Panini now provide an option for collectors to order up to 50 specific stickers, enabling collectors who have nearly finished the album to do so without buying further packs when the chance of duplication is at its highest. So for both these reasons, the expected costs of completing the album as calculated above are over-estimates. On the other hand, if certain stickers are made deliberately rarer than others, the expected number of packs will increase! Would Panini do that? We’ll discuss that in a future post.


Meantime, for maths enthusiasts, and just in case you’re interested, let’s see where the formula

(682/682 + 682/681 + 682/680 + \ldots + 682/1) /5 \approx 969

comes from. You might remember from an earlier post that, if I repeat an experiment that has probability p of success until I get my first success, I will have to repeat the experiment an average of 1/p times. Well, buying new stickers until I get one that’s different from those I’ve already collected is an experiment of exactly this type, so I can use this result. But as the number of stickers I’ve already collected changes, so does the probability of obtaining a different sticker.
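The 1/p result is easy to check by simulation. Here is a small Python sketch. It assumes each new sticker is an independent draw with probability p of being one we don't already have. The three collection stages are arbitrary examples.

```python
import random

def tries_until_success(p: float, rng: random.Random) -> int:
    """Repeat a trial with success probability p; return how many tries it took."""
    tries = 1
    while rng.random() >= p:
        tries += 1
    return tries

rng = random.Random(7)
for already_have in (0, 341, 672):
    p_new = (682 - already_have) / 682        # chance the next sticker is new
    sims = [tries_until_success(p_new, rng) for _ in range(20_000)]
    print(f"have {already_have:3d}: simulated {sum(sims) / len(sims):6.2f}, 1/p = {1 / p_new:6.2f}")
```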

  • At the start, I have 0 stickers, so the probability the next sticker will be a new sticker is 682/682, and the expected number of stickers I’ll need till the next new sticker is 682/682. (No surprises there.)
  • I will then have 1 sticker, and the probability the next sticker will be a new sticker is 681/682. So the expected number of stickers I’ll need till the next new sticker is 682/681.
  • I will then have 2 different stickers, and the probability the next sticker will be a new sticker is 680/682. So the expected number of stickers I’ll need till the next new sticker is 682/680.
  • This goes on and on till I have 681 stickers and the probability the next sticker will be a new sticker is 1/682. So the expected number of stickers I’ll need till the next new sticker is 682/1.

At that point I’ll have a complete collection. Adding together all these expected numbers of stickers gives

(682/682 + 682/681 + 682/680 + \ldots + 682/1)

But each pack contains 5 stickers, so the expected number of packs I’ll need is

(682/682 + 682/681 + 682/680 + \ldots + 682/1) /5 \approx 969
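For large albums the sum has a handy closed-form approximation. Write n for the number of stickers (682), k for the pack size (5) and H_n = 1 + 1/2 + … + 1/n for the n-th harmonic number. Then the standard approximation H_n ≈ ln n + γ + 1/(2n), with γ ≈ 0.5772 (Euler's constant), gives:

```latex
\frac{1}{k}\sum_{j=1}^{n} \frac{n}{j} = \frac{n H_n}{k},
\qquad H_n \approx \ln n + \gamma + \frac{1}{2n},

\frac{682\,(\ln 682 + 0.5772 + 1/1364)}{5} \approx \frac{682 \times 7.103}{5} \approx 969 .
```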

Christmas quiz answers

Just in case anyone attempted the Royal Statistical Society Christmas quiz that I included in an earlier post, the solutions are now available here. I managed to match my personal all-time record of zero. Funny though, looking at the solutions, how obvious everything should have been. Bit like looking at last weekend’s results and spotting all the obvious bets!

Who wants to win £194,375?

In an earlier post I included a link to Oscar predictions by film critic Mark Kermode over the years, which included a 100% success rate across all of the main categories in a couple of years. I also recounted his story of how he failed to make a fortune in 1992 by not knowing about accumulator bets.

Well, it’s almost Oscar season, and fabien.mauroy@smartodds.co.uk pointed me to this article, which includes Mark’s personal shortlist for the coming awards. Now, these aren’t the same as predictions: in some years, Mark has listed his own personal favourites as well as what he believes to be the likely winners, and there’s often very little in common. On the other hand, these lists have been produced prior to the nominations, so you’re likely to get better prices on bets now, rather than later. You’ll have to be quick though, as the nominations are announced in a couple of hours.

Anyway, maybe you’d like to sift through Mark’s recommendations, look for hints as to who he thinks the winner is likely to be, and make a bet accordingly. But if you do make a bet based on these lists, here are a few things to take into account:

  1. Please remember the difference between an accumulator bet and single bets;
  2. Please gamble responsibly;
  3. Please don’t blame me if you lose.

If Mark subsequently publishes actual predictions for the Oscars, I’ll include a link to those as well.


Update: the nominations have now been announced and are listed here. Comparing the nominations with Mark Kermode’s own list, the number of nominations in each category that also appear in Mark’s personal list is as follows:

Best Picture: 1

Best Director: 2

Best Actor: 1

Best Actress: 2

Best Supporting Actor: 3

Best Supporting Actress: 1

Best Score: 2

In each case except Best Picture, there are 5 nominations and Mark’s list also comprised 5 contenders. For Best Picture, there are 8 nominations, though Mark only provided 5 suggestions.

So, not much overlap. But again, these weren’t intended to be Mark’s predictions. They were his own choices. I’ll aim to update with Mark’s actual predictions if he publishes them.

Pulp Fiction (Our Esteemed Leader’s cut)

The previous post had a cinematic theme. That got me remembering an offsite a while back where Matthew.Benham@smartbapps.co.uk gave a talk that I think he called ‘Do the Right Thing’, which is the title of a 1989 Spike Lee film. Midway through his talk Matthew gave a premiere screening of his own version of a scene from Pulp Fiction. Unfortunately, I’ve been unable to get hold of a copy of Matthew’s cut, so we’ll just have to make do with the inferior original….

The theme of Matthew’s talk was the importance of always acting in relation to best knowledge, even if it contradicts previous actions taken when different information was available. So, given the knowledge and information you had at the start of a game, you might have bet on team A. But if the game evolves in such a way that a bet on team B becomes positive value, you should do that. Always do the right thing. And the point of the scene from Pulp Fiction? Don’t let pride get in the way of that principle.

These issues will make a great topic for this blog sometime. But this post is about something else…

Dependence is a big issue in Statistics, and we’re likely to return to it in different ways in future posts. Loosely speaking, two events are said to be independent if knowing the outcome of one doesn’t affect the probabilities of the outcomes of the other. For example, it’s usually reasonable to treat the outcomes of two different football matches taking place on the same day as independent. If we know one match finished 3-0, that information is unlikely to affect any judgements we might have about the possible outcomes of a later match. Events that are not independent are said to be dependent: in this case, knowing the outcome of one will affect the probabilities of the outcomes of the other. In tennis matches, for example, the outcome of one set tends to affect the chances of who will win a subsequent set, so set winners are dependent events.

With this in mind, let’s follow-up the discussion in the previous 2 posts (here and here) about accumulator bets. By multiplying prices from separate bets together, bookmakers are assuming that the events are independent. But if there were dependence between the events, it’s possible that an accumulator offers a value bet, even if the individual bets are of negative value. This might be part of the reason why Mark Kermode has been successful in several accumulator bets over the years (or would have been if he’d taken his predictions to the bookmaker and actually placed an accumulator bet).

Let me illustrate this with some entirely made-up numbers. Let’s suppose ‘Pulp Fiction (Our Esteemed Leader’s cut)’, is up for a best movie award, and its upstart director, Matthew Benham, has also been nominated for best director. The numbers for single bets on PF and MB are given in the following table. We’ll suppose the bookmakers are accurate in their evaluation of the probabilities, and that they guarantee themselves an expected profit by offering prices that are below the fair prices (see the earlier post).

                     True Probability   Fair Price   Bookmaker Price
Best Movie: PF       0.4                2.5          2
Best Director: MB    0.25               4            3.5


Because the available prices are lower than the fair prices and the probabilities are correct, both individual bets have negative value (-0.2 and -0.125 respectively for a unit stake). The overall price for a PF/MB accumulator bet is 2 \times 3.5 = 7, which, assuming independence, is an even poorer value bet, since the expected winnings from a unit stake are

0.4 \times 0.25 \times 7 -1 = -0.3

However, suppose voters for the awards tend to have similar preferences across categories, so that if they like a particular movie, there’s an increased chance they’ll also like the director of that movie. In that case, although the table above might be correct, the probability of MB winning the director award if PF (MB cut) is the movie winner is likely to be greater than 0.25. For argument’s sake, let’s suppose it’s 0.5. Then, the expected winnings from a unit stake accumulator bet become

0.4 \times 0.5 \times 7 -1 = 0.4

That’s to say, although the individual bets are still both negative value, the accumulator bet is extremely good value. This situation arises because of the implicit assumption of independence in the calculation of accumulator prices. When the assumption is wrong, the true expected winnings will be different from those implied by the bookmaker prices, potentially generating a positive value bet.
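The arithmetic generalises neatly: what matters for the accumulator is the probability of the second event *given* the first. Here is a minimal Python sketch using the made-up numbers above; the function name is illustrative.

```python
def accumulator_ev(p_first: float, p_second_given_first: float, price: float) -> float:
    """Expected profit of a unit-stake two-leg accumulator at a given combined price."""
    return p_first * p_second_given_first * price - 1

PRICE_PF = 2.0    # bookmaker price, Best Movie
PRICE_MB = 3.5    # bookmaker price, Best Director
acc_price = PRICE_PF * PRICE_MB   # bookmakers multiply the single prices

# Single bets, using the made-up true probabilities from the table.
print(f"PF single: {0.4 * PRICE_PF - 1:+.3f}")
print(f"MB single: {0.25 * PRICE_MB - 1:+.3f}")

# Accumulator: independence (0.25) versus positively dependent voting (0.5).
for p_mb_given_pf in (0.25, 0.5):
    ev = accumulator_ev(0.4, p_mb_given_pf, acc_price)
    print(f"P(MB | PF) = {p_mb_given_pf}: accumulator EV {ev:+.2f}")
```

Setting `p_mb_given_pf` equal to the unconditional 0.25 reproduces the bookmaker's independence assumption; any value above about 0.357 (that is, 1/(0.4 × 7)) makes the accumulator positive value.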

Obviously with most accumulator bets – like multiple football results – independence is more realistic, and this discussion is unhelpful. But for speciality bets like the Oscars, or perhaps some political bets where late swings in votes are likely to affect more than one region, there may be considerable value in accumulator bets if available.


If anyone has a copy of Our Esteemed Leader’s cut of the Pulp Fiction scene on a pen-drive somewhere, and would kindly pass it to me, I will happily update this post to include it.