Bookmakers love accumulators

You probably know about accumulator bets, or so-called ‘accas’. Rather than betting individually on several different matches, in an accumulator any winnings from a first bet are used as the stake in a second bet. If either bet loses, you lose everything, but if both bets win there’s the potential to make more money than is available from single bets, because the prices accumulate. The process can be applied repeatedly, with the winnings from several bets carried over as the stake on a subsequent bet, and the total winnings if all the bets come in can be substantial. On the downside, it takes just one losing bet for you to win nothing.

Bookmakers love accumulators, and often apply special offers – as you can see in the profile picture above – to encourage gamblers to make such bets. Let’s see why that’s the case.

Consider a tennis match between two equally-matched players. Since the players are equally-matched, it’s reasonable to assume that each has a probability 0.5 of winning. So if a bookmaker was offering fair odds on the winner of this match, he should offer a price of 2 on either player, meaning that if I place a bet of 1 unit I will receive 2 units (including the return of my stake) if I win. This makes the bet fair, in the sense that my expected winnings – the amount I would win on average if the match were repeated many times – are zero. This is because

$(1/2 \times 2) + (1/2 \times 0) -1 = 0$

That’s the sum of the probabilities multiplied by the prices, take away the stake.

The bet is fair in the sense that, if the match were repeated many times, both the gambler and the bookmaker would expect neither to win nor lose. But bookmakers aren’t in the business of being fair; they’re out to make money and will set lower prices to ensure that they have long-run winnings. So instead of offering a price of 2 on either player, they might offer a price of 1.9. In this case, assuming gamblers split their stakes evenly across the two players, bookmakers will expect to win the following proportion of the total stake

$1-1/2\times(1/2 \times 1.9) - 1/2\times (1/2 \times 1.9)=0.05$

In other words, bookmakers have a locked-in 5% expected profit. Of course, they might not get 5%. Suppose most of the money is placed on player A, who happens to win. Then, the bookmaker is likely to lose money. But this is unlikely: if the players are evenly matched, the money placed by different gamblers will probably be evenly spread between the two players. And if it’s not, then the bookmakers can adjust their prices to try to encourage more bets on the less-favoured side.

Now, in an accumulator bet, the prices are multiplied. It’s equivalent to taking all of your winnings from a first bet and placing them on a second bet. Then those winnings are placed on the outcome of a third bet, and so on. So if there are two tennis matches, A versus B and C versus D, each of which is evenly-matched, the fair and actual prices on the accumulator outcomes are as follows:

Accumulator bet:   A-C    A-D    B-C    B-D
Fair price:        4      4      4      4
Actual price:      3.61   3.61   3.61   3.61

The value 3.61 comes from taking the prices of the individual bets, 1.9 in each case, and multiplying them together. It follows that the expected profit for the bookmaker is

$1-4\times 1/4\times(1/4 \times 3.61) = 0.0975$.

So, the bookmaker profit is now expected to be almost 10%. In other words, with a single accumulator, bookmakers almost double their expected profits. With further accumulators, the profits increase further and further. With 3 bets it’s over 14%; with 4 bets it’s around 18.5%. Because of this considerable increase in expected profits with accumulator bets, bookmakers can be ‘generous’ in their offers, as the headline graphic to this post suggests. In actual fact, the offers they are making are peanuts compared to the additional profits they make through gamblers making accumulator bets.
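Just to make the pattern concrete, here’s a quick sketch in Python (the function and its name are mine, assuming every leg is a 50/50 event priced at 1.9 as above) that reproduces these margins:

```python
# Bookmaker's expected profit, as a fraction of total stakes, when every
# leg of an accumulator is a 50/50 event priced at 1.9 rather than the
# fair price of 2, and stakes are spread evenly across all outcomes.

def bookmaker_margin(n_legs, price=1.9, prob=0.5):
    # A winning n-leg accumulator pays price**n_legs and comes in with
    # probability prob**n_legs, so the expected payout per unit staked
    # is (prob * price)**n_legs; the margin is whatever is left over.
    return 1 - (prob * price) ** n_legs

for n in range(1, 5):
    print(n, round(bookmaker_margin(n), 4))  # 0.05, 0.0975, 0.1426, 0.1855
```

Each extra leg multiplies the expected payout by another factor of 0.95, which is why the margin grows so quickly.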

However… all of this assumes that the bookmaker sets prices accurately. What happens if the gambler is more accurate in identifying the fair price for a bet than the bookmaker? Suppose, for example, a gambler reckons correctly that the probabilities for players A and C to win are 0.55 rather than 0.5. A single stake bet spread across the 2 matches would then generate an expected profit of

$0.55\times(1/2 \times 1.9) + 0.55\times (1/2 \times 1.9) -1 = 0.045$

On the other hand, the expected profit from an accumulator bet on A-C is

$(0.55\times1.9) \times (0.55\times1.9) -1 = 0.092$

In other words, just as the bookmaker increases his expected profit through accumulator bets when he has an advantage per single bet, so does the gambler. So, bookmakers do indeed love accumulators, but not against smart gamblers.

In the next post we’ll find out how not knowing the difference between accumulator and standard bets cost one famous gambler a small fortune.

Actually, the situation is not quite as favourable for smart gamblers as the above calculation suggests. Suppose that the true probabilities for a win for A and C are 0.7 and 0.4, which still average out at 0.55. This situation would arise, for example, if the gambler was using a model which performed better than he realised for some matches, but worse than he realised for others.

The expected winnings from single bets remain at 0.045. But now, the expected winnings from an accumulator bet are just:

$(0.7\times1.9) \times (0.4\times1.9) -1 = 0.011,$

which is considerably lower. Moreover, with different numbers, the expected winnings from the accumulator bet could be negative, even though the expected winnings from separate bets are positive. (This would happen, for example, if the win probabilities for A and C were 0.8 and 0.3 respectively.)

So unless the smart gambler is genuinely smart on every bet, an accumulator bet may no longer be in his favour.
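Here’s a small sketch covering both sides of this calculation (the function names are my own; the price is fixed at 1.9 per leg, as in the examples above):

```python
# Expected profit per unit staked for a gambler who knows the true win
# probabilities p1 and p2 for two matches priced at 1.9 per leg.

def singles_profit(p1, p2, price=1.9):
    # Half the stake on each match, settled separately.
    return 0.5 * p1 * price + 0.5 * p2 * price - 1

def acca_profit(p1, p2, price=1.9):
    # Winnings from the first leg rolled onto the second.
    return (p1 * price) * (p2 * price) - 1

print(singles_profit(0.55, 0.55))  # ~0.045
print(acca_profit(0.55, 0.55))     # ~0.092
print(acca_profit(0.7, 0.4))       # ~0.011: same average edge, smaller profit
print(acca_profit(0.8, 0.3))       # negative, despite the same average edge
```

The singles profit depends only on the average of the two probabilities, but the accumulator profit depends on their product, which is why an uneven edge hurts the acca.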

Lucky, lucky 2019

Welcome back to Smartodds loves Statistics.

Let’s start the new year with a fun statistic:

2019 is a lucky, lucky year.

Why is that? Well, let’s start with prime numbers. You’ll know that a prime number is a whole number bigger than 1 that can’t be written as a product of smaller whole numbers. For example 6 is not a prime number since $6 = 3 \times 2$, but 7 is a prime since it can only be factorised as $7 = 7 \times 1$.

One way of generating the prime numbers is as follows.

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30….

The first number remaining is 2, so remove all multiples of 2 that are bigger than 2:

2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29…..

The second number remaining is 3, so remove all multiples of 3 that are bigger than 3:

2, 3, 5, 7, 11, 13, 17, 19, 23, 25, 29…..

The third number remaining is 5, so remove all multiples of 5 that are bigger than 5:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29…

And keep going this way. The numbers that remain are the prime numbers.

It’s easy to check that

2, 3, 5, 7, 11, 13, 17, 19, 23, 29

comprise all of the prime numbers that are smaller than 30. To get the bigger prime numbers you just have to apply more steps using the same procedure.
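If you’d like to see the procedure in code, here’s a minimal sketch that mirrors the steps above (deliberately unoptimised, to stay close to the description):

```python
def primes_up_to(n):
    # Repeatedly take the smallest remaining number, keep it, and strike
    # out all of its larger multiples -- the sieve described above.
    candidates = list(range(2, n + 1))
    primes = []
    while candidates:
        p = candidates[0]
        primes.append(p)
        candidates = [c for c in candidates if c % p != 0]
    return primes

print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```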

Lucky numbers are generated in much the same way. This time we start with the sequence of all positive whole numbers:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,…..

The second number is 2, so we remove every second number from the sequence, leaving

1, 3, 5, 7, 9, 11, 13, 15, 17, 19, …..

The second number remaining is 3, so we remove every third number of the sequence

1, 3, 7, 9, 13, 15, 19, 21, 25, 27, …..

The third number remaining is 7, so we remove every 7th number.

1, 3, 7, 9, 13, 15, 21, 25, 27, …..

And so on….

The numbers that remain in this procedure are said to be lucky numbers. And proceeding in this way, it’s easy to check that 2019 is a lucky number. But 2019 isn’t just ‘lucky’, it’s ‘lucky, lucky’. Every whole number bigger than 1 can be written uniquely as a product of prime numbers. In the case of 2019 the unique prime factorisation is:

$2019 = 3 \times 673$

And… both 3 and 673 are also lucky numbers. So 2019 is doubly lucky in the sense that it is both lucky itself and all of its prime factors are lucky. Moreover, 2019 is the only year this century that has this property, so enjoy it while it lasts.
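The lucky-number sieve is just as easy to code. Here’s a minimal sketch, using the standard formulation in which, at each stage, the next surviving number tells you which positions to strike out; it lets you check that 3, 673 and 2019 all survive:

```python
def lucky_numbers(limit):
    # Start from the odd numbers: the first pass has already removed
    # every second number from 1, 2, 3, ...
    seq = list(range(1, limit + 1, 2))
    i = 1  # position of the next sieving number (seq[0] = 1 is never used)
    while i < len(seq) and seq[i] <= len(seq):
        step = seq[i]
        # Delete every step-th remaining number (1-based positions).
        del seq[step - 1 :: step]
        i += 1
    return seq

lucky = set(lucky_numbers(2020))
print(2019 in lucky, 3 in lucky, 673 in lucky)  # all three should be lucky
```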

This post is really more about mathematics than statistics, but how about this? If I take a large number, 2020 say, and pick a number at random from 1 to 2020, what’s the probability that it will be a lucky number? One way to do this would be to identify all of the lucky numbers up to 2020. If there are m such numbers, then the probability a randomly selected number will be lucky is m/2020.  But it turns out there’s a good approximation that can be calculated very easily, and it works for any large number, not just 2020.

A classical result from number theory is that the probability that a randomly selected number in the sequence 1,2,…., N is a prime number, for any large value of N, is approximately

$1/\log(N)$

where log is the natural logarithm. With N=2020, this is approximately 0.13, so there’s roughly a 13% chance that a number from 1 to 2020 is a prime number. But almost incredibly, this same approximation works also for lucky numbers, so there’s also roughly a 13% chance that a number from 1 to 2020 will be a lucky number. Obviously lucky, lucky numbers are much rarer, and I don’t know of any formula that can be used to calculate the probability of such numbers. The fact that there is just one lucky, lucky year this century, though, suggests the probability is pretty low.
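It only takes a few lines to check the approximation numerically. One caveat: $1/\log(N)$ is an asymptotic result, and at $N=2020$ it slightly understates the exact proportion of primes, which is $306/2020 \approx 0.15$. Here’s a sketch:

```python
import math

def prime_count(n):
    # Sieve of Eratosthenes: count the primes up to and including n.
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(is_prime)

N = 2020
print(prime_count(N) / N)  # exact proportion of primes: 306/2020 ~ 0.151
print(1 / math.log(N))     # the 1/log(N) approximation: ~ 0.131
```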

Christmas quiz

I mentioned in a previous post the Royal Statistical Society (RSS), the UK’s foremost professional body for statisticians. In addition to its role in promoting and publishing all-things statistical, it is also famous for one other thing: its annual Christmas quiz, which is widely considered to be one of the toughest quizzes around. It’s been going for 25 years and is famous enough that it gets reported in full in the Guardian.

Though produced by the RSS, the questions have nothing to do with statistics, and not much mathematics either. That said, the questions do require a good general knowledge, logical thinking and a capacity to approach problems laterally; skills that are useful for statisticians. My personal total score for the quiz over the last 5 years or so is zero.

So, the 2018 edition of the quiz – the 25th anniversary edition – is now available here. If you like a good challenge you might enjoy having a go at it. Good luck, and remember that you can’t possibly do worse than me. I’ll post a link to the solutions once they are available.

Just to give you some idea of the types of questions you’re likely to face in the quiz, here’s a question from the 2017 edition:

POLYMERISATION
If 5 is IHNTOBBTTAS, and 10 is IDAATINELR, what will 20 be?

Can you get the answer? There’s a clue in the question title. Once you’ve had enough, scroll down for the solution.

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

SOLUTION:

The polymer £5 note introduced in 2016 by the Bank of England features the quote “I have nothing to offer but blood, toil, tears and sweat” (initial letters IHNTOBBTTAS), while the new polymer £10 note features the quote “I declare after all there is no enjoyment like reading!” (initial letters IDAATINELR). The polymer £20, due for release in 2020, will feature the quote “Light is therefore colour” – so the answer is “LITC”.

Obviously I failed to find this solution in much the same way as I failed to find any solution to any of the questions in each of the quizzes for the last five years. I didn’t look at the quizzes in previous years, but you might extrapolate my more recent scores to get a reasonable estimate of what my score would have been if I had.

If you want further practice, you can find the complete 2017 version of the quiz here and the solutions here.

Stuck in jail?

In an earlier post, Get out of jail,  I set the following problem:

If I roll a standard dice until I get a 6, how many rolls of the dice will I need on average?

A summary of the 8 answers I received is given in the figure below:

So, 3 people got the answer right, perhaps because they know the general theory which leads to this answer. All other respondents underestimated the answer, perhaps not taking into account that the number of throws needed could be 10, 20 or, in theory, even more.

But maybe I wasn’t fair in the question, since ‘average’ can have different meanings. The usual interpretation is the ‘mean’, and it’s the mean which takes the value 6. But another choice is the median, which for this problem is 4: on roughly 50% of occasions you’d need 4 throws or fewer to get a 6, and on a similar proportion of occasions you’d need 5 throws or more. So, if you interpreted my question as asking for the median, you were right if your answer was 4, and close if your answer was 3. So again, everyone did really well by one interpretation of the problem. (Of course, if your answer was 3 because you divided 6 by 2, you were close, but for the wrong reason.)

The reason why the mean and the median are so different in this problem can be seen in the following figure, which shows the probability for every possible number of throws from 1 to 25. Beyond 25, the probabilities are all very close to zero.

If you imagine the lines showing the probabilities as being made of strips of metal lying flat on a piece of paper, then the mean is the line of equilibrium – i.e. if I place my finger vertically at the mean – so, 6 in this case – the figure would balance perfectly. (Or at least it would if I hadn’t truncated the graph at 25 – in reality the vertical lines stretch out to infinity, but with smaller and smaller heights).

In contrast, the median is the vertical line at which the total lengths of the bars are equally balanced on each side. We can’t get this perfectly, because of the jumps in the lengths from one bar to the next, but very roughly the sum of the lengths of the bars up to the fourth one is equal to the sum of the lengths from the fifth one onwards, so the median is 4.

Consequently, the mean and median are points of equilibrium by different physical rules – by weight for the mean and by total length for the median –  and when probability distributions are very asymmetric, as in the figure above, the values can be quite different.

Anyway, I’d intended to ask for the mean number of throws required, and the answer to that is 6. But why?

It’s not a proper proof, but suppose I rolled the dice 6 million times. Because of the symmetry of the dice, you’d expect around 1 million of those throws to come up 6. And those 1 million 6’s will be randomly spread among the 6 million throws. So, on average, the 6’s will be 6 throws apart. In other words: you have to wait an average of 6 throws after rolling a 6 to throw another 6. And by similar reasoning, you’d have to wait an average of 6 throws before getting the first 6.

Obviously, there’s nothing special about the number 6 here. Or about dice. In general, if I repeat an experiment where there are $N$ different possible outcomes, each of which is equally likely, the average number of times I’ll have to repeat the experiment before having a success is $N$. For example, if cereal packs contain a gift, and there are 10 different available gifts, I’ll need an average of 10 cereal packs to get any particular gift that I’m hoping for.

Just for completeness, and it’s entirely optional, here’s a formal mathematical proof.

Let the average number of throws required be $A$.

On the first throw of the dice there are 2 possibilities:

• I get a 6.
• I don’t get a 6.

These possibilities occur with probability 1/6 and 5/6 respectively. In the first case, I’ve achieved the objective of rolling a 6, and so the total number of throws needed was 1. In the second case, I haven’t achieved the objective, and so still have to make $A$ throws on average to get a 6, on top of the throw that I’ve already made. In other words, in this case I will have to make a total of $A+1$ throws on average. So, with probability 1/6 I just need 1 throw, but with probability 5/6 I need an average of $A+1$ throws. But we also know that the average number of throws overall is $A$. So

$A =1/6 \times 1 +(5/6)\times (A+1)$

This gives an equation for $A$: multiplying both sides by 6 gives $6A = 1 + 5(A+1)$, whose solution is $A=6$.
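If you’d rather trust simulation than algebra, here’s a quick Monte Carlo sketch (my own code, with an arbitrary seed) that confirms both the mean of 6 and the median of 4:

```python
import random
import statistics

random.seed(1)

def rolls_until_six():
    # Roll a fair dice until the first 6 appears; return the number of rolls.
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 6:
            return n

samples = [rolls_until_six() for _ in range(100_000)]
print(statistics.mean(samples))    # close to the theoretical mean of 6
print(statistics.median(samples))  # the theoretical median is 4
```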

One more quick aside based on the responses to the original post: it’s obviously difficult to draw many conclusions from just 8 responses, though I’m grateful to those of you who did respond. Clearly this type of post isn’t generating much interest, and maybe that’s true of the blog as a whole. I’m planning to give the blog a bit of a break over Christmas anyway, but before then I’ll include a post inviting feedback so that I can try to push the blog in a different direction if that’s preferred. Or maybe just wind things up if that seems more appropriate.

Borel

Struggling for ideas for Christmas presents? Stuck with an Amazon voucher from your employer and don’t know what to do with it? No idea how you’re going to get through Christmas with the in-laws? Trying to ‘Gamble Responsibly‘ but can’t quite kick the habit?

You can thank me later, but I have the perfect solution for you:

Borel

This is a new Trivial-Pursuit-style board game, but with a twist. Players are given a question involving dice, coloured balls or some other experimental apparatus, and have to bet on the outcome. There’s not enough time to actually do the probability calculations, so you just have to go with intuition. You can make bets of different sizes and, just like in real life, should make bigger bets when you think the odds are more in your favour.

This is part of the description at Amazon:

The game combines the human mind’s difficulty to deal with probabilistic dilemmas with the strategic thinking of competitive gambling.

And:

It is designed to reward probabilistic reasoning, intuition, strategic thinking and risk-taking!

In other words, it’s just like Smartodds-world, but without models to help you.

Disclaimer: The description and reviews look great, and I’ve ordered a set for myself, but I haven’t played it yet. I’ll try it on my family over Christmas and let you know how we get on. If you want a set for yourself or your loved ones, it’s available on Amazon here.

It’s just not cricket

I’ve written a couple of posts now – here and here – where I’ve mentioned the Duckworth-Lewis method. As I explained in the first of those posts, this is a  statistical approach to the problem of setting runs targets in cricket matches that are interrupted by rain. And at some point, I might write a post discussing a little about the detail of this method.

But you don’t want me to spoil your post-xmas-party hangover with some heavy statistics, right? You’d much prefer it if I spoilt it with some terrible music.

So, ladies and gentleman, I give you…

The Duckworth Lewis method (band)

Really!

If you follow the link to their webpage you will find details of past tours as well as their recent album, Sticky Wickets, which I presume is a cricket-based play on words referring to the Rolling Stones’ Sticky Fingers. The comparison stops there though!

As you might have guessed from both the name of the band and their album title, their songs are mostly – if not all – cricket-related. So if you’re feeling especially brave, and promise not to hold me responsible, try the following.

It’s just not cricket…

Get out of jail

As a rule, I don’t intend to use this blog to cover standard theory in probability or Statistics. But for a number of reasons, which will become clear in future posts, I’d like you to think about the following problem.

In the game of Monopoly, when you’re in jail, one way of getting out is to throw equal numbers – a so-called double – on the two dice. Actually, in standard Monopoly rules, you’re only allowed to try this method 3 times, and are then required to pay your way out, unless you have a ‘Get out of Jail Free’ card. But suppose there weren’t any such limit to the number of throws you could take, and you just kept taking turns until you rolled a double. In that case, how many rolls, on average, would it take for you to roll a double and get out of jail?

It’s very easy to show that the probability of throwing a double on any one throw is 1/6: let the score on the first dice be S; then the two dice will have the same score if the second dice also shows S. And the probability of that, by symmetry of the dice, is 1/6.
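You can also confirm the 1/6 by brute force, simply enumerating all 36 equally likely outcomes of the two dice:

```python
from itertools import product

# All 36 equally likely outcomes for two dice, and the doubles among them.
outcomes = list(product(range(1, 7), repeat=2))
doubles = [(a, b) for (a, b) in outcomes if a == b]
print(len(doubles), "out of", len(outcomes))  # 6 out of 36, i.e. 1/6
```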

So actually, since 1/6 is also the probability of throwing a 6 on a single dice, this problem has an identical solution to the following one:

If I roll a standard dice until I get a 6, how many rolls of the dice will I need on average?

This is a standard problem in probability and statistics, so anyone who’s studied statistics to a reasonable level will automatically know the answer. But if you don’t know the answer, use your intuition to guess what it might be. Either way, please send me your answer via this survey.

I’ll discuss the solution – and your guesses – in a future post. Like I say, I’ll also be making use of the result in a couple of different ways, also in future posts.

I just made up this one

I saw this the other day…

And the same day I saw this…

One of these items is a cartoon character inventing a statistic just to support an argument that he can’t justify by logic or other means.

The other one is Dilbert.

Rather less than 7.8 billion

In a previous post I set a variation of the classic birthday problem:

What’s the least number of people you need in a room for there to be a 50% chance or more that everyone in the room has the same birthday as someone else in the room?

I mentioned that the problem is difficult to solve, but thought it might be interesting to see how good we are collectively at guessing the answer.

The actual value turns out to be 3064.

It’s not for the faint-hearted, but there’s an academic paper which contains a formula to calculate this result, although the formula as written seems to contain a misprint. Moreover, trying to implement the formula in a simplistic way leads to numerical instabilities resulting in both negative probabilities and probabilities greater than several million (!) for some choices of the number of people in the room. However, the corrected version of the formula seems to work over a reasonable range of numbers. (I checked with a simple simulation routine.)

Anyway, using the corrected formula results in the above graph, which shows the probability that everyone shares a birthday for numbers of people between 2000 and 5000. Below 2000 the probability is essentially zero; above 5000 it’s essentially one. But between 2000 and 5000 the probability behaves as shown in the graph. You can see that to get a probability of at least 0.5 you need just over 3000 people, and the smallest number which takes the probability over the 0.5 threshold is 3064. If you guessed anywhere near that value, or indeed anywhere between 2000 and 5000, you did amazingly well.
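For anyone curious, a simulation routine like the one I used to check the formula takes only a few lines. This sketch (my own code, with an arbitrary seed) estimates the probability for a room of 3064 people, and the estimate lands close to 0.5:

```python
import random
from collections import Counter

random.seed(2019)

def everyone_shares(n_people, days=365):
    # True if nobody in the room has a birthday all to themselves.
    counts = Counter(random.randrange(days) for _ in range(n_people))
    return all(c >= 2 for c in counts.values())

def estimate(n_people, trials=2000):
    # Fraction of simulated rooms in which everyone shares a birthday.
    return sum(everyone_shares(n_people) for _ in range(trials)) / trials

p_hat = estimate(3064)
print(p_hat)  # should hover around 0.5
```

As noted in the post, pinning down whether the true probability at 3064 is just below or just above 0.5 this way would need far more trials than this quick check uses.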

One interesting thing about this problem is that the graph suggests that the fewer people there are in the room, the smaller the probability that everyone shares a birthday with someone else. Certainly for numbers within the range 2000 – 5000 we can see from the graph that’s true. It’s also true well outside of the range 2000 – 5000.

However, there’s one simple case where the probability is easy to calculate. Suppose there are just two people in the room. In this case the probability that everyone in the room shares a birthday is 1/365. To see this, suppose the first person’s birthday is D. Then everyone – i.e. both people – in the room will share a birthday if the second person’s birthday is also D. Under usual assumptions this is simply 1/365. So, although the graph above decreases as the number of people decreases (i.e. moving along the graph from right to left), there must be a point at which it starts to increase again, in order that when there are 2 people the probability goes up as far as 1/365.

As I wrote in the original post, one reason for setting this problem is to see how well we are able collectively to make a judgement on a problem like this, for which the true answer is very difficult to obtain. Your collective results are summarised  in the following figure, with guesses shown as dots and the true answer shown as a red line:

The guesses varied from 184 to 50,000, with most of the guesses towards the lower end of that range. So, to show the values in a reasonable way, I’ve had to use a logarithmic scale for the graph. Each dot on the graph represents somebody’s guess, and I’ve had to jiggle the points a bit where there were two identical or near-identical guesses.

I’d summarise things as follows:

1. If you count the dots you’ll reach a total of 12. So thanks to all 12 of you who replied, and I’m happy to buy each of you a drink at the Christmas dinner.
2. Before you get too impressed by the fact that two people seem to have guessed the right answer, neither of these ‘guesses’ was perfect. One was 3061, the other was 3065. The fact that they are wrong implies that these respondents didn’t develop, or even google, the exact formula. And don’t be too impressed that they were so close to the true answer either: the guesses are so good that they are almost certainly not just guesses. Chances are that both these attempts derive from a simple simulation of the exercise, similar to the one I used myself to check the formula. It’s easy to get very close to the answer this way, but the inherent randomness of simulations means you need a very large number of simulations to get an accurate estimate of the probability. And deciding, for example, whether 3064 people leads to a probability slightly below or slightly above 0.5 is likely to be very time-consuming. (Time consuming for the computer, that is. I’m not very good at programming, but my version took about 5 minutes to code.)
3. Excluding the two cheats – sorry, clever people – who almost certainly used simulation to solve the problem, most respondents underestimated the number of people needed. Remember that until you get to around 2000 people, the probability is essentially zero. Only two of the remaining respondents overestimated the number. And the respondent who guessed 5000 was the person with a genuine guess who came closest to the true answer. Indeed, their guess of 5000 just about made it onto the previous graph showing all the probabilities that were greater than 0, but smaller than 1.

Conclusion: this was a very difficult problem to have much intuition about, even though the specification of the problem is very simple. Collectively we tended to underestimate the number of people needed, perhaps having been influenced by the fact that the number of people required to solve the classical birthday problem, 23, is surprisingly low. I actually think the distribution of values around the true number – albeit on a logarithmic scale – shows a reasonably good collective attempt at guessing the true answer. One way of seeing that is to use standard statistical techniques to create a probability distribution based on your guesses. This is shown in the following figure (again on a logarithmic scale):

As you can see, the true value sits reasonably well in the heart of the estimated distribution, albeit towards the upper end. Again this confirms that the collective answers were pretty good, showing the value of teamwork over individuality, even when it comes to guesswork. Remember to collect your free Christmas drink from me as a reward. (APPLIES TO RESPONDENTS ONLY)

Enjoy the universe while you can

I’ve mentioned in the past that one of the great things about Statistics is the way it’s a very connected subject. A technique learnt for one type of application often turns out to be relevant for something completely different. But sometimes the connections are just for fun.

Here’s a case in point. A while back I wrote a post making fun of Professor Brian Cox, world-renowned astrophysicist, who seemed to be struggling to get to grips with the intricacies of the Duckworth-Lewis method for adjusting runs targets in cricket matches that have been shortened due to poor weather conditions. You probably know, but I forgot to mention, that in his younger days Brian was the keyboard player for D:Ream. You’ll have heard their music even if you don’t recognise the name. Try this for example:

Anyway, shortly after preparing that earlier post, I received the following in my twitter feed:

I very much doubt it’s true, but I love the idea that the original version of

Things can only get better

was going to be

Things inexorably get worse, there’s a statistical certainty that the universe will fall to bits and die

Might not have had the same musical finesse, but is perhaps a better statement on the times we live in. Or as Professor Cox put it in his reply: