# CricViz

Many of you will know that my involvement with Smartodds stems from co-authorship of an academic paper with Mark Dixon. In this paper we developed a statistical model for calculating probabilities of football match results. Since then I’ve sometimes been asked – and indeed, was asked at a previous Smartodds offsite meeting – whether I regretted publishing that paper, rather than simply using its methodology to try to make money from bookmakers.

There are several answers to that question, including:

1. Mark was really the principal author for that work, and so it was mostly his choice what we did with it;
2. At the time I was genuinely more interested in the academic side of the work, rather than any potential it had for generating money;
3. The model alone was, at best, only marginally profitable. Without additional knowledge from football experts, it was unlikely to make money;
4. If we hadn’t published the paper, I’d probably have never ended up being connected to Smartodds.

Anyway, I recently thought about all this while following the Guardian coverage of England’s cricket World Cup semi-final, which mentioned that at that stage of the game – sometime in England’s innings after New Zealand had set a target – CricViz were giving England a 79% chance of winning. I’d never heard of CricViz, so I followed the links and discovered that it’s fundamentally an in-running cricket model that sits on your phone. You can get a complete description and links to download for Android or iOS here.

In terms of interface, CricViz is light years ahead of the work on football that I published with Mark Dixon. If you’d wanted to make predictions for football matches having read our original paper, you’d have had to collect data, program the model and run the predictions yourself. CricViz gives you live predictions for important matches both before the match starts, and over-by-over as the match progresses. It’s brilliant. And so, a similar question might be put to the authors of CricViz: why give this tool away for free, instead of using the methodology to fleece the bookmakers?

There are probably multiple answers to this question too, but one central issue is obviously the quality of the model on which CricViz is based. Though my paper with Mark Dixon didn’t make it easy for readers to calculate match numbers for themselves, it did provide both a complete mathematical recipe for what was needed as well as an analysis of historical results demonstrating its potential. CricViz does neither of those things. Its home website simply states…

WinViz is built upon CricViz’s proprietary model of T20 cricket. This model takes the career records of the players involved, historical data from the venue and country where the match is played, and the current match situation. The model then computes the probability of each result.

So, although you can launch WinViz on your phone to generate numbers live as a match progresses, the details of how those numbers are calculated are sketchy. Let’s make some guesses though…

A complicating feature of cricket is that there are different factors that contribute to the strength of a team’s position during a game, including:

1. The number of runs the other team has already scored, where appropriate;
2. The number of runs the batting team has scored so far in the innings;
3. The number of wickets for the batting team that have already been lost;
4. The number of balls remaining in the innings.

And all of this is before taking account of the actual strength of the two teams.

But we’ve discussed this issue in an earlier post – here – and it also got a mention here. In summary, a team’s remaining strength in an innings can be considered to be a function of the resources still available to them, as measured by balls and wickets. And in a landmark study, Duckworth and Lewis developed a formula which maps available resources into expected runs. Their objective was to develop a method that would provide a fair target for teams when matches were reduced by bad weather, leading to different numbers of balls received by each team. But the Duckworth-Lewis formula works equally well as a baseline method for in-running match predictions in matches without weather restrictions. And it’s likely that when the authors of CricViz say their model takes into account ‘current match situation’, this is precisely what they mean and how they do it.

The rest is more vague though. The career history of the players involved is taken into account, as is the history of previous matches in the same stadium and country. This suggests some kind of regression modelling that takes account of these aspects, but it’s not clear whether this applies to the Duckworth-Lewis adjustment itself or to the baseline deadball numbers to which the Duckworth-Lewis adjustment is applied.

For example: the deadball estimate for the number of runs scored in a complete innings by a particular team might be 300. After they have scored 100 runs it might be that Duckworth-Lewis calculations lead to the assessment that they have used 25% of their resources for that innings. In which case, they would be predicted to score a further 75% of 300 on top of the 100 they have already scored, for a total of 325. And the WinViz model might imply adjustments to the 300 or to the 75% or both, depending on the team composition and the history of matches in that particular stadium and country.
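The arithmetic in that example can be written as a tiny R function. This is only a sketch of my guess above, not CricViz’s actual method; the name `project_total` and its arguments are made up for illustration.

```r
# Hypothetical helper: projected innings total from a pre-match ("deadball")
# estimate and the fraction of resources already used up.
project_total <- function(runs_so_far, deadball_total, resources_used) {
  runs_so_far + (1 - resources_used) * deadball_total
}

# The example from the text: 100 runs scored, deadball estimate of 300,
# 25% of resources used.
project_total(runs_so_far = 100, deadball_total = 300, resources_used = 0.25)
# 325
```

Adjusting for team composition or venue would then amount to changing `deadball_total`, `resources_used`, or both.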

But how well does WinViz perform? It’s actually very difficult to tell, since – perhaps to avoid scrutiny – the CricViz app includes a history section of recent matches only. For example, when writing this blog post soon after the World Cup, all World Cup matches were available, but they’ve now been deleted. So, it’s not possible to do any kind of serious diagnostic analysis of model performance, though a ‘sanity-check’ can be done on any of the games currently available in the history.

For example, here’s the story of England’s World Cup final victory against New Zealand as told by WinViz at different stages of the match. Each of the figures is a screenshot of the CricViz iPhone app at the relevant stage in the match. The main features of each figure are predicted match outcome probabilities given current score and a graph showing the way the batting team’s score has increased throughout the innings so far, and how it’s predicted to increase over the rest of the innings.

1. New Zealand are to bat first and are predicted to score 305. With 50% probability their score is expected to fall in the interval (261, 349). England are expected to beat New Zealand’s score with probability 68%. The tie has just a 2% prediction probability.

2. After 15 overs, New Zealand have made a steady start in that they’ve only lost a single wicket. However, their scoring rate is quite low, so the England win probability has gone up slightly to 73%.

3. After 25 overs, New Zealand have kept the run rate ticking over, and have lost just one further wicket. England’s win probability remains pretty much unchanged.

4. After 30 overs New Zealand’s run rate has slowed a little and they’ve lost a further wicket. England’s win probability increases further to 81%.

5. On 35 overs New Zealand are still scoring at a slowish rate and have lost a fourth wicket. England now have an 86% win probability.

6. At the end of their innings, New Zealand had amassed 241 runs. This is way short of England’s expected run total, so England maintain an 86% win probability. (The following screenshot was taken during the final over, when New Zealand had scored 240 runs.)

7. England make a slow start in the first over, scoring just a single run. Their win probability drops just very slightly.

8. After 26 overs, England have had a mini-collapse, having scored just 94 runs – a lower figure than New Zealand made at the same point in their innings – while having lost 4 wickets. Their win probability drops dramatically to 48%.

9. A mini-recovery. On 41 overs England have increased their score to 168 – similar to New Zealand’s at the same point in their innings – without further loss of wickets. England’s win probability jumps back up to 66%.

10. After 49 overs, England are in trouble. With just one over left, England are 15 runs behind with 2 wickets remaining. The model gives England just an 18% probability of winning outright, though the tie also has a fairly high probability of 11%.

11. The rest is history.

It’s obviously impossible to validate the precision of WinViz from a single game, but notwithstanding at least one bug in the graphics – England’s ‘to win’ target is incorrect throughout – the basic sanity check seems to be satisfactory for this match at least.

CricViz was developed by Nathan Leamon, who acts as a data analyst for the England team. An interesting article on his background and perspective on the use of data for supporting team development is available here. David Hastie, who used to work for our quant team, also played some part in the rollout of CricViz, and kindly provided me with additional background information to help with the writing of this post.

# 66,666 Random Numbers, Volume 1

A while ago I posted about gambling at roulette, and explained that whatever strategy you adopt – excluding the possibility of using electronic equipment to monitor the wheel and ball speeds, and improve prediction of where the ball will land – no strategy can overcome the edge that casinos offer by giving unfavourable odds on winning outcomes. Now, believe it or not, I do a fair bit of research to keep this blog ticking over. And in the course of doing the research for a potential casino/roulette post, I came across this book:

That’s right: 66,666 random numbers. But not just any numbers: the numbers on a roulette wheel, 0-36. The numbers are also colour coded as per a standard roulette wheel. Here’s a typical page:

But there’s more:

1. The book includes a bonus set of an extra 10,000 random numbers. (Question: why not just call the book 76,666 random numbers?)
2. There’s also an American Edition, which is almost identical, but accounts for the fact that in American casinos, the wheel also includes a 00.
3. This is just Volume 1. Further volumes don’t seem to have gone into production yet, but the title suggests it’s just a matter of time.

Now, tables of random numbers have their place in history. As explained in an earlier post, simulation is a widely-used technique in statistical analysis, when exact mathematical calculations for statistical problems are too complicated. And before computers were widely available, it was commonplace to use tables of random digits as the basic ingredient for simulation routines.

But, hello! This is 2019. Chances are there’s a reasonable random number generator in the calculator on your phone. Or you can go here and fiddle around with the settings in the dialog box. Or you can fire-off 66,666 random numbers with a one-line code in R or any other statistical language. You can even do it here:

    # simulate the results
    numbers <- sample(0:36, 66666, replace = TRUE)

    # tabulate the results
    table(numbers)

    # show results as a barplot
    library(ggplot2)
    df <- data.frame(table(numbers))
    colnames(df) <- c('number', 'frequency')
    ggplot(data = df, aes(x = number, y = frequency)) +
      geom_bar(stat = "identity", width = 0.5, fill = 'lightblue') +
      ggtitle('Frequencies of Results in 66,666 Roulette Spins')

Just hit the ‘run’ button. This may not work with all browsers, but seems to work ok with Chrome.

The simulation is all done in the first non-comment line. The rest is just some baggage to tabulate the frequencies and show them graphically.

This approach has the advantages that:

1. You get different numbers every time you repeat the exercise, just like in real life;
2. The numbers are stored electronically, so you can analyse them easily using any statistical functions. If you ran the R session above, you’ll have seen the results in tabulated summary form, as well as in a barplot, for example. But since the data are stored in the object ‘numbers’, you can do anything you like with them. For example, typing ‘mean(numbers)’ gives you the mean of the complete set of simulated spins.

So, given that there are many easy ways you can generate random numbers, why would anybody possibly want to buy a book with 66,666 random numbers? Well, here’s the author to explain:

After gaining a moderate amount of experience playing roulette, I discovered how easy it was to master the rules of the game – and still lose!

He goes on…

Having lost my bankroll and now distrusting my knowledge of statistics as they pertained to roulette, I scoured the Internet for more information on the game. My findings only confirmed what I already knew: that statistics can only define the shape and character of data and events that have already taken place and have no real bearing over the outcome of future spins.

And finally…

I chose to compile a book of 66,666 random numbers for two reasons: One, I’ve paid my dues – literally, I’ve lost thousands of dollars playing this game, and I don’t want you to suffer the same consequence; two, as roulette is a game played against the house and not against my fellow gamblers, I knew I wanted to provide you with the same opportunity to study these numbers and learn something that might just make a difference in the way you play the game.

In summary, despite having lost a fortune believing there is some system to win at roulette, and despite sincerely wishing that you avoid the same fate, having learned through experience that no roulette system can possibly work, the author has provided you with 66,666 spins (plus a bonus 10,000 extra spins) of a roulette wheel so that you can study the numbers and devise your own system. (Which is bound to fail and almost certainly cost you a fortune if you try to implement it.)

Now, just to emphasise:

1. The random properties of a roulette wheel are very simply understood from basic probability;
2. A study of the outcome of randomly generated spins of a roulette wheel is a poor substitute for these mathematical properties;
3. Biases in the manufacture or installation of a roulette wheel, which could make some numbers, or sequences of numbers, more frequent than others, are likely to be vanishingly small. But if there were such biases, you’d need to study a very long series of the outcomes of that particular wheel to be able to exploit them;
4. You might choose to play roulette for fun. And you might even get lucky and win. But it is a game with negative expected winnings for the gambler, and if you play long enough you will lose with 100% certainty.

However, we’ve seen a similar mis-use of simulation before. In this post a newspaper did 100 random simulations of the NBA lottery draft in order to predict the lottery outcome. The only difference with the roulette simulation is that 66,666 is a rather bigger number – and therefore a greater waste of time – than 100.

Moral: simulation can often be avoided through a proper understanding of the randomness in whatever process you are studying. But if you really have to simulate, learn the basics of a language like R; don’t waste time and money on books of random tables.

# Einstein versus the internet


Einstein is reported to have said:

No one can win at roulette unless he steals money from the table while the croupier isn’t looking

In contrast, the internet is full of playing systems that either guarantee you will win, or at least maximise your chances of doing so.

So who is right? Einstein or the internet?

Roulette – like most casino games – is a game with negative expected gain, whatever you choose to bet on. In European casinos a roulette wheel contains the numbers 1 to 36, as well as the number 0. That’s a total of 37 numbers. There are different types of bets available – betting on single numbers, betting on the colours of the numbers, of which – excluding zero – half are black and half are red, or betting on whether the result is odd or even etc. We’ll focus on bets on a single number, but the same argument applies to any type of bet.

The casino states odds for a single number bet of 35/1, which means that if you bet (say) $1 and your number comes up, you win $35 plus the return of your $1 stake. Otherwise you lose your stake. But since there are 37 numbers on the wheel, each of which is equally likely, your chance of winning is 1/37. So, the amount you expect to win with any such bet is:

$$\left(\frac{1}{37} \times 35\right) + \left(\frac{36}{37} \times (-1)\right) = -\frac{1}{37} \approx -0.027$$

In other words, for every $100 you gamble at roulette in this way, you will lose an average of $2.70.

Had the stated odds been 36/1 you can easily check that the expected winnings would be zero, in which case the game is said to be fair. But casinos aren’t there to be fair, they’re there to make money. And by reducing the payout odds from 36/1 to 35/1 they are guaranteed to do so.

Incidentally, in the US and some other countries, the standard roulette wheel includes a ‘00’ as well as a ‘0’, while the payout odds are kept the same. This means that your chances of winning are reduced to 1/38 and the expected loss per spin is almost doubled.

Now, you might get lucky, and win anyway. But the fact that you will lose in the long run, at a rate of $2.70 per $100 bet on a European table, is a mathematical certainty.

Except… this assumes that successive spins of a roulette wheel are entirely unpredictable, even given the history of previous spins. And there are two reasons why, in theory at least, this assumption could be incorrect:

1. There is a bias in the wheel – perhaps due to misalignment or wheel manufacture – which means that some outcomes are more likely than others;
2. There is a serial dependence in the numbers, perhaps due to the pattern of ball spinning by the croupier, which means that given the sequence of previous spins, it’s possible to predict the outcome of the next spin with a better probability than that provided by completely random spins.

In practice, the precision of equipment used in casinos eliminates the first of these possibilities, while the chaotic physics of ball and wheel spin as well as the complexity of the roulette design – which includes deflectors that interrupt the ball’s trajectory as it slows – eliminates the second.

Nonetheless, there’s something of an industry on the internet – here and here, for example – of people trying to sell methods for predicting roulette outcomes based on the sequence of previous spins. Many of these even claim to be based on statistical analyses. In some cases the results they publish are impressive. But of course, it’s not clear how trustworthy these results are, nor indeed how they should be offset against the unknown results which are unpublished. Moreover, it seems reasonable to assume that if the developers of these methods had a foolproof system for winning at roulette, they might not need to be peddling the methods for a few dollars on the internet.

One method which has been shown to work, if conducted carefully, involves the use of cameras to monitor the ball and wheel speed, and a computer to calculate the updated probabilities of the outcome based on the visual information provided by the cameras. I’ll discuss this approach in a future post.

A different approach to serious gambling on roulette is the use of structured betting systems. The most famous one, the so-called martingale system, is to bet $1 on either red or black numbers (or odd or even numbers), which have odds of 1/1. So, if you win, you win $1 plus return of stake. Again, this type of bet has negative expected gain because the probability of winning is just 18/37, since 0 is neither red nor black. In fact, you can easily check that the expected loss for a $1 bet is $0.027, just like for a single number bet.

But the idea of the martingale system is this: if you win, you stop and keep your $1 profit. Otherwise, play again but raising your stake to $2. If you then win, you keep the $2 winnings, offset against the $1 lost in the previous round. So, you again win $1 overall. If you lose, you play again, raising your stake to $4. Again, if you win, you keep the $4, offset against $3 lost in the previous rounds, for an overall $1 win. And you keep playing this way, doubling your stake every time you lose, until eventually you win and stop, with guaranteed overall winnings of $1.

Of course, you don’t have to start with $1. Start with $1000, doubling each time you lose, and your method would guarantee that you win $1000.
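For anyone who wants to check the expected-loss figures quoted above for single-number and red/black bets, here is a small R sketch. The function name `expected_profit` and its arguments are just labels I’ve chosen for the illustration.

```r
# Expected profit per $1 staked: win `payout` dollars with probability
# n_winning / n_pockets, otherwise lose the $1 stake.
expected_profit <- function(n_winning, n_pockets, payout) {
  p <- n_winning / n_pockets
  p * payout - (1 - p) * 1
}

expected_profit(1, 37, 35)    # single number, European wheel
expected_profit(18, 37, 1)    # red/black, European wheel
expected_profit(1, 38, 35)    # single number, wheel with an extra 00
expected_profit(1, 37, 36)    # hypothetical "fair" payout of 36/1
```

The first two lines give the same value, -1/37, which is the $2.70 loss per $100 staked; the third gives -2/38, and the last gives zero.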

At least, that’s the theory. Maybe you can see the flaw in the approach. If not, drop me a line and I’ll explain.

But this raises a question: even though roulette has a negative expected gain on each spin, is there any betting strategy which could lead to an expected gain? For example, the gambler can choose when to play and when to stop, whereas the casino is obliged to accept all legally placed bets. So, one might ask whether, for example, a strategy where you decide to stop if you hit winnings above a certain level, and stop if you lose below a lower level, could possibly generate positive average wins.

Unfortunately, the answer to this also turns out to be no. There is a mathematical result called the Optional Stopping Theorem which basically says, if you have no method for predicting future results, there is no strategy which gives the gambler an expected gain. In other words, no matter how you play roulette, you will lose in the long run.
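As an illustration, rather than a proof, here is a rough R sketch of the martingale system when the number of doublings is capped. The cap of 10 rounds is my own assumption, standing in for a finite bankroll or a table limit, and the function names are invented for the example. The sketch computes the exact expected profit of one session and also simulates a batch of sessions.

```r
p_win <- 18 / 37   # probability that red (or black) comes up, European wheel

# Exact expected profit of one martingale session starting from a $1 stake,
# assuming play stops after `max_rounds` consecutive losses.
martingale_expected_profit <- function(max_rounds, p = p_win) {
  p_bust <- (1 - p)^max_rounds        # every round is lost
  loss_if_bust <- 2^max_rounds - 1    # 1 + 2 + 4 + ... staked and lost
  (1 - p_bust) * 1 - p_bust * loss_if_bust
}

# Simulate one session: double the stake after each loss, stop at the first
# win or when the cap is reached.
simulate_session <- function(max_rounds, p = p_win) {
  stake <- 1
  lost <- 0
  for (i in seq_len(max_rounds)) {
    if (runif(1) < p) return(stake - lost)   # a win always nets exactly $1
    lost <- lost + stake
    stake <- 2 * stake
  }
  -lost
}

set.seed(1)
sessions <- replicate(20000, simulate_session(10))
table(sessions)                    # lots of +1 results, a few very large losses
mean(sessions)                     # noisy estimate of the exact value below
martingale_expected_profit(10)     # exact expected profit per session
```

Under these assumptions the exact expected profit simplifies to 1 - (38/37)^max_rounds, which is negative for any cap: the rare big loss more than wipes out the frequent $1 wins.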

Damn Einstein!

# Love Island

A while back Harry.Hill@smartodds.co.uk gave a talk to the (then) quant team about trading strategies. The general issue is well-known: traders have to decide when to place a bet. Generally speaking they can place a bet early, when the price – the amount you get if you win the bet – is likely to be reasonably attractive. But in that case the liquidity of the market – the amount of money you can bet against – is likely to be low. Or they can wait until there is greater liquidity, but then the price is likely to be less attractive. So, given the option of a certain bet size at a stated price, should they bet now or wait in the hope of being able to make a bigger bet, albeit at a probably poorer price?

In general this is a difficult problem to tackle, and to make any sort of progress some assumptions have to be made about the way both prices and liquidity are likely to change as kick-off approaches. And Harry was presenting some tentative ideas, and pointing out some relevant research, that might enable us to get a handle on some of these issues.

Anyway, one of the pieces of work Harry referred to is a paper by F. Thomas Bruss, which includes the following type of example. You play a game where you can throw a dice (say) 10 times. Your objective is to throw a 6, at which point you can nominate that as your score, or continue. But, here’s the catch: you only win if you throw a 6 and it’s the final 6 in the sequence of 10 throws.

So, suppose you throw a 6 on the 3rd roll; should you stop? How about the 7th roll? Or the 9th? You can maybe see the connection with the trading issue: both problems require us to choose whether to stop or continue, based on an evaluation of the risk of what will subsequently occur.
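For the dice game, the stop-or-continue comparison can be worked through by backward induction. The R sketch below is my own illustration rather than anything taken from Bruss’s paper, and the variable names are made up. It uses the fact that a 6 on roll k is the final 6 with probability (5/6)^(10-k), and compares that with the chance of winning by carrying on and playing as well as possible.

```r
n_throws <- 10
p_six <- 1 / 6

# Probability that a 6 thrown on roll k turns out to be the last 6.
p_last_six <- function(k) (1 - p_six)^(n_throws - k)

# value[k]: chance of winning under best play, just before roll k is thrown.
# value[n_throws + 1] stays at 0: no rolls left and nothing nominated.
value <- numeric(n_throws + 1)
stop_on_six <- logical(n_throws)

for (k in n_throws:1) {
  stop_now <- p_last_six(k)     # nominate this 6
  carry_on <- value[k + 1]      # ignore it and keep rolling
  # Exact ties are counted as "stop"; the tolerance guards against rounding.
  stop_on_six[k] <- stop_now >= carry_on - 1e-12
  value[k] <- p_six * max(stop_now, carry_on) + (1 - p_six) * carry_on
}

data.frame(roll = 1:n_throws,
           p_last_six = round(p_last_six(1:n_throws), 3),
           stop_on_six = stop_on_six)
value[1]   # overall chance of winning with best play
```

Changing `n_throws` or `p_six` gives the corresponding rule for other versions of the game.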

Fast-forward a few days after Harry’s talk and I was reading Alex Bellos’s column in the Guardian. Alex is a journalist who writes about both football and mathematics (and sometimes both at the same time). His bi-weekly contributions to the Guardian take the form of mathematically-based puzzles. These puzzles are quite varied, covering everything from logic to geometry to arithmetic and so on. And sometimes even Statistics. Anyway, the puzzle I was reading after Harry’s talk is here. If you have time, take a read. Otherwise, here’s a brief summary.

It’s a basic version of Love Island. You have to choose from 3 potential love partners, but you only see them individually and sequentially. You are shown the first potential partner, and can decide to keep them or not. If you keep them, everything stops there. Otherwise you are shown the second potential partner. Again, you have to stick or twist: you can keep them, or you reject and are shown the third possibility. And in that case you are obliged to stick with that option.

In summary: once you stick with someone, that’s the end of the game. But if you reject someone, you can’t go back to them later. The question is: what strategy should you adopt in order to maximise the chances of choosing the person that you would have picked if you had seen all 3 at the same time?

As well as giving a clearer description of the problem, Alex’s article also contains a link to his discussion of the solution. But what’s interesting is that it’s another example of an optimal stopping problem: once we’ve seen a new potential partner, and also previous potential partners, we have to make a decision on whether to stop with what we currently have, or risk trying to get an improvement in the future, knowing that we could also end up with something/someone worse. And if we can solve the problem for love partners, we are one step towards solving the problem for traders as well.

The Love Island problem discussed by Alex is actually a special case of The Secretary Problem.  A company needs to hire a secretary and does so by individual interviews. Once they’ve conducted an interview they have to hire or reject that candidate, without the possibility of returning to him/her once rejected. What strategy should they adopt in order to try to get the best candidate? In the Love Island version, there are just 3 candidates; in the more general problem, there can be any number. With 3 choices, and a little bit of patience, you can probably find the solution yourself (or follow the links towards Alex’s discussion of the solution). But how about if you had 1000 possible love partners? (Disclaimer: you don’t).

Actually, there is a remarkably simple solution to this problem whatever the number of options to choose from: whether it’s 3, 1000, 10,000,000 or whatever. Let this number of candidates be N. Then reject the first M candidates, for some value of M, but keep note of the best candidate, C say, from those M options. Then accept the first subsequent candidate who is better than C (or the last candidate if none happens to be better).

But how to choose M? Well, even more remarkably, it turns out that if N is reasonably large, the best choice for M is around N/e, where $e \approx 2.718$ is a number that crops up a lot in mathematics. For N=1000 candidates, this means rejecting the first 368 and then choosing the first that is better than the best of those. And one more remarkable thing about this result: the probability that the candidate selected this way is actually the best out of all the available candidates is close to 1/e, or approximately 37%, whatever the value of N, provided N is reasonably large.

With N=3, the value of N is too small for this approximate calculation to be accurate, but if you calculated the solution to the problem – or looked at Alex’s – you’ll see that the solution is precisely of this form, with M=1 (reject the first candidate, then take the next one who is better) and a probability of 50% of picking the best candidate overall.
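If you’d rather check these numbers than take them on trust, here is a short R sketch. It relies on the standard result for this problem: if the first M candidates are rejected outright (with M at least 1), the chance of ending up with the very best of N is (M/N) × (1/M + 1/(M+1) + … + 1/(N-1)). The function names are invented for the illustration.

```r
# Probability of selecting the best of N candidates when the first M are
# rejected and the next candidate better than all of those is accepted.
p_best <- function(M, N) {
  if (M == 0) return(1 / N)     # no look-only phase: the first candidate is taken
  (M / N) * sum(1 / (M:(N - 1)))
}

# Search over every possible M for the one with the highest probability.
best_cutoff <- function(N) {
  probs <- sapply(0:(N - 1), p_best, N = N)
  M <- which.max(probs) - 1
  c(M = M, probability = probs[M + 1])
}

best_cutoff(3)       # the Love Island version
best_cutoff(1000)    # 1000 candidates
1000 / exp(1)        # N/e, for comparison with the best M
1 / exp(1)           # 1/e, for comparison with the probability
```

Here M counts the candidates rejected outright, so `best_cutoff(N)` can be compared directly with N/e for any N you care to try.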

Anyway, what I really like about all this is the way things that are apparently unconnected – Love Island, choosing secretaries, trading strategies – are fundamentally linked once you formulate things in statistical terms. And even if the solution in one of the areas is too simple to be immediately transferable to another, it might at least provide useful direction.

# Ernie is dead, long live Ernie

Oh no, this weekend they killed Ernie

Well, actually, not that one. This one…

No, no, no. That one died some time ago. This one…

But don’t worry, here’s Ernie (mark 5)…

Let me explain…

Ernie (Electronic Random Number Indicator Equipment) is the acronym of the random number generator that is used by the government’s National Savings and Investments (NSI) department for selecting Premium Bond winners each month.

Premium bonds are a form of savings certificates. But instead of receiving a fixed or variable interest rate paid at regular intervals, like most savings accounts, premium bonds are a gamble. Each month a number of bonds from all those in circulation are selected at random and awarded prizes, with values ranging from £25 to £1,000,000. Overall, the annual interest rate is currently around 1.4%, but with this method most bond holders will receive 0%, while a few will win many times more than the actual bond value of £1, up to one million pounds.

So, your initial outlay is safe when you buy a premium bond – you can always cash them in at the price you paid for them – but you are gambling with the interest.

Now, the interesting thing from a statistical point of view is the monthly selection of the winning bonds. Each month there are nearly 3 million winning bonds, most of which win the minimum prize of £25, but 2 of which win the maximum of a million pounds. All these winning bonds have to be selected at random. But how?

As you probably know, the National Lottery is based on a single set of numbers that are randomly generated through the physical mechanism of the mixing and selection of numbered balls. But this method of random number generation is completely impractical for the random selection of several million winning bonds each month. So, a method of statistical simulation is required.

In a previous post we already discussed the idea of simulation in a statistical context. In fact, it turns out to be fairly straightforward to generate mathematically a series of numbers that, to all intents and purposes, look random. I’ll discuss this technique in a future post, but the basic idea is that there are certain formulae which, when used recursively, generate a sequence of numbers that are essentially indistinguishable from a series of random numbers.

But here’s the thing: the numbers are not really random at all. If you know the formula and the current value in the sequence, you can calculate exactly the next value in the sequence. And the next one. And so on.
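To make that concrete, one well-known family of such formulae is the linear congruential generator. The R sketch below is only an illustration of the idea: the function name `lcg` is made up, and the multiplier 16807 and modulus 2^31 - 1 are a classic textbook choice rather than anything I’m claiming a particular piece of software uses.

```r
# Linear congruential generator: each value is computed from the previous
# one by the recursion  state <- (a * state) mod m.
lcg <- function(n, seed, a = 16807, m = 2^31 - 1) {
  x <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (a * state) %% m
    x[i] <- state
  }
  x / m   # rescale so the values lie between 0 and 1
}

u <- lcg(5, seed = 2019)
u                                   # looks random enough...
identical(lcg(5, seed = 2019), u)   # ...but the same seed gives the same numbers
```

Knowing `a`, `m` and the current state is all anyone needs to reproduce every later value.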

Strictly, a sequence of numbers generated this way is called ‘pseudo-random’, which is a fancy way of saying ‘pretend-random’. They look random, but they’re not. For most statistical purposes, the difference between a sequence that looks random and is genuinely random is unimportant, so this method is used as the basis for simulation procedures. But for the random selection of Premium Bond winners, there are obvious logistic and moral problems in using a sequence of numbers that is actually predictable, even if it looks entirely random.

For this reason, Ernie was invented. Ernie is a random number generator. But to ensure the numbers are genuinely random, it incorporates a genuine physical process whose behaviour is entirely random. A mathematical representation of the state of this physical process then leads to the random numbers.

The very first Ernie is shown in the second picture above. It was first used in 1957, was the size of a van and used a gas neon diode to induce the randomness. Though effective, this version of Ernie was fairly slow, generating just 2000 numbers per hour. It was subsequently killed-off and replaced with ever-more efficient designs over the years.

The third picture above shows Ernie (mark 4), which has been in operation from 2004 up until this weekend. In place of gas diodes, it used thermal noise in transistors to generate the required randomness, which in turn generated the numbers. Clearly, in terms of size, this version was a big improvement on Ernie (mark 1), being about the size of a normal PC. It was also much more efficient, being able to generate one million numbers in an hour.

But Ernie (mark 4) is no more. The final picture above shows Ernie (mark 5), which came into operation this weekend, shown against the tip of a pencil. It’s essentially a microchip. And of course, the evolution of computing equipment from the size of a van to the size of a pencil head over the last 60 years or so is a familiar story. Indeed Ernie (mark 5) is considerably faster – by a factor of 42.5 or so – even compared to Ernie (mark 4), despite the size reduction. But what really makes the new version of Ernie stand out is that the physical process that induces the randomness has fundamentally changed. One way or another, all the previous versions used thermal noise to generate the randomness; Ernie (mark 5) uses quantum random variation in light signals.

More information on the evolution of Ernie can be found here. A slightly more technical account of the way thermal noise was used to generate randomness in each of the Ernies up to mark 4 is given here. The basis of the quantum technology for Ernie mark 5 is that when a photon is emitted towards a semi-transparent surface, it is either reflected or transmitted at random. Converting these outcomes into 0/1 bit values forms the building block of random number generation.

Incidentally, although the randomness in the physical processes built into Ernie should guarantee that the numbers generated are random, checks on the output are carried out by the Government Actuary’s Department to ensure that the output can genuinely be regarded as random. In fact they apply four tests to the sequence:

1. Frequency: do all digits occur (approximately) equally often?
2. Serial: do all consecutive number pairs occur (approximately) equally often?
3. Poker: do poker combinations (4 identical digits; 3 identical digits; two pairs; one pair; all different) occur as often as they should in consecutive numbers?
4. Correlation: do pairs of digits at different spacings in bond numbers have approximately the correct correlation that would be expected under randomness?
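To make the first of these tests concrete, here is a minimal Python sketch of a frequency check. It is only an illustration of the idea, not the Government Actuary’s Department’s actual procedure: the function name `digit_frequency_chi2`, the pseudo-random stand-in for Ernie’s output and the choice of a 5% chi-square threshold are all my own assumptions.

```python
from collections import Counter
import random


def digit_frequency_chi2(digits):
    """Chi-square statistic comparing digit counts with a uniform expectation."""
    counts = Counter(digits)
    expected = len(digits) / 10  # each digit 0-9 should appear equally often
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Assumption: a seeded pseudo-random sequence stands in for real Ernie output.
rng = random.Random(2019)
sample = [rng.randrange(10) for _ in range(10_000)]

# Upper 5% point of the chi-square distribution with 9 degrees of freedom.
CRITICAL_5PCT_DF9 = 16.919

stat = digit_frequency_chi2(sample)
print(f"chi-square = {stat:.2f}; below 5% threshold: {stat < CRITICAL_5PCT_DF9}")
```

A statistic well above the threshold would suggest that some digits occur too often or too rarely. The serial and poker tests follow the same pattern, just counting pairs or digit patterns instead of single digits.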

In the 60 or so years that Premium Bonds have been in circulation, the monthly numbers generated by each of the successive Ernies have never failed to pass these tests.

However:

Finally, in case you’re disappointed that I started this post with a gratuitous reference to Sesame Street which I didn’t follow-up on, here’s a link to 10 facts and statistics about Sesame Street.

It’s sometimes said that a little knowledge is a dangerous thing. Arguably, too much knowledge is equally bad. Indeed, Einstein is quoted as saying:

A little knowledge is a dangerous thing. So is a lot.

I don’t suppose Einstein had gambling in mind, but still…

March Madness pools are a popular form of betting in the United States. They are based on the playoff tournament for NCAA college basketball, held annually every March, and comprise a so-called bracket bet. Prior to the tournament start, a player predicts the winners of each game from the round-of-sixteen right through to the final. This is possible since teams are seeded, as in tennis, so match pairings for future rounds are determined automatically once the winners from previous rounds are known. In practice, it’s equivalent to picking winners from the round-of-sixteen onwards in the World Cup.

There are different scoring systems for judging success in bracket picks, often with more weight given to correct outcomes in the later rounds, but in essence the more correct outcomes a gambler predicts, the better their score. And the player with the best score within a pool of players wins the prize.

Naturally, you’d expect players with some knowledge of the differing strength of the teams involved in the March Madness playoffs to do better than those with no knowledge at all. But is it the case that the more knowledge a player has, the more successful they’re likely to be? In other words:

To what extent is success in the March Madness pools determined by a player’s basketball knowledge?

This question was explored in a recent academic study discussed here. In summary, participants were given a 25-question basketball quiz, the results of which were used to determine their level of basketball knowledge. Next, they were asked to make their bracket picks for March Madness. A comparison was then made between accuracy of bracket picks and level of basketball knowledge.

The results are summarised in the following graph, which shows the average relationship between pick accuracy and basketball knowledge:

As you’d expect, the players with low knowledge do relatively badly.  Then, as a player’s basketball knowledge increases, so does their pick accuracy. But only up to a point. After a certain point, as a player’s knowledge increases, their pick accuracy was found to decrease. Indeed, the players with the most basketball knowledge were found to perform slightly worse than those with the least knowledge!

Why should this be?

The most likely explanation is as follows…

Consider an average team, who have recently had a few great results. It’s possible that these great results are due to skill, but it’s also plausible that the team has just been a bit lucky. The player with expert knowledge is likely to know about these recent results, and make their picks accordingly. The player with medium knowledge will simply know that this is an average team, and also bet accordingly, while the player with very little knowledge is likely to treat the team randomly.

Random betting due to lack of knowledge is obviously not a great strategy. However, making picks that are driven primarily by recent results can be even worse, and the evidence suggests that’s exactly what most highly knowledgeable players do. And it turns out to be better to have just a medium knowledge of the game, so that you’d have a rough idea of the relative rankings of the different teams, without being overly influenced by recent results.

Now, obviously, someone with expert knowledge of the game, but who also knows how to exploit that knowledge for making predictions, is likely to do best of all. And that, of course, is the way sports betting companies operate, combining expert sports knowledge with statistical support to exploit and implement that knowledge. But the study here shows that, in the absence of that explicit statistical support, the player with a medium level of knowledge is likely to do better than players with too little or too much knowledge.

In some ways this post complements the earlier post ‘The benefit of foresight’. The theme of that post was that successful gambling cannot rely solely on Statistics, but also needs the input of expert sports knowledge. This one says that expert knowledge, in isolation, is also insufficient, and needs to be used in tandem with statistical expertise for a successful trading strategy.

In the specific context of betting on the NCAA March Madness bracket, the argument is developed fully in this book. The argument, though, is valid much more widely across all sports and betting regimes, and emphasises the importance to a sports betting company of both statistical and sport expertise.

Update (21/3): The NCAA tournament actually starts today. In case you’re interested, here’s Barack Obama’s bracket pick. Maybe see if you can do better than the ex-President of the United States…

# Who wants to win £194,375?

In an earlier post I included a link to Oscar predictions by film critic Mark Kermode over the years, which included a 100% success rate across all of the main categories in a couple of years. I also recounted his story of how he failed to make a fortune in 1992 by not knowing about accumulator bets.

Well, it’s almost Oscar season, and fabien.mauroy@smartodds.co.uk pointed me to this article, which includes Mark’s personal shortlist for the coming awards. Now, these aren’t the same as predictions: in some years, Mark has listed his own personal favourites as well as what he believes to be the likely winners, and there’s often very little in common. On the other hand, these lists have been produced prior to the nominations, so you’re likely to get better prices on bets now, rather than later. You’ll have to be quick though, as the nominations are announced in a couple of hours.

Anyway, maybe you’d like to sift through Mark’s recommendations, look for hints as to who he thinks the winner is likely to be, and make a bet accordingly. But if you do make a bet based on these lists, here are a few things to take into account:

1. Please remember the difference between an accumulator bet and single bets;
3. Please don’t blame me if you lose.

If Mark subsequently publishes actual predictions for the Oscars, I’ll include a link to those as well.

Update: the nominations have now been announced and are listed here. Comparing the nominations with Mark Kermode’s own list, the number of nominations which appear in Mark’s personal list for each category are as follows:

Best Picture: 1

Best Director: 2

Best Actor: 1

Best Actress: 2

Best Supporting Actor: 3

Best Supporting Actress: 1

Best Score: 2

In each case except Best Picture, there are 5 nominations and Mark’s list also comprised 5 contenders. For Best Picture, there are 8 nominations, though Mark only provided 5 suggestions.

So, not much overlap. But again, these weren’t intended to be Mark’s predictions. They were his own choices. I’ll aim to update with Mark’s actual predictions if he publishes them.

# Pulp Fiction (Our Esteemed Leader’s cut)

The previous post had a cinematic theme. That got me remembering an offsite a while back where Matthew.Benham@smartbapps.co.uk gave a talk that I think he called ‘Do the Right Thing’, which is the title of a 1989 Spike Lee film. Midway through his talk Matthew gave a premiere screening of his own version of a scene from Pulp Fiction. Unfortunately, I’ve been unable to get hold of a copy of Matthew’s cut, so we’ll just have to make do with the inferior original….

The theme of Matthew’s talk was the importance of always acting in relation to best knowledge, even if it contradicts previous actions taken when different information was available. So, given the knowledge and information you had at the start of a game, you might have bet on team A. But if the game evolves in such a way that a bet on team B becomes positive value, you should do that. Always do the right thing. And the point of the scene from Pulp Fiction? Don’t let pride get in the way of that principle.

These issues will make a great topic for this blog sometime. But this post is about something else…

Dependence is a big issue in Statistics, and we’re likely to return to it in different ways in future posts. Loosely speaking, two events are said to be independent if knowing the outcome of one doesn’t affect the probabilities of the outcomes of the other. For example, it’s usually reasonable to treat the outcomes of two different football matches taking place on the same day as independent. If we know one match finished 3-0, that information is unlikely to affect any judgements we might have about the possible outcomes of a later match. Events that are not independent are said to be dependent: in this case, knowing the outcome of one will affect the probabilities of the outcomes of the other. In tennis matches, for example, the outcome of one set tends to affect the chances of who will win a subsequent set, so set winners are dependent events.

With this in mind, let’s follow up the discussion in the previous 2 posts (here and here) about accumulator bets. By multiplying prices from separate bets together, bookmakers are assuming that the events are independent. But if there were dependence between the events, it’s possible that an accumulator offers a value bet, even if the individual bets are of negative value. This might be part of the reason why Mark Kermode has been successful in several accumulator bets over the years (or would have been if he’d taken his predictions to the bookmaker and actually placed an accumulator bet).

Let me illustrate this with some entirely made-up numbers. Let’s suppose ‘Pulp Fiction (Our Esteemed Leader’s cut)’ is up for a best movie award, and its upstart director, Matthew Benham, has also been nominated for best director. The numbers for single bets on PF and MB are given in the following table. We’ll suppose the bookmakers are accurate in their evaluation of the probabilities, and that they guarantee themselves an expected profit by offering prices that are below the fair prices (see the earlier post).

| | True Probability | Fair Price | Bookmaker Price |
|---|---|---|---|
| Best Movie: PF | 0.4 | 2.5 | 2 |
| Best Director: MB | 0.25 | 4 | 3.5 |

Because the available prices are lower than the fair prices and the probabilities are correct, both individual bets have negative value (-0.2 and -0.125 respectively for a unit stake). The overall price for a PF/MB accumulator bet is 7, which assuming independence is an even poorer value bet, since the expected winnings from a unit stake are

$0.4 \times 0.25 \times 7 -1 = -0.3$

However, suppose voters for the awards tend to have similar preferences across categories, so that if they like a particular movie, there’s an increased chance they’ll also like the director of that movie. In that case, although the table above might be correct, the probability of MB winning the director award if PF (MB cut) is the movie winner is likely to be greater than 0.25. For argument’s sake, let’s suppose it’s 0.5. Then, the expected winnings from a unit stake accumulator bet become

$0.4 \times 0.5 \times 7 -1 = 0.4$

That’s to say, although the individual bets are still both negative value, the accumulator bet is extremely good value. This situation arises because of the implicit assumption of independence in the calculation of accumulator prices. When the assumption is wrong, the true expected winnings will be different from those implied by the bookmaker prices, potentially generating a positive value bet.
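For anyone who wants to play with the made-up numbers, the two calculations above can be reproduced with a few lines of Python. This is just a sketch: the function name `dependent_accumulator_ev` is my own, and the conditional probability of 0.5 is the same for-argument’s-sake figure used in the text.

```python
def dependent_accumulator_ev(p_first, p_second_given_first, price_first, price_second):
    """Expected profit per unit stake on a two-leg accumulator.

    p_second_given_first is the probability that the second leg wins given
    that the first leg has won; under independence it is just the ordinary
    probability of the second leg.
    """
    p_both = p_first * p_second_given_first
    return p_both * price_first * price_second - 1


# Bookmaker prices from the table: PF at 2, MB at 3.5.
ev_independent = dependent_accumulator_ev(0.4, 0.25, 2, 3.5)
ev_dependent = dependent_accumulator_ev(0.4, 0.5, 2, 3.5)
print(round(ev_independent, 3), round(ev_dependent, 3))
```

Changing the conditional probability shows how strong the dependence has to be before the accumulator becomes positive value: with these prices, the break-even point is where 0.4 times the conditional probability times 7 equals 1.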

Obviously with most accumulator bets – like multiple football results – independence is more realistic, and this discussion is unhelpful. But for speciality bets like the Oscars, or perhaps some political bets where late swings in votes are likely to affect more than one region, there may be considerable value in accumulator bets if available.

If anyone has a copy of Our Esteemed Leader’s cut of the Pulp Fiction scene on a pen-drive somewhere, and would kindly pass it to me, I will happily update this post to include it.

# How to not win £194,375

In the previous post we looked at why bookmakers like punters to make accumulator bets: so long as a gambler is not smart enough to be able to make positive value bets, the bookmaker will make bigger expected profits from accumulator bets than from single bets. Moreover, even for smart bettors, if any of their individual bets are not smart, accumulator bets may also favour the bookmaker.

With all this in mind, here’s a true story…

Mark Kermode is a well-known film critic, who often appears on BBC TV and radio. In the early 90’s he had a regular slot on Danny Baker’s Radio 5 show, discussing recent movie releases etc. On one particular show early in 1992, chatting to Danny, he said he had a pretty good idea of how most of the important Oscars would be awarded that year. This was actually before the nominations had been made, so bookmaker prices on award winners would have been pretty good, and since Radio 5 was a predominantly sports radio station, Danny suggested Mark make a bet on the basis of his predictions.

Fast-forward a few months to the day after the Oscar awards and Danny asked Mark how his predictions had worked out. Mark explained that he’d bet on five of the major Oscar awards and they’d all won. Danny asked Mark how much he’d won and he replied that he’d won around £120 for a £25 stake. Considering the difficulty in predicting five correct winners, especially before nominations had been made, this didn’t seem like much of a return, and Danny Baker was incredulous. He’d naturally assumed that Mark would have placed an accumulator bet with the total stake of £25, whereas what Mark had actually done was place individual bets of £5 on each of the awards.

Now, I’ve no idea what the actual prices were, but since the bets were placed before the nominations were announced, it’s reasonable to assume that the prices were quite generous. For argument’s sake, let’s suppose the bets on each of the individual awards had a price of 6. Mark then placed a £5 bet on each, so he’d have made a profit of £25 per bet, for an overall profit of £125. Now suppose, instead, he’d made a single accumulator bet on all 5 awards. In this case he’d have made a profit of

$\pounds 25 \times 6 \times 6 \times 6 \times 6 \times 6 -\pounds 25 = \pounds 194,375$

Again, I’ve no idea if these numbers are accurate or not, but you get the picture. Had Mark made the accumulator bet that Danny intended, he’d have made a pretty big profit. As it was, he won enough for a night out with a couple of mates at the cinema, albeit with popcorn included.

Of course, the risk you take with an accumulator is that it just takes one bet to fail and you lose everything. By placing 5 single bets Mark would still have won £95 if one of his predictions had been wrong, and would even make a fiver if he got just one prediction correct. But by not accumulating his bets, he also avoided the possibility of winning £194,375 if all 5 bets came in. Which they did!
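The trade-off is easy to tabulate. Here is a short Python sketch, using the same supposed price of 6 on every award (which, again, is a made-up number), that compares the profit from five £5 singles with the profit from one £25 accumulator for each possible number of correct predictions. The function names are my own.

```python
def singles_profit(stake_per_bet, price, n_bets, n_correct):
    """Profit from n_bets equal single bets at a common price when n_correct win."""
    returns = n_correct * stake_per_bet * price
    return returns - n_bets * stake_per_bet


def accumulator_profit(total_stake, price, n_bets, n_correct):
    """Profit from one accumulator, which pays out only if every leg wins."""
    if n_correct < n_bets:
        return -total_stake
    return total_stake * price ** n_bets - total_stake


# Number of correct predictions, profit from singles, profit from accumulator.
for k in range(6):
    print(k, singles_profit(5, 6, 5, k), accumulator_profit(25, 6, 5, k))
```

With singles the profit climbs steadily with each correct pick; with the accumulator it is a loss of the full stake in every row except the last.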

So, what’s the story here? Though an accumulator is a poor value bet for mug gamblers, it may be an extremely valuable bet for sharp gamblers, and the evidence suggests (see below) that Mark Kermode is sharper than the bookmakers for Oscar predictions.

Is Mark Kermode really sharper than the bookmakers for Oscar predictions? Well, here’s a list of his predictions for the main 6 (not 5) categories for the years 2006-2017. Mark predicted all 6 categories with 100% accuracy twice in twelve years. I guess that these predictions weren’t always made before the nominations, so the prices are unlikely to be as good as in the example described above. But still, the price on a 6-fold accumulator will have been pretty good regardless. And he’d have won twice, in addition to the 1992 episode (and possibly more often in the intervening years for which I don’t have data). Remarkably, he would have won again in 2017 if the award for best movie had gone to La La Land, which was originally declared the winner, rather than Moonlight, which was the eventual winner.

Moral: try to find out Mark’s predictions for the 2019 Oscars and don’t make the mistake of betting singles!

And finally, here’s Mark telling the story of not winning something like £194,375 in his own words:

# Bookmakers love accumulators

You probably know about accumulator, or so-called ‘acca’, bets. Rather than betting individually on several different matches, in an accumulator any winnings from a first bet are used as the stake in a second bet.  If either bet loses, you lose, but if both bets win, there’s the potential to make more money than is available from single bets due to the accumulation of the prices. This process can be applied multiple times, with the winnings from several bets carried over as the stake to a subsequent bet, and the total winnings if all bets come in can be substantial. On the downside, it just takes one bet to lose and you win nothing.

Bookmakers love accumulators, and often apply special offers – as you can see in the profile picture above – to encourage gamblers to make such bets. Let’s see why that’s the case.

Consider a tennis match between two equally-matched players. Since the players are equally-matched, it’s reasonable to assume that each has a probability 0.5 of winning. So if a bookmaker was offering fair odds on the winner of this match, he should offer a price of 2 on either player, meaning that if I place a bet of 1 unit I will receive 2 units (including the return of my stake) if I win. This makes the bet fair, in the sense that my expected winnings – the amount I would win on average if the game were repeated  many times – is zero. This is because

$(1/2 \times 2) + (1/2 \times 0) -1 = 0$

That’s the sum of the probabilities multiplied by the prices, take away the stake.

The bet is fair in the sense that, if the match were repeated many times, both the gambler and the bookmaker would expect neither to win nor lose. But bookmakers aren’t in the business of being fair; they’re out to make money and will set lower prices to ensure that they have long-run winnings. So instead of offering a price of 2 on either player, they might offer a price of 1.9. In this case, assuming gamblers split their stakes evenly across two players, bookmakers will expect to win the following proportion of the total stake

$1-1/2\times(1/2 \times 1.9) - 1/2\times (1/2 \times 1.9)=0.05$

In other words, bookmakers have a locked-in 5% expected profit. Of course, they might not get 5%. Suppose most of the money is placed on player A, who happens to win. Then, the bookmaker is likely to lose money. But this is unlikely: if the players are evenly matched, the money placed by different gamblers will probably be evenly spread between the two players. And if it’s not, then the bookmakers can adjust their prices to try to encourage more bets on the less-favoured side.

Now, in an accumulator bet, the prices are multiplied. It’s equivalent to taking all of your winnings from a first bet and placing them on a second bet. Then those winnings are placed on the outcome of a third bet, and so on. So if there are two tennis matches, A versus B and C versus D, each of which is evenly-matched, the fair and actual prices on the accumulator outcomes are as follows:

| Accumulator Bet | A-C | A-D | B-C | B-D |
|---|---|---|---|---|
| Fair Price | 4 | 4 | 4 | 4 |
| Actual Price | 3.61 | 3.61 | 3.61 | 3.61 |

The value 3.61 comes from taking the prices of the individual bets, 1.9 in each case, and multiplying them together. It follows that the expected profit for the bookmaker is

$1-4\times 1/4\times(1/4 \times 3.61) = 0.0975$.

So, the bookmaker profit is now expected to be almost 10%. In other words, with a single accumulator, bookmakers almost double their expected profits. With further accumulators, the profits increase further and further. With 3 bets it’s over 14%; with 4 bets it’s around 18.5%. Because of this considerable increase in expected profits with accumulator bets, bookmakers can be ‘generous’ in their offers, as the headline graphic to this post suggests. In actual fact, the offers they are making are peanuts compared to the additional profits they make through gamblers making accumulator bets.
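The figures quoted for 3 and 4 bets come from the same calculation repeated with more legs. As a rough sketch in Python, assuming every leg is an evenly-matched contest priced at 1.9 against a fair price of 2 and that the legs are independent (the function name `bookmaker_margin` is my own):

```python
def bookmaker_margin(single_price, fair_price, n_legs):
    """Expected bookmaker profit per unit staked on an n-leg accumulator.

    Assumes independent legs, each with the same offered and fair price, so
    the gambler's expected return is (single_price / fair_price) ** n_legs.
    """
    return 1 - (single_price / fair_price) ** n_legs


# Number of legs and the bookmaker's expected margin.
for n in range(1, 5):
    print(n, round(bookmaker_margin(1.9, 2.0, n), 4))
```

Each extra leg multiplies the gambler’s expected return by another factor of 0.95, which is why the bookmaker’s margin keeps growing.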

However… all of this assumes that the bookmaker sets prices accurately. What happens if the gambler is more accurate in identifying the fair price for a bet than the bookmaker? Suppose, for example, a gambler reckons correctly that the probabilities for players A and C to win are 0.55 rather than 0.5. A single stake bet spread across the 2 matches would then generate an expected profit of

$0.55\times(1/2 \times 1.9) + 0.55\times (1/2 \times 1.9) -1 = 0.045$

On the other hand, the expected profit from an accumulator bet on A-C is

$(0.55\times1.9) \times (0.55\times1.9) -1 = 0.092$

In other words, just as the bookmaker increases his expected profit through accumulator bets when he has an advantage per single bet, so does the gambler. So, bookmakers do indeed love accumulators, but not against smart gamblers.
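Here is a small Python sketch of the gambler’s side of the calculation, written so that the win probabilities can be varied. The function names `singles_ev` and `accumulator_ev` are my own, and the sketch assumes independent matches, a common price on every leg and a unit stake split evenly across the single bets.

```python
from math import prod


def singles_ev(win_probs, price):
    """Expected profit when a unit stake is split evenly across single bets."""
    share = 1 / len(win_probs)
    return sum(p * share * price for p in win_probs) - 1


def accumulator_ev(win_probs, price):
    """Expected profit for a unit stake on an accumulator of independent legs."""
    return prod(p * price for p in win_probs) - 1


print(round(singles_ev([0.55, 0.55], 1.9), 3))
print(round(accumulator_ev([0.55, 0.55], 1.9), 3))
```

Trying a list of probabilities that are not all equal is a quick way to see how sensitive the accumulator is to the gambler’s edge on each individual leg.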

In the next post we’ll find out how not knowing the difference between accumulator and standard bets cost one famous gambler a small fortune.

Actually, the situation is not quite as favourable for smart gamblers as the above calculation suggests. Suppose that the true probabilities for a win for A and C are 0.7 and 0.4, which still averages at 0.55. This situation would arise, for example, if the gambler was using a model which performed better than he realised for some matches, but worse than he realised for others.

The expected winnings from single bets remain at 0.045. But now, the expected winnings from an accumulator bet are just:

$(0.7\times1.9) \times (0.4\times1.9) -1 = 0.011,$

which is considerably lower. Moreover, with different numbers, the expected winnings from the accumulator bet could be negative, even though the expected winnings from separate bets are positive. (This would happen, for example, if the win probabilities for A and C were 0.8 and 0.3 respectively.)
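To spell out that last case using the same prices of 1.9: with win probabilities of 0.8 and 0.3, the expected winnings from a unit stake split across the two single bets are still

$0.8\times(1/2 \times 1.9) + 0.3\times (1/2 \times 1.9) -1 = 0.045,$

whereas the expected winnings from the accumulator bet are

$(0.8\times1.9) \times (0.3\times1.9) -1 = -0.134.$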

So unless the smart gambler is genuinely smart on every bet, an accumulator bet may no longer be in his favour.