# Can’t buy me love

Ok, money can’t buy you love, but can it buy you the Premier League title? We’ll look at that below, but first this recent Guardian article notes the following Premier League statistics:

Between 2003 and 2006 there were just 3 instances of a team having more than 70% of possession in a game. Two seasons ago there were 37, last season 63 and this season 67.

In other words, by even the simplest of statistical measures, Premier League games are becoming increasingly one-sided, at least in terms of possession. And the implication in the Guardian article is that money is the driving factor behind this imbalance. But is that really the case?

This graph shows final league position of the 20 Premier League teams plotted against their wealth in terms of start-of-season squad market value (taken from here).

To make things slightly clearer, the following diagram shows the same thing, but with a smooth curve (in blue) added on top, estimated using standard statistical techniques, which shows the overall trend in the data.

Roughly speaking, teams above the blue line have performed better than their financial resources would have suggested; those below have performed worse.

Bear in mind this is just one season’s data. Also, success breeds success, and money breeds money, so the differential between teams in terms of wealth as a season progresses is likely to increase further. For these reasons and others, not too much should be read into the slight wobbles in the blue curve. Nonetheless, a number of general features emerge:

1. It’s a very noisy picture for teams with less than £250m. Arguably, at that level, there’s no very obvious pattern between wealth and final position: there’s a bunch of teams with between £100m and £250m, and their league position within this group of teams isn’t obviously dependent on their wealth. As such, teams in this category are unlikely to get out of the bottom half of the table, and their success within the bottom half is more likely to depend on how well they’ve spent their money than on how much they actually have. And on luck.
2. Teams with between £250m and £500m are likely to force their way out of the ‘relegation-battle pack’, but not into the top 6 elite.
3. The cost of success at the top end is high: the blue curve at the top end is quite flat, so you have to spend a lot to improve your position. But money, as long as there’s enough of it, counts a lot for elite clubs, and the evidence is that the teams who are prepared to spend the most are likely to improve their league position.
4. A couple of clubs stand out as having performed very differently to what might be expected: Manchester United have considerably under-performed, while Wolves have substantially over-performed.

The trials and tribulations of Manchester United are well documented. Chances are they just need a change of manager. <Joke>. But Wolves is a much more interesting case, which takes us back to the Guardian article I referred to. As discussed above, this article is more about the way money is shaping how games are played than about the success it brings, with matches between rich and poor teams increasingly becoming contests between the attack of one side and the defence of the other. But Wolves have adapted to such imbalances, playing long periods without possession, and attacking with speed and precision when they do have the ball. The template for this type of play was Leicester City in their title-winning season, but even though that was just a few seasons ago, the financial imbalances were far smaller then than now.

It seems, then, that to a very large extent a team’s performance in the Premier League is likely to be determined by its wealth. Good management can mitigate this, just as bad management can lead to relatively poor performance. But even where teams are punching above their weight, they are having to do so by adapting their gameplay, so that matches are still dominated in terms of possession by the wealthier sides. As the Guardian article concludes:

Money guides everything. There have always been rich clubs, of course, but they have never been this rich, and the financial imbalances have never had such an impact on how the game is played.

# Freddy’s story: part 1

This is a great story with a puzzle and an apparent contradiction at the heart of it, that you might like to think about yourself.

A couple of weeks ago Freddy.Teuma@smartodds.co.uk wrote to me to say that he’d been looking at the recent post which discussed a probability puzzle based on coin tossing, and had come across something similar that he thought might be useful for the blog. Actually, the problem Freddy described was based on an algorithm for optimisation using genetic mutation techniques, that a friend had contacted him about.

To solve the problem, Freddy did four smart things:

1. He first simplified the problem to make it easier to tackle, while still maintaining its core elements;
2. He used intuition to predict what the solution would be;
3. He supported his intuition with mathematical formalism;
4. He did some simulations to verify that his intuition and mathematical reasoning were correct.

This is exactly how a statistician would approach both this problem and problems of greater complexity.

However… the pattern of results Freddy observed in the simulations contradicted what his intuition and mathematics had suggested would happen, and so he adjusted his beliefs accordingly. And then he wrote to me.

This is the version of the problem that Freddy had simplified from the original…

Suppose you start with a certain amount of money. For argument’s sake, let’s say it’s £100. You then play several rounds of a game. At each round the rules are as follows:

1. You toss a fair coin (Heads and Tails each have probability 1/2).
2. If the coin shows Heads, you lose a quarter of your current amount of money and end up with 3/4 of what you had at the start of the round.
3. If the coin shows Tails, you win a quarter of your current amount of money and end up with 5/4 of what you had at the start of the round.

For example, suppose your first 3 tosses of the coin are Heads, Tails, Heads. The money you hold goes from £100 to £75 to £93.75 to £70.3125.
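The arithmetic of that example can be checked with a couple of lines of R. This is just a sketch of the rules as stated above; the names `multiplier` and `money_path` are made up for the illustration.

```r
# Multipliers applied to the current amount: Heads loses a quarter, Tails wins a quarter.
multiplier <- c(Heads = 0.75, Tails = 1.25)

# Money held after each toss, given the starting amount and a sequence of tosses.
money_path <- function(start, tosses) {
  start * cumprod(multiplier[tosses])
}

path <- money_path(100, c("Heads", "Tails", "Heads"))
print(unname(path))
```

Because each round multiplies the current amount, the order of the tosses doesn't affect where you finish, only the route you take to get there.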

Now, suppose you play this game for a large number of rounds. Again, for argument’s sake, let’s say it’s 1000 rounds. How much money do you expect to have, on average, at the end of these 1000 rounds?

Have a think about this game yourself, and see what your own intuition suggests before scrolling down.

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

Freddy’s reasoning was as follows. In each round of the game I will lose or gain 25% of my current amount of money with equal probability. So, if I currently have £100, then at the end of the next round I will have either £75 or £125 with equal probability. And the average is still £100. This reasoning is true at each round of the game. And so, after any number of rounds, including 1000, I’d expect to have exactly the same amount of money as when I started: £100.
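For anyone who prefers symbols, Freddy’s argument can be written compactly. The notation is introduced here purely for illustration: $M_n$ is the money held after round $n$, with $M_0 = 100$.

$$
\mathbb{E}\left[M_{n+1} \mid M_n\right] = \tfrac{1}{2}\cdot\tfrac{3}{4}M_n + \tfrac{1}{2}\cdot\tfrac{5}{4}M_n = M_n
$$

Applying this one round at a time gives $\mathbb{E}[M_{1000}] = \mathbb{E}[M_{999}] = \dots = M_0 = 100$, which is exactly Freddy’s conclusion.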

But when Freddy simulated the process, he found a different sort of behaviour. In each of his simulations, the money held after 1000 rounds was very close to zero, suggesting that the average is much smaller than £100.

I’ve taken the liberty of doing some simulations myself: the pattern of results in 16 repeats of the game, each time up to 1000 rounds, is shown in the following figure.

Each panel of the figure corresponds to a repeat of the game, and in each repeat I’ve plotted a red trace showing how much money I hold after each round of the game. In each case you can see that I start with £100, there’s then a bit of oscillation – more in some of the realisations than in others, due to random variation – but in all cases the amount of money I have hits something very close to zero somewhere before 250 rounds and then stays there right up to 1000 rounds.

So, there is indeed a conflict between Freddy’s intuition and the picture that these simulations provide.

What’s going on?

I’ll leave you to think about it for a while, and write with my own explanation and discussion of the problem in a future post. If you’d like to write to me to explain what you think is happening, I’d be very happy to hear from you.

Obviously, I’m especially grateful to Freddy for having sent me the problem in the first place, and for agreeing to let me write a post about it.

Update: if you’d like to run the simulation exercise yourself, just click the ‘Run’ button in the following window. This will simulate the game for 1000 rounds, starting with £100. The graph will show you how much money you hold after each round of the game, while if you toggle to the console window it will tell you how much money you have after the 1000th round (to the nearest £0.01). This may not work in all browsers, but seems to work ok in Chrome. You can repeat the experiment simply by clicking ‘Run’ again. You’re likely to get a different graph each time because of the randomness in the simulations. But what about the final amount? Does that also change? And what does it suggest about Freddy’s reasoning that the average amount should stay equal to £100?

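    library(ggplot2)

    # Simulate the game and plot the money held after each round.
    game_sim <- function(n_rounds = 1000, money_start = 100) {
      # Each round multiplies the current money by 0.75 (Heads) or 1.25 (Tails).
      multipliers <- sample(c(0.75, 1.25), n_rounds, replace = TRUE)
      # First entry is the starting amount (round 0); last is after n_rounds rounds.
      money <- money_start * c(1, cumprod(multipliers))
      m <- data.frame(round = 0:n_rounds, money = money)
      cat("Money in pounds after", n_rounds, "rounds is",
          round(money[n_rounds + 1], 2), "\n")
      ggplot(m, aes(x = round, y = money)) +
        geom_line(color = "red") +
        ggtitle("Money")
    }

    game_sim()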

# Nul Points

No doubt you’re already well-aware of, and eagerly anticipating, this year’s Eurovision song contest final to be held in Tel Aviv between the 14th and 18th May. But just in case you don’t know, the Eurovision song contest is an annual competition to choose the ‘best’ song entered between the various participating European countries. And Australia!

Quite possibly the world would never have heard of Abba if they hadn’t won Eurovision. Nor Conchita Wurst.

The voting rules have changed over the years, but the structure has remained pretty much the same. Judges from each participating country rank their favourite 10  songs – excluding that of their own country, which they cannot vote for – and points are awarded on the basis of preference. In the current scheme, the first choice gets 12 points, the second choice 10 points, the third choice 8 points, then down to the tenth choice which gets a single point.
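To make the scoring concrete, here is a small R sketch of the tallying. The 12 entrants labelled A to L and their rankings are entirely made up (each judge simply ranks the other countries alphabetically), and the function name `total_scores` is just for illustration; only the 12, 10, 8, ..., 1 points scale comes from the rules above.

```r
# Points for a judge's 1st to 10th choice: 12, 10, then 8 down to 1.
points_scale <- c(12, 10, 8:1)

# Total score per country. `rankings` is a named list: each element is one
# voting country's top 10 (best first), which must not include the voter itself.
total_scores <- function(rankings, countries) {
  for (voter in names(rankings)) {
    stopifnot(length(rankings[[voter]]) == 10, !(voter %in% rankings[[voter]]))
  }
  recipients <- unlist(rankings, use.names = FALSE)
  points <- rep(points_scale, times = length(rankings))
  # Countries that nobody ranked end up with a total of zero.
  vapply(countries, function(country) sum(points[recipients == country]), numeric(1))
}

# Hypothetical contest: 12 entrants, each ranking the first 10 of the others alphabetically.
countries <- LETTERS[1:12]
rankings <- lapply(countries, function(voter) head(setdiff(countries, voter), 10))
names(rankings) <- countries

totals <- total_scores(rankings, countries)
print(sort(totals, decreasing = TRUE))
```

In this artificial set-up the last country alphabetically never makes anybody’s top 10, so it finishes with a total of zero.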

A country’s total score is the sum awarded by each of the other countries, and the country with the highest score wins the competition. In most years the scoring system has made it possible for a song to receive zero points – nul points – as a total, and there’s a kind of anti-roll-of-honour dedicated to countries that have accomplished this feat. Special congratulations to Austria and Norway who, despite their deep contemporary musical roots, have each scored nul points on four occasions.

Anyway, here’s the thing. Although the UK gave the world The Beatles, The Rolling Stones, Pink Floyd, Led Zeppelin, David Bowie, Joy Division and Radiohead. And Adele. It hasn’t done very well in recent years in the Eurovision Song Contest.  It’s true that by 1997 the UK had won the competition a respectable 5 times – admittedly with a bit of gratuitous sexism involving the removal of women’s clothing to distract judges from the paucity of the music. But since then, nothing. Indeed, since 2000 the UK has finished in last place on 3 occasions, and has only twice been in the top 10.

Now, there are two possible explanations for this.

1. Our songs have been terrible. (Well, even more terrible than the others).
2. There’s a stitch-up in the voting process, with countries penalising the UK for reasons that have nothing to do with the quality of the songs.

But how can we objectively distinguish between these two possibilities? The poor results for the UK will be the same in either case, so we can’t use the UK’s data alone to unravel things.

Well, one way is to hypothesise a system by which votes are cast that is independent of song quality, and to see if the data support that hypothesis. One such hypothesis is a kind of ‘bloc’ voting system, where countries tend to award higher votes for countries of a similar geographical or political background to their own.

This article carries out an informal statistical analysis of exactly this type. Though the explanations in the article are sketchy, a summary of the results is given in the following figure. Rather than pre-defining the blocs, the authors use the data on voting patterns themselves to identify 3 blocs of countries whose voting patterns are similar. They are colour-coded in the figure, which shows (in some vague, undefined sense) the tendency for countries on the left to favour countries on the right in voting. Broadly speaking there’s a northern Europe group in blue, which includes the UK, an ex-Yugoslavian bloc in green and a rest-of-Europe bloc in red. But whereas the fair-minded north Europeans tend to spread their votes evenly across all countries, the other two blocs tend to give their highest votes to other member countries within the same bloc.

But does this mean the votes are based on non-musical criteria? Well, not necessarily. It’s quite likely that cultural differences – including musical ones – are also smaller within geographically homogeneous blocs than across them. In other words, Romania and Moldova might vote for each other at a much higher than average rate, but this could just as easily be because they have similar musical roots and tastes as because they are friends scratching each other’s backs.

Another study of geo-political bloc voting is contained in this Telegraph article, which reaches similar findings, but concludes:

Comforting as it might be to blame bloc voting for the UK’s endless poor record, it’s not the only reason we don’t do well.

In other words, in a more detailed analysis which models performance after allowing for bloc-voting effects, the UK is still doing badly.

This whole issue has also been studied in much greater detail in the academic literature using complex statistical models, and the conclusions are similar, though the authors report language and cultural similarities as being more important than geographical factors.

The techniques used in these various different studies are actually extremely important in other areas of application. In genetic studies, for example, they are used to identify groups of markers for certain disease types. And even in sports modelling they can be relevant for identifying teams or players that have similar styles of play.

But if Eurovision floats your boat, you can carry out your own analysis of the data based on the complete database of results available here.

# The 10-minute marathon challenge

Not content with having recently won the London marathon for the fourth time in a record time of 2:02:37, the phenomenal Kenyan athlete Eliud Kipchoge has announced a new bid to run the marathon distance in under two hours. The time Kipchoge set in the London marathon was already the second fastest in history, and Kipchoge also holds the record for the fastest ever marathon, at 2:01:39, set in Berlin in 2018. But the sub-2-hour marathon remains an elusive goal.

In 2017 Nike sponsored an attempt to break the 2-hour target. Three elite runners, including Kipchoge, trained privately to run a marathon-length distance in circuits around the Monza racetrack in Italy. Kipchoge won the race, but in a time of 2:00:25, therefore failing by 25 seconds to hit the 2-hour target. The specialised conditions for this attempt, including the use of relay teams of pace setters, meant that the race fell outside of IAAF standards, and therefore the 2:00:25 is not valid as a world record. Kipchoge’s planned attempt in London will also be made under non-standard conditions, so whatever time he achieves will also not be considered valid under IAAF rules. Regardless of this, beating the 2-hour barrier would represent a remarkable feat of human achievement, and this is Kipchoge’s goal.

But this raises the question: is a sub-2-hour marathon under approved IAAF standards plausible? The following graphic shows how the marathon record has improved in the last 100 years or so, from Johnny Hayes’ record of 2:55:18 in 1908, right up to Kipchoge’s Berlin record.

Clearly there’s a law of diminishing returns in operation: the very substantial improvements in the first half of the graph are replaced by much smaller incremental improvements in the second half. This is perfectly natural: simple changes in training regimes and running equipment initially enabled substantial advances; more recent changes are much more subtle, and result in only marginal improvements. So, the shape of the graph is no surprise. But if you were extrapolating forward to what might happen in the next 10, 100 or 1000 years, would your curve go below the 2-hour threshold or not?

Actually, it’s straightforward to take a set of data, like those contained in the graphic above, and fit a nice smooth curve that does a reasonable job at describing the overall pattern of the graph. And we could then extrapolate that curve into the future and see whether it goes below 2 hours or not. And if it does, we will even have a prediction of when it does.

But there’s a difficulty – the question of whether the solution crosses the 2-hour threshold or not is likely to depend very much on the type of curve we use to do the smoothing. For example, we might decide that the above graphic is best broken down into sections where the pattern has stayed fairly similar. In particular, the most recent section from around 1998 to 2018 looks reasonably linear, so we might extrapolate forward on that basis, in which case the 2-hour threshold is bound to be broken, and pretty soon too. On the other hand we might decide that the whole period of data is best described by a kind of ‘ell’-shaped curve which decreases to a lower horizontal limit. And then the question will be whether that limit is above or below 2 hours. In both cases the data will determine the details of the curve – the gradient of the straight line, for example, or the limit of the ‘ell’-shaped curve – but the choice of the form of the graph – linear, ‘ell’-shaped or something else – is likely to be made on more subjective grounds. And yet that choice will possibly determine whether the 2-hour threshold is predicted to be beaten or not.
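Here’s a deliberately crude R sketch of how much the choice of curve matters. It is not a proper fit: it uses just two of the records quoted in this post (2:04:26 and 2:01:39) and forces each type of curve through them. The year 2007 for the 2:04:26 mark is an assumption (the post only says it was the record when the 2008 paper was written), the exponential form is just one possible ‘ell’-shaped curve, and the lower limits of 7100 and 7230 seconds are hypothetical values chosen to sit either side of 2 hours.

```r
to_seconds <- function(h, m, s) 3600 * h + 60 * m + s

# Two records quoted in the post; the year 2007 is an assumption.
year0 <- 2007; time0 <- to_seconds(2, 4, 26)
year1 <- 2018; time1 <- to_seconds(2, 1, 39)
target <- to_seconds(2, 0, 0)

# Straight line through the two records: the year at which it reaches the target.
linear_crossing <- function() {
  slope <- (time1 - time0) / (year1 - year0)  # seconds per year (negative)
  year1 + (target - time1) / slope
}

# 'Ell'-shaped alternative: exponential decay towards a lower limit, forced
# through the same two records. Returns NA if the limit is not below the
# target, because the curve then never reaches 2 hours.
asymptotic_crossing <- function(limit) {
  if (limit >= target) return(NA_real_)
  rate <- -log((time1 - limit) / (time0 - limit)) / (year1 - year0)
  year0 + log((time0 - limit) / (target - limit)) / rate
}

print(linear_crossing())
print(asymptotic_crossing(7100))  # hypothetical limit below 2 hours
print(asymptotic_crossing(7230))  # hypothetical limit above 2 hours
```

The same two data points give a crossing year, a later crossing year, or no crossing at all, depending purely on the form of curve and the limit assumed.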

There’s no way round this difficulty, though statistical techniques have been used to try to tackle it more rigorously. As I mentioned in a previous post, since athletics times are fastest times – whether it’s the fastest time in a race, or the fastest time ever when setting a record – it’s natural to base analyses on so-called extreme value models, which are mathematically suited to this type of process. But this still doesn’t resolve the problem of how to choose the curve which best represents the trend seen in the above picture. And the results aren’t terribly reliable. For example, in an academic paper ‘Records in Athletics Through Extreme-Value Theory‘ written in 2008 the authors John Einmahl and Jan Magnus predicted the absolute threshold times or distances (in case of field events) for a number of athletic events. At the time of writing their paper the world record for the marathon was 2:04:26, and they predicted a best possible time of 2:04:06.

History, of course, proved this to be completely wrong. To be fair to the authors though, they gave a standard error on their estimate of 57 minutes. Without going into detail, the standard error is a measure of how accurate the authors think their best answer is likely to be, and one rule of thumb interpretation of the standard error is that if you give a best answer and a standard error, then you’re almost certain the true value lies within 2 standard errors of your best answer. So, in this case, the authors were giving a best estimate of 2:04:06, but – rather unhelpfully – saying the answer could be as much as 114 minutes faster than that, taking us down to a marathon race time of 0:10:06.
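The arithmetic behind that 0:10:06 is easy to reproduce. A minimal R sketch, using only the numbers quoted above and the 2-standard-error rule of thumb (the helper `fmt_hms` is made up for the illustration):

```r
best_estimate_min  <- 2 * 60 + 4 + 6 / 60  # 2:04:06 expressed in minutes
standard_error_min <- 57

# Rule of thumb: the true value is almost certainly within 2 standard errors.
lower_bound_min <- best_estimate_min - 2 * standard_error_min

# Format a time in minutes as h:mm:ss.
fmt_hms <- function(minutes) {
  secs <- as.integer(round(minutes * 60))
  sprintf("%d:%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
}

print(fmt_hms(lower_bound_min))
```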

So, come on Kipchoge, never mind the 2-hour marathon, let’s see if you’re up to the 10-minute marathon challenge.

Footnote: don’t trust everything you read in statistical publications. (Except in this blog, of course 😉).

# A bad weekend

Had a bad weekend? Maybe your team faded against relegated-months-ago Huddersfield Town, consigning your flickering hopes of a Champions League qualification spot to the wastebin. Or maybe you support Arsenal.

Anyway, Smartodds loves Statistics is here to help you put things in perspective: ‘We are in trouble‘. But not trouble in the sense of having to play Europa League qualifiers on a Thursday night. Trouble in the sense that…

Human society is under urgent threat from loss of Earth’s natural life

Yes, deep shit trouble.

This is according to a Global Assessment report by the United Nations, based on work by hundreds of scientists who compiled as many as 15,000 academic studies. Here are some of the headline statistics:

• Nature is being destroyed at a rate tens to hundreds of times greater than the average over the last 10 million years;
• The biomass of wild mammals has fallen by 82%;
• Natural ecosystems have lost around half of their area;
• A million species are at risk of extinction;
• Pollinator loss has put up to £440 billion of crop output at risk.

The report goes on to say:

The knock-on impacts on humankind, including freshwater shortages and climate instability, are already “ominous” and will worsen without drastic remedial action.

But if only we could work out what the cause of all this is. Oh, hang on, the report says it’s…

… all largely as a result of human actions.

For example, actions like these:

• Land degradation has reduced the productivity of 23% of global land;
• Wetlands have been drained by 83% since 1700;
• In the years 2000-2013 the area of intact forest fell by 7% – an area the size of France and the UK combined;
• More than 80% of wastewater, as well as 300-400m tons of industrial waste, is pumped back into natural water reserves without treatment;
• Plastic waste is a factor of tens greater than in 1980, affecting 86% of marine turtles, 44% of seabirds and 43% of marine animals.
• Fertiliser run-off has created 400 dead zones – an area the size of the UK.

You probably don’t need to be a bioscientist, and certainly not a statistician, to realise none of this is particularly good news. However, the report goes on to list various strategies that agencies, governments and countries need to adopt in order to mitigate the damage that has already been done and minimise the further damage that will unavoidably be done under current regimes. But none of it’s easy, and the evidence so far doesn’t suggest there’s the collective human will to accept the responsibilities involved.

Josef Settele of the Helmholtz Centre for Environmental Research in Germany said

People shouldn’t panic, but they should begin drastic change. Business as usual with small adjustments won’t be enough.

So, yes, cry all you like about Liverpool’s crumbling hopes for a miracle against Barcelona tonight, but keep it in perspective and maybe even contribute to the wider task of saving humanity from itself.

<End of rant. Enjoy tonight’s game.>

Correction: *Barcelona’s* crumbling hopes

# More or Less

In a recent post I included a link to an article that showed how Statistics can be used to disseminate bullshit. That article was written by Tim Harford, who describes himself as ‘The Undercover Economist’, which is also the title of his blog. Besides the blog, Tim has written several books, one of which is also called ‘The Undercover Economist‘.

As you can probably guess from all of this, Tim is an economist who, through his writing and broadcasting, aims to bring the issues of economics to as wide an audience as possible. But there’s often a very thin boundary between what’s economics and what’s Statistics, and a lot of Tim’s work can equally be viewed from a statistical perspective.

The reason I mention all this is that Tim is also the presenter of a Radio 4 programme ‘More or Less’, whose aim is to…

…try to make sense of the statistics which surround us.

‘More or Less’ is a weekly half-hour show, which covers 3 or 4 topics each week. You can find a list of, and link to, recent episodes here.

As an example, at the time of writing this post the latest episode includes the following items:

• An investigation of a claim in a recent research paper that floods had worsened by a factor of 15 since 2005;
• An investigation into a claim by the Labour Party that a recent resurgence in the number of cases of Victorian diseases is due to government austerity policy;
• An interview with Matt Parker, who was referenced in this blog here, about his new book ‘Humble Pi’;
• An investigation into a claim in The Sunday Times that drinking a bottle of wine per week is equivalent to losing £2,400 per year in terms of reduction in happiness.

Ok, now, admittedly, the whole tone of the programme is about as ‘Radio 4’ as you could possibly get. But still, as a means for learning more about the way Statistics is used – and more often than not, mis-used – by politicians, salespeople, journalists and so on, it’s a great listen and I highly recommend it.

If Smartodds loves Statistics were a radio show, this is what it would be like (but less posh).

# Taking things to extremes

One of the themes I’ve tried to develop in this blog is the connectedness of Statistics. Many things which seem unrelated turn out to be strongly related at some fundamental level.

Last week I posted the solution to a probability puzzle that I’d posted previously. Several respondents to the puzzle, including Olga.Turetskaya@smartodds.co.uk, included the logic they’d used to get to their answer when writing to me. Like the others, Olga explained that she’d basically halved the number of coins in each round, till getting down to (roughly) a single coin. As I explained in last week’s post, this strategy leads to an answer that is very close to the true answer.

Anyway, Olga followed up her reply with a question: if we repeated the coin tossing puzzle many, many times, and plotted a histogram of the results – a graph which shows the frequencies of the numbers of rounds needed in each repetition – would the result be the typical ‘bell-shaped’ graph that we often find in Statistics, with the true average sitting somewhere in the middle?

Now, just to be precise, the bell-shaped curve that Olga was referring to is the so-called Normal distribution curve, that is indeed often found to be appropriate in statistical analyses, and which I discussed in another previous post. To answer Olga, I did a quick simulation of the problem, starting with both 10 and 100 coins. These are the histograms of the results.

So, as you’d expect, the average values (4.726 and 7.983 respectively) do indeed sit nicely inside the respective distributions. But, the distributions don’t look at all bell-shaped – they are heavily skewed to the right. And this means that the averages are closer to the lower end than the top end. But what is it about this example that leads to the distributions not having the usual bell-shape?

Well, the normal distribution often arises when you take averages of something. For example, if we took samples of people and measured their average height, a histogram of the results is likely to have the bell-shaped form. But in my solution to the coin tossing problem, I explained that one way to think about this puzzle is that the number of rounds needed till all coins are removed is the maximum of the number of rounds required by each of the individual coins. For example, if we started with 3 coins, and the number of rounds for each coin to show heads for the first time was 1, 4 and 3 respectively, then I’d have had to play the game for 4 rounds before all of the coins had shown a Head. And it turns out that the shape of distributions you get by taking maxima is different from what you get by taking averages. In particular, it’s not bell-shaped.
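If you’d like to see the ‘maximum’ idea in action, here’s a short R sketch. It treats each coin separately, using R’s `rgeom` to generate the number of Tails before a fair coin’s first Head, and then takes the maximum over coins. The number of repeats (20,000), the seed and the function names are arbitrary choices for the illustration, and the exact formula simply sums the probabilities that more than k rounds are needed.

```r
# The 3-coin example from the text: the game lasts as long as the slowest coin.
print(max(c(1, 4, 3)))

# Rounds until each fair coin first shows Heads: Tails before the first Head, plus 1.
rounds_for_coins <- function(n_coins) rgeom(n_coins, prob = 0.5) + 1

# One play of the game ends when the last remaining coin finally shows Heads.
rounds_for_game <- function(n_coins) max(rounds_for_coins(n_coins))

set.seed(1)
sims10  <- replicate(20000, rounds_for_game(10))
sims100 <- replicate(20000, rounds_for_game(100))

# Exact mean of the maximum: sum over k >= 0 of P(more than k rounds are needed).
exact_mean <- function(n_coins, k_max = 200) {
  sum(1 - (1 - 0.5^(0:k_max))^n_coins)
}

print(c(simulated = mean(sims10),  exact = exact_mean(10)))
print(c(simulated = mean(sims100), exact = exact_mean(100)))

# Counts of each outcome: the long tail is on the right-hand side.
print(table(sims10))
```

Swapping `max` for `mean` inside `rounds_for_game` is a quick way to see the contrast: averages of many coins pile up symmetrically, whereas maxima don’t.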

But is this ever useful in practice? Well, the Normal bell-shaped curve is somehow the centrepiece of Statistics, because averaging, in one way or another, is fundamental in many physical processes and also in many statistical operations. And in general circumstances, averaging will lead to the Normal bell-shaped curve.

Consider this though. Suppose you have to design a coastal wall to offer protection against sea levels. Do you care what the average sea level will be? Or you have to design a building to withstand the effects of wind. Again, do you care about average winds? Almost certainly not. What you really care about in each case will be extremely large values of the process: high sea-levels in one case; strong winds in the other. So you’ll be looking through your data to find the maximum values – perhaps the maximum per year – and designing your structures to withstand what you think the most likely extreme values of that process will be.

This takes us into an area of statistics called extreme value theory. And just as the Normal distribution is used as a template because it’s mathematically proven to approximate the behaviour of averages, so there are equivalent distributions that apply as templates for maxima. And what we’re seeing in the above graphs – precisely because the data are derived as maxima – are examples of this type. So, we don’t see the Normal bell-shaped curve, but we do see shapes that resemble the templates that are used for modelling things like extreme sea levels or wind speeds.

So, our discussion of techniques for solving a simple probability puzzle with coins, leads us into the field of extreme value statistics and its application to problems of environmental engineering.

But has this got anything to do with sports modelling? Well, the argument about taking the maximum of some process applies equally well if you take the minimum. And, for example, the winner of an athletics race will be the competitor with the fastest – or minimum – race time. Therefore the models that derive from extreme value theory are suitable templates for modelling athletic race times.

So, we moved from coin tossing to statistics for extreme weather conditions to the modelling of race times in athletics, all in a blog post of less than 1000 words.

Everything’s connected and Statistics is a very small world really.

# One touch football

Remember that thing about statistical diagrams of data telling their own story? Well, here’s a graphic showing the location of every touch of the ball by Alexis Sánchez in the 12 minutes he played as a substitute in the Manchester derby for United against City:

I recently posted a problem that had been shown to me by Benoit.Jottreau@smartodds.co.uk. Basically, you have a bunch of coins. You toss them and remove the ones that come up heads. You then repeat this process over and over till all the coins have been removed. The question was, if you start with respectively 10 or 100 coins, how many rounds of this game does it take on average till all the coins have been removed?

I’m really grateful to all of you who considered the problem and sent me a reply. The answers you sent me are summarised in the following graphs.

The graph on the left shows the counts of the guesses for the number of rounds needed when starting with 10 coins; the one on the right is the counts  but starting with 100 coins. The main features are as follows:

• Starting with 10 coins, the most popular answer was 4 rounds; with 100 coins the most popular answer was either 7 or 8 rounds.
• Almost everyone gave whole numbers as their answers. This wasn’t necessary. Even though the result of every experiment has to be a whole number, the average doesn’t. In a similar way, the average number of goals in a football match is around 2.5.
• The shape of the distribution of answers for the two experiments is much the same: heavily skewed to the right. This makes sense given the nature of the experiment: we can be pretty sure a minimum number of rounds will be needed, but less sure about the maximum. This is reflected in your collective answers.
• Obviously, with more coins, there’s more uncertainty about the answer, so the spread of values is much greater when starting with 100 coins.

Anyway, I thought the replies were great, and much better than I would have come up with myself if I’d just gone with intuition instead of solving the problem mathematically.

A few people also kindly sent me the logic they used to get to these answers. And it goes like this…

Each coin will come up heads or tails with equal probability. So, the average number of coins that survive each round is half the number of coins that enter that round. This is perfectly correct. So, for example, when starting with 10 coins, the average number of coins left after the first round is 5. By the same logic, the average number of coins left after the second round is 2.5. And the average number left after the third round is 1.25. And after the fourth round it’s 0.625. So, the first time the average number of coins goes below 1 is after the fourth round, and it’s therefore reasonable to assume 4 is the average number of rounds for all the coins to be removed.

Applying the same logic but starting with 100 coins, it takes 7 rounds for the average number of coins to fall below 1.
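The halving argument is easy to check mechanically. The following is a minimal sketch of that reasoning only; the function name `rounds_until_mean_below_one` is made up for illustration and does not come from any of the replies.

```python
def rounds_until_mean_below_one(n_coins: int) -> int:
    """Count how many halvings it takes for the expected
    number of surviving coins to drop below 1."""
    expected = float(n_coins)
    rounds = 0
    while expected >= 1:
        expected /= 2  # on average, half the coins show heads and are removed
        rounds += 1
    return rounds


print(rounds_until_mean_below_one(10))   # 4
print(rounds_until_mean_below_one(100))  # 7
```

Note that this tracks the average number of coins, not the average number of rounds, which is why it gives whole numbers.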

With a slight modification to the logic, to always round to whole numbers, you might get slightly different answers: say 5 and 8 instead of 4 and 7. And looking at the answers I received, I guess most respondents applied an argument of this type.

This approach is really great, since it shows a good understanding of the main features of the process: 50% of coins dropping out, on average, at each round of the game. And it leads to answers that are actually very informative: knowing that I need 4 rounds before the average number of coins drops below 1 is both useful and very precise in terms of explaining the typical behaviour of this process.

However… you don’t quite get the exact answers for the average number of rounds, which are 4.726 when you start with 10 coins, and 7.983 when you start with 100 coins. But where do these numbers come from, and why doesn’t the simple approach of dividing by 2 until you get below 1 work exactly?

Well, as I wrote above, starting with 10 coins you need 4 rounds before the average number of coins falls below 1. But this is a statement about the average number of coins. The question I actually asked was about the average number of rounds. Now, I don’t want to detract from the quality of the answers you gave me. The logic of successively dividing by 2 till you get below one coin is great, and as I wrote above, it will give an answer that is meaningful in its own right, and likely to be close to the true answer. But, strictly speaking, it’s focussing on the wrong aspect of the problem: the number of coins instead of the number of rounds.
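A quick way to see the difference between the two questions is to play the game many times on a computer and average the number of rounds. The sketch below is a simple Monte Carlo simulation, not the method used to obtain the exact answers; the function names, the number of games and the seed are all arbitrary choices made for illustration.

```python
import random


def play_once(n_coins: int, rng: random.Random) -> int:
    """Play one game and return the number of rounds until no coins remain."""
    coins = n_coins
    rounds = 0
    while coins > 0:
        # One random bit per remaining coin: a 1 bit means that coin shows heads.
        heads = bin(rng.getrandbits(coins)).count("1")
        coins -= heads  # coins showing heads are removed
        rounds += 1
    return rounds


def estimate_mean_rounds(n_coins: int, n_games: int = 20000, seed: int = 0) -> float:
    """Average the number of rounds over many simulated games."""
    rng = random.Random(seed)
    return sum(play_once(n_coins, rng) for _ in range(n_games)) / n_games


print(estimate_mean_rounds(10))
print(estimate_mean_rounds(100))
```

With this many games the estimates should land close to the exact averages of 4.726 and 7.983, though the last digits will vary with the seed.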

The solution is quite technical. Not exactly rocket science, but still more intricate than is appropriate for this blog. But you might still find it interesting to see the strategy which leads to a solution.

So, start by considering just one of the coins. Its pattern of results (writing H for Heads and T for Tails) will be something like

• T, T, H; or
• T, T, T, H; or
• H

That’s to say, a sequence of T’s followed by H (or just H if we get Heads on the first throw).

But we’ve seen something like this before. Remember the post Get out of jail? We kept rolling a dice until we got the first 6, and then stopped. Well, this is the same sort of experiment, but with a coin. We keep tossing the coin till we get the first Head. Because of the similarity between these experiments, we can apply the same technique to calculate the probabilities for the number of rounds needed to get the first Head for this coin. One round will have probability 1/2, two rounds 1/4, three rounds 1/8 and so on.

Now, looking at the experiment as a whole, we have 10 (or 100) coins, each behaving the same way. And we repeat the experiment until all of the coins have shown heads for the first time. What this means is that the total number of rounds needed is the maximum of the number of rounds for each of the individual coins. It turns out that this gives a simple method for deriving a formula that gives the probabilities of the number of rounds needed for all of the coins to be removed, based on the probabilities for a single coin already calculated above.
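For anyone who wants to see this step written down, here is a sketch of the usual argument; it may not be exactly the route taken to get the final formula. The notation is introduced here purely for illustration: $R_i$ is the round on which coin $i$ first shows heads, and $R$ is the largest of $R_1, \ldots, R_n$. A single coin is still in the game after $r$ rounds only if it has shown tails $r$ times in a row, so

$P(R_i \le r) = 1 - 2^{-r}$

The coins are independent, and the game is over by round $r$ only if every coin has shown heads by then, so

$P(R \le r) = (1 - 2^{-r})^n$

and the probability that the game ends on exactly round $r$ is the difference between successive values:

$P(R = r) = (1 - 2^{-r})^n - (1 - 2^{-(r-1)})^n$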

So, we now have a formula for the probabilities of the number of rounds needed. And there’s a standard formula for converting these probabilities into the average. It’s not immediately obvious when you see it, but with a little algebraic simplification it turns out that you can get the answer in fairly simple mathematical form. Starting with n coins – we had n=10 and n=100 – the average number of rounds needed turns out to be

$1+\sum_{k=1}^n(-1)^{k-1}{n \choose k} (2^k-1)^{-1}$

The $\sum$ bit means do a sum, and the $~{n \choose k}~$ term is the number of unique combinations of k objects chosen from n. But don’t worry at all about this detail; I’ve simply included the answer to show that there is a formula which gives the answer.

With 10 coins you can plug n=10 into this expression to get the answer 4.726. With 100 coins there are some difficulties, since the calculation of $~{n \choose k}~$ with n=100 is numerically unstable for many values of k. But accurate approximations to the solution are available, which don’t suffer the same numerical stability problems, and we get the answer 7.983.
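If you’d like to check these numbers yourself, here is a minimal Python sketch. It makes two choices that are not from the discussion above and should be read as assumptions: it evaluates the formula with exact fractions (Python’s `fractions.Fraction` and `math.comb`) so that the huge alternating terms cancel without rounding error, and it cross-checks the result by summing the probabilities that the game lasts longer than r rounds, for r = 0, 1, 2 and so on. Neither is the approximation referred to above, and the function names are made up for illustration.

```python
from fractions import Fraction
from math import comb


def expected_rounds_exact(n: int) -> Fraction:
    """Evaluate 1 + sum_{k=1}^{n} (-1)^(k-1) * C(n, k) / (2^k - 1) exactly."""
    total = Fraction(1)
    for k in range(1, n + 1):
        sign = 1 if k % 2 == 1 else -1
        total += Fraction(sign * comb(n, k), 2**k - 1)
    return total


def expected_rounds_tail_sum(n: int, tol: float = 1e-12) -> float:
    """Cross-check: add up P(rounds > r) = 1 - (1 - 2^-r)^n for r = 0, 1, 2, ..."""
    total = 0.0
    r = 0
    while True:
        term = 1.0 - (1.0 - 0.5**r) ** n
        total += term
        if term < tol:  # remaining terms are negligible
            return total
        r += 1


for n in (10, 100):
    print(n, float(expected_rounds_exact(n)), expected_rounds_tail_sum(n))
```

The two methods should agree with each other to many decimal places, and both come out at roughly 4.73 for 10 coins and 7.98 for 100 coins.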

So, in summary, with a fair bit of mathematics you can get exact answers to the problem I set. But much more importantly, with either good intuition or sensible reasoning you can get answers that are very similar. This latter skill is much more useful in Statistics generally, and it’s fantastic that the set of replies I received showed collective strength in this respect.

# Do I feel lucky?

Ok, I’m going to call it…

This is, by some distance:

‘The Best Application of Statistics in Cinematic History’:

It has everything: the importance of good quality data; inference; hypothesis testing; prediction; decision-making; model-checking. And Clint Eastwood firing rounds off a .44 Magnum while eating a sandwich.

But, on this subject, do you feel lucky? (Punk)

Richard Wiseman is Professor in Public Understanding of Psychology at the University of Hertfordshire. His work touches on many areas of human psychology, and one aspect he has studied in detail is the role of luck. A summary of his work in this area is contained in his book The Luck Factor.

This is from the book’s Amazon description:

Why do some people lead happy successful lives whilst others face repeated failure and sadness? Why do some find their perfect partner whilst others stagger from one broken relationship to the next? What enables some people to have successful careers whilst others find themselves trapped in jobs they detest? And can unlucky people do anything to improve their luck – and lives?

Richard’s work in this field is based on many years of research involving a study group of 400 people. In summary, what he finds, perhaps unsurprisingly, is that people aren’t born lucky or unlucky, even if their perception is that they are. Rather, our attitude to life generally determines how the lucky and unlucky events we experience shape the way our lives pan out. In other words, we really do make our own luck.

He goes on to identify four principles we can adopt in order to make the best out of the opportunities (and difficulties) life bestows upon us:

1. Create and notice chance opportunities;
2. Listen to your intuition;
3. Create self-fulfilling prophecies via positive expectations;
4. Adopt a resilient attitude that transforms bad luck into good.

In summary: if you have a positive outlook on life, you’re likely to make the best of the good luck that you have, while mitigating the bad luck as far as possible.

But would those same four principles work well for a sports modelling company? They could probably adopt 1, 3 and 4 as they are, perhaps reinterpreted as:

1. Seek out positive value trading opportunities wherever possible.

3. Build on success. Keep a record of what works well, both in trading and in the company generally, and do more of it.

4. Don’t confuse poor results with bad luck. Trust your research.

Principle 2 is a bit more problematic: much better to stress the need to avoid the trap of following instinct, when models and data suggest a different course of action. However, I think the difficulty is more to do with the way this Principle has been written, rather than what’s intended. For example, I found this description in a review of the book:

Lucky people actively boost their intuitive abilities by, for example… learning to dowse.

Learning to dowse!

But this isn’t what Wiseman meant at all. Indeed, he writes:

Superstition doesn’t work because it is based on outdated and incorrect thinking. It comes from a time when people thought that luck was a strange force that could only be controlled by magical rituals and bizarre behaviors.

So, I don’t think he’s suggesting you start wandering around with bits of wood in a search for underground sources of water. Rather, I think he’s suggesting that you be aware of the luck in the events around you, and be prepared to act on them. But in the context of a sports modelling company, it would make sense to completely replace reference to intuition with data and research. So…

2. Invest in data and research and develop your trading strategy accordingly.

And putting everything together:

1. Seek out positive value trading opportunities wherever possible.
2. Invest in data and research and develop your trading strategy accordingly.
3. Build on success. Keep a record of what works well, both in trading and in the company generally, and do more of it.
4. Don’t confuse poor results with bad luck. Trust your research.

And finally, what’s that you say? “Go ahead, make my day.” Ok then…