Juvenile dinosaurs

This blog is mostly about Statistics as a science rather than statistics as numbers. But just occasionally the statistics themselves are so shocking, they’re worthy of a mention.

With this in mind I was struck by two statistics of a similar theme in the following tweet from Ben Goldacre (author of the Bad Science website and book):

 

Moreover, in the discussion following Ben’s tweet, someone linked to the following cartoon figure:

This shows that even if you switch the measure of distance from time to phylogenetic distance or to physical similarity, the same conclusion holds: a sparrow is closer to T-Rex than T-Rex is to Stegosaurus.


Footnote 1: this is more than a joke. Recent research makes the case that there is a strong evolutionary link between birds and dinosaurs. As one of the authors writes:

We now understand the relationship between birds and dinosaurs that much better, and we can say that, when we look at birds, we are actually looking at juvenile dinosaurs.

Footnote 2: continuing the series (also taken from the discussion of Ben’s tweet)… Cleopatra is closer in time to the construction of the space shuttle than to the building of the pyramids.

Footnote 3: Ben Goldacre’s book, Bad Science, is a great read. It includes many examples of the way science – and Statistics – can be misused.

 

Problem solved

A while back I set a puzzle asking you to try to remove three coins from a red square region as shown in the following diagram.

The only rule of the game is that when a coin is removed it is replaced with two coins – one immediately to the right of, and one immediately below, the coin that is removed. If there is no space for adding these replacement coins, the coin cannot be removed.

The puzzle actually appeared in a recent edition of Alex Bellos’ Guardian mathematics puzzles, though it was created by the Argentinian mathematician Carlos Sarraute. This is his solution, which is breathtaking in its ingenuity.

The solution starts by giving a value to every square in the grid as follows:

Remember, the grid goes on forever both to the right and downwards. The top left-hand box has value 1. Going right from there, every subsequent square has value equal to 1/2 of the previous one. So: 1, 1/2, 1/4, 1/8 and so on. The first column is identical to the first row. To complete the second row, we start with its first value, 1/2, and again just keep multiplying by 1/2. The second column is the same as the second row. And we fill the entire grid in this same way. Technically, every row and column is a geometric series: each term is the previous one multiplied by a common ratio, which in this case is 1/2.

Let’s define the value of a coin to be the value of the square it’s on. Then the total value of the coins at the start of the game is

1 + \frac{1}{2} + \frac{1}{2}= 2

Now…

  • When we remove a coin we replace it with two coins, one immediately below and one immediately to the right. But if you look at the value of any square on the grid, it is equal to the sum of the values of the squares immediately below and to its right. So when we remove a coin we replace it with two coins whose total value is the same, and it follows that the total value of the coins stays unchanged however many moves we make. It will always be 2.
  • This is the only tricky mathematical part. Look at the first row of numbers. It consists of 1, 1/2, 1/4, 1/8… and goes on forever. But even though the series is infinite, its sum is finite: 2. Obviously, we can never really add infinitely many numbers in practice, but by adding more and more terms of the series we get closer and closer to 2. Try it on a calculator, or see the short sketch after this list. In summary:

1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} +\ldots  = 2.

  • Working down the rows, the second row is the same as the first with the first term removed. So its sum must be 1. The third is the same as the second with the first term of 1/2 removed, so its sum is 1/2. By the same reasoning, the sum of the fourth row will be 1/4, the fifth row 1/8 and so on.
  • So, the row sums are respectively 2, 1, 1/2, 1/4, …. This is the same as the first row of values with an additional first term of 2. It follows that the sum of the row sums – and therefore the sum of all the numbers in the grid – is 2 + 2 = 4. Again, we can’t add all the numbers in practice, but we get closer and closer to the value of 4 by adding more and more squares.
  • The total value of the squares inside the red square is 1 + 1/2 + 1/2 + 1/4 = 9/4. The total value of all squares outside this region must therefore be 4 − 9/4 = 7/4.
  • Putting all this together, the initial value of the coins was 2. After any number of moves, the total value of all coins will always remain 2. But the total value of all squares outside the red square is only 7/4. It must therefore be impossible to remove the three coins from the red square, because to do so would require the coins outside this area to have a total value of 2, which is greater than 7/4, the total value available in the whole region outside the red square.
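If you’d like to check the arithmetic rather than take it on trust, here’s a minimal Python sketch (my own, not part of the original solution) that truncates the infinite grid to a large finite corner:

```python
# Square (i, j) of the grid has value (1/2)**(i + j); truncate to an n-by-n corner.
n = 50   # big enough that the neglected squares contribute a negligible amount

first_row = sum(0.5 ** j for j in range(n))                      # 1 + 1/2 + 1/4 + ...
total = sum(0.5 ** (i + j) for i in range(n) for j in range(n))  # the whole grid
red = 1 + 1/2 + 1/2 + 1/4      # the four squares inside the red region
coins = 1 + 1/2 + 1/2          # starting value of the three coins

print(round(first_row, 6))          # approaches 2
print(round(total, 6))              # approaches 4
print(red, round(total - red, 6))   # 2.25 inside, about 1.75 outside
print(coins)                        # 2.0, preserved by every move, since
                                    # value(i, j) = value(i+1, j) + value(i, j+1)
```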

I find this argument quite brilliant. My instincts were to try to solve the puzzle using arguments from geometry. I failed. It would never have occurred to me to try to develop a solution based on the properties of numbers.


As I wrote in the original post, this puzzle doesn’t really have any direct relevance to Statistics, except insofar as it shows the power and beauty of mathematical proof, which is an essential part of statistical theory. Having said that, the idea of infinite limits is important in Statistics, and I’ll discuss this in a further post. Let me just mention, though, that summing infinite series as in the solution above is a delicate issue for at least two reasons:

  1. Although the series 1 + 1/2 + 1/4 + 1/8 + … has a finite sum of 2, the series 1 + 1/2 + 1/3 + 1/4 + 1/5 + … has no finite sum. Its partial sums grow very slowly, but as I take more and more terms, the sum grows without limit. That’s to say, if you give me any number – say 1 million – I can always find enough terms in the series for the sum to be greater than that number. (The sketch at the end of this list illustrates the contrast.)
  2. To get the total value of the grid, we first added each row and then added up these row sums. We could alternatively have first added each column and then added up these column sums, and we’d have got the same answer. For this example both alternatives are valid. But in general this interchange of row and column sums is not valid. Consider, for example, this infinite grid:

The first row sums to 2, while every other row sums to zero, so the sum of the row sums is 2. But every column sums to zero, so if we sum the columns first and then add up these sums we get 0. This couldn’t possibly happen with finite grids, but infinite grids require a lot more care.
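For point 1 in the list above, here’s a small Python sketch (again my own illustration) contrasting the partial sums of the two series:

```python
# Partial sums of the geometric series settle down near 2; partial sums of the
# harmonic series just keep growing, though very slowly.
for n in (10, 1_000, 100_000):
    geometric = sum(0.5 ** k for k in range(n))       # 1 + 1/2 + 1/4 + ...
    harmonic = sum(1 / k for k in range(1, n + 1))    # 1 + 1/2 + 1/3 + ...
    print(f"n={n:6d}  geometric={geometric:.6f}  harmonic={harmonic:.2f}")
```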

In a follow-up post we’ll consider limits of sums in the context of Statistics.


Finally, I’m grateful to Fabian.Thut@smartodds.co.uk for some follow-up discussion on the original post. In particular, we ended up discussing the following variation on the original puzzle. The rules are exactly the same as before, but the starting configuration of the coins is now as per the following diagram:

In this case, can the puzzle be solved? Does the argument presented for the original problem help in any way?

If you have any thoughts about this, please do write to me. In any case, I’ll write another post with the solution to this version shortly.

Coincidentally

[Image: birthday_coincidence.jpg]

Here we go again…

Happy birthday to me. And Harry.Hill@smartodds.co.uk and Rickie.Reynolds@smartodds.co.uk. And willfletcher1111@gmail.com, who also used to be in the quant team. What a remarkable coincidence that 3 of us currently in the quant team – together with another quant who has since left – each have our birthday on the 11th November. But as I discussed in a post around this time last year, as well as at a previous offsite, there are so many possible combinations of three or four people in the company that could share a birthday that it’s not very surprising that one combination does. It just happened to be me, Harry, Rickie and Will on 11/11.
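If you’d like a feel for just how unsurprising this is, here’s a rough simulation sketch. The headcount of 100 is a number I’ve plugged in purely for illustration, not the actual size of the company:

```python
import random

def prob_some_triple_birthday(n_people=100, trials=20_000):
    """Estimate the chance that, among n_people with uniformly random birthdays,
    at least one date is shared by three or more of them."""
    hits = 0
    for _ in range(trials):
        counts = [0] * 365
        for _ in range(n_people):
            counts[random.randrange(365)] += 1
        if max(counts) >= 3:
            hits += 1
    return hits / trials

print(prob_some_triple_birthday())   # typically well over a half
```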

And on the subject of coincidences…

You may have heard of an app called what3words. This is a location app for iOS or Android which divides the entire globe into 3 x 3 metre squares and assigns 3 words to each square. For example, currently sitting at my desk in the Smartodds office, the 3 allocated words are “insert”, “falls”, “opens”. The idea is that in an emergency I can identify and communicate my unique position to the relevant emergency services by means of just these 3 words. Of course, I could do the same thing with my GPS coordinates, but the point is that standard words are easier to read and communicate in a hurry. And there are already a number of instances in which lives have potentially been saved through use of the app.
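A back-of-envelope calculation shows why three words are enough. The figure of a roughly 40,000-word list is my recollection of the app’s own publicity rather than anything from this post, so treat it as approximate:

```python
earth_surface_m2 = 510e6 * 1e6            # ~510 million km^2 of surface
squares = earth_surface_m2 / (3 * 3)      # number of 3m x 3m squares: ~5.7e13
combinations = 40_000 ** 3                # ordered triples from a ~40,000-word list
print(f"{squares:.2e} squares, {combinations:.2e} three-word combinations")
```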

And the coincidence? Well, I opened the app in my house the other day with this result:

[Image: what3words result]

… and, here in a recent Halloween pic with my Grandson, is my Granddaughter…

[Image: poppy.JPG]

 

… the charmingly awesome Poppy!


Footnote: writing this now, I’m reminded that some time ago, as a follow-up to a post which also discussed coincidences, Richard.Greene@smartodds.co.uk mailed me about an experience he’d recently had. He described it as follows:

I was listening to the radio one morning, and the presenter mentioned “French windows”. I wasn’t sure at the time what they were, and remember amusing myself as to what made them French exactly – perhaps they come with a beret on top etc…anyway, an hour later, I was watching Frasier over my cornflakes and there was a joke/reference to French windows!

Like the shared birthdays, if you tried to calculate the chance of French windows being mentioned on both the radio and an episode of Frasier within a short time of one another, the probability would seem incredibly remote. But again, we experience so many opportunities for coincidences every day that, although the vast majority don’t happen, one or two inevitably do. And they’re the ones we remember and sometimes ascribe to ‘fate’, ‘destiny’, ‘karma’ etc. When in fact it’s just the laws of probability playing out in our daily lives.

Anyway, Richard suggested that an idea for a blog post would be to collect and collate ‘coincidences’ of the kind I’ve described here – my experience with what3words; Richard’s with French windows. So, if you’ve recently had, or have in the near future, a coincidental experience of some sort, please send it to me and I’ll include it in a future post.

Size does matter

Consider the following scenario…

A football expert claims that penalties are converted, on average, with a 65% success rate. I collect a sample of games with penalties and find that the conversion rate of penalties in that sample is 70%. I know that samples are bound to lead to some variation in results, so I’m not surprised that the rate in my sample isn’t exactly 65%. But is the difference between 65% and 70% big enough for me to conclude that the expert has got it wrong? And would my conclusions be any different if the success rate in my sample had been 80%? Or 90%?

This type of issue is at the heart of pretty much any statistical investigation: judging the reliability of an estimate provided by a sample, and assessing whether the evidence it provides supports or contradicts some given hypothesis.

Actually, it turns out that with just the information provided above, the question is impossible to answer. For sure, a true success rate of 65% is more plausible with a sample value of 70% than it would have been with a sample value of 80% or 90%. And just as surely, having got a sample value of 70%, a true value of 65% is more plausible than a true value of 60%. But, the question of whether the sample value of 70% actually supports or opposes the claim of a true value of 65% is open.

To answer this question we need to know whether a sample value of 70% is plausible or not if the true value is 65%. If it is, we can’t say much more: we’d have no reason to doubt the 65% value, although we still couldn’t be sure – we can never be sure! – that this value is correct. But if the sample value of 70% is not plausible if the population value is 65%, then this claim about the population is likely to be false.

One way of addressing this issue is to construct a confidence interval for the true value based on the sample value. Without getting too much hung up on technicalities, a confidence interval is a plausible range for the population value given the sample value. A 95% confidence interval is a range that will contain the true value with probability 95%; a 99% confidence interval is a range that will contain the true value with probability 99%; and so on.

So why not go with a 100% confidence interval? Well, in most circumstances this would be an interval that stretches to infinity in both directions, and we’d be saying that we can be 100% sure that the true value is between plus and minus infinity. Not very helpful. At the other extreme, a 1% confidence interval would be very narrow, but we’d have virtually no confidence that it contained the true value. So, it’s usual to adopt 95% or 99% confidence intervals as benchmarks, as they generally provide intervals that are both short enough and with high enough confidence to give useful information.

For problems as simple as the one above, calculating confidence intervals is straightforward. But, crucially, the size of the confidence interval depends on the size of the data sample. With small samples, there is more variation, and so the confidence intervals are wider; with large samples there is less variation, and the confidence intervals are narrower.

The following graph illustrates this for the example above.

  • The horizontal axis gives different values for the size of the sample on which the value of 70% was based.
  • The horizontal red line shows the hypothesised population value of 65%.
  • The horizontal green line shows the observed sample value of 70%.
  • For each choice of sample size, the vertical blue line shows a 95% confidence interval for the true population value based on the sample value of 70%.

What emerges is that up until a sample size of 300 or so, the 95% confidence interval includes the hypothesised value of 65%. In this case, the observed data are consistent with the hypothesis, which we therefore have no reason to doubt. For larger sample sizes, the hypothesised value falls outside of the interval, and we would be led to doubt the claim of a 65% success rate. In other words: (sample) size does matter. It determines how much variation we can anticipate in estimates, which in turn determines the size of confidence intervals and by extension the degree to which the sample data can be said to support or contradict the hypothesised value.
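The calculation behind a figure like this is straightforward. Here’s a minimal Python sketch using the standard normal-approximation interval for a proportion; the original post doesn’t say exactly which interval it used, so the crossover point may differ slightly:

```python
import math

p_hat, p_claimed = 0.70, 0.65
z = 1.96   # multiplier for a 95% confidence interval

for n in (50, 100, 200, 300, 400, 500):
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    lower, upper = p_hat - half_width, p_hat + half_width
    print(f"n={n:3d}: ({lower:.3f}, {upper:.3f})  65% inside: {lower <= p_claimed <= upper}")
```

Swapping 1.96 for 2.576 gives the corresponding 99% intervals discussed next.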

The story is much the same with 99% confidence intervals, as shown in the following figure.

The intervals are wider, but the overall pattern is much the same. However, with this choice the data contradict the hypothesis only for sample sizes of around 500 or more.

Whether you choose to base your decisions on intervals with 95%, 99% or some other level of confidence is a matter of preference. In particular, there are two types of errors we can make: we might reject the hypothesised value when it’s true, or accept it when it’s false. Using 99% intervals rather than 95% will reduce the chance of making the first error, but increase the chance of the second. We can’t have it both ways. The only way of reducing the chances of both of these types of errors is to increase the sample size. Again: size matters.


Footnote: one thing worth noting in the figures above is that the confidence intervals change in length quite quickly when the sample size is small, but more slowly when the sample size is large. This can be stated more precisely: to make a confidence interval half as wide you have to multiply the sample size by 4. So if your sample size is 10, you just need to increase it to 40 to make confidence intervals half as wide; but if the sample size is 100, you have to increase it to 400. It’s a sort of law of diminishing returns, which has important consequences if data are expensive. An initial investment of, say, £100’s worth of data will give you answer with a certain amount of accuracy, but each further investment of £100 will improve accuracy by a smaller and smaller amount. At what point is the cost of potential further investment too great for the benefits it would lead to?
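This square-root behaviour is easy to see from the formula. For a proportion, the usual approximate confidence interval has half-width

z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},

where z is the multiplier (1.96 for 95% confidence), so replacing n with 4n divides the half-width by 2: four times the data buys twice the precision.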

 

 

I’m a 20 cent coin, get me out of here

 

[Image: European 20 cent coins (front and back) on a white background]

Usually when I’ve posted puzzles to the blog they’ve had a basis in Probability or Statistics. One exception to this was the mutilated chessboard puzzle, whose inclusion I justified by pointing out that mathematical logic is an important strand in the theory that underpins Statistics. Definitely not the only strand, but important nonetheless.

In this same spirit, here’s another puzzle you might like to look at. I’ll give references to the author and so on when I write a follow-up post with the solution. But, if you think I should only be doing posts that are strictly Probability or Statistics related, please just ignore this post. It’s only related to Statistics in the same way that the mutilated chessboard puzzle was. Having said that, I will use follow-up discussion to this puzzle as a lead-in to some important statistical ideas.

Anyway, here’s the puzzle. Look at this grid of squares…

Actually, you have to imagine the grid extending indefinitely downwards and to the right. In the top left-hand corner of the grid you can see a 2-by-2 section of the grid that’s been marked with red lines, and 3 coins have been placed in that section. Your task is to remove the coins from that section by following these rules:

  1. Coins are removed one at a time.
  2. When you remove a coin you must replace it with two coins, one immediately below and one immediately to the right of the one that’s been removed.
  3. If a coin does not have a free space both immediately below and to the right, it cannot be removed until such space becomes available.

You have to find the sequence of moves that results in the section inside the red square being emptied of coins, or explain why it’s impossible to find such a sequence.

To make things easier, you can try the puzzle using this gif. Again, I’ll give credits and references for this work when I write the follow-up post.

To play just click on the green flag to start and then click successively on the coins you’d like to remove. You will only be allowed to remove coins according to the rules above, and when you do legally remove a coin, two new coins are automatically added, again according to the stated rules. If you just want to start over, press the green flag again.

(I don’t know what the red button is for, but DON’T PRESS IT).

So, can you find the sequence of moves that releases all of the coins from the red square? Or if it can’t be done can you explain why?
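And if you’d rather experiment in code than with the gif, here’s a minimal Python sketch of the rules, representing each coin as a (row, column) position on a grid that extends downwards and to the right (my own representation, not part of the original puzzle):

```python
# The red section is the 2-by-2 block {(0,0), (0,1), (1,0), (1,1)};
# three coins start inside it.
coins = {(0, 0), (0, 1), (1, 0)}
RED = {(0, 0), (0, 1), (1, 0), (1, 1)}

def remove(coin):
    """Remove `coin`, adding coins immediately below and to its right.
    Returns False (and changes nothing) if either replacement cell is occupied."""
    r, c = coin
    below, right = (r + 1, c), (r, c + 1)
    if coin not in coins or below in coins or right in coins:
        return False
    coins.remove(coin)
    coins.update({below, right})
    return True

print(remove((0, 0)))                 # False: both replacement cells are occupied
print(remove((1, 0)))                 # True: replaced by coins at (2, 0) and (1, 1)
print(any(c in RED for c in coins))   # still coins inside the red section?
```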

Please write to me if you want to discuss the puzzle or share your ideas. I’ll write a post with a solution shortly.

 

 

Sleight of hand

[Image: cards3]

A while ago I sent a post with a card trick which I explained had a statistical element to it, and asked you to try to work out how it was done. Thanks to those of you who wrote to me with variants on the correct answer.

The rules of the game were that my assistant, Matteo, chose a card at random hidden from me. It happened to be a 5 in the video. I then turned the cards over one at a time and Matteo had to play a counting game. Once he reached the 5th card, he noted its value, which was a 10. So he then counted another 10 cards in the sequence, noted the value of that card, and so on until we ran out of cards. Matteo had to remember the final card in his sequence before the cards ran out, which turned out to be the eight of diamonds. My task as the magician was to predict what Matteo’s final card was, which I did successfully.

Now, there are 2 reasons why this is a statistical card trick.

  1. It doesn’t always work. It does so with a reasonably high probability but, depending on the configuration of the cards once they are shuffled, it won’t always. I’ll be honest: we had to remake the video several times, but that was always due to my incompetence in explaining the trick and not because it ever failed. Still, it won’t always work.
  2. The second reason it’s a statistical trick is in its execution. The way it works is that I also play the same counting game as Matteo, but starting with the value of the first card I turn over, which happened to be a 10. So, we’re both playing the same counting game but from different starting points. Matteo’s starting point is 5, mine is 10. Although we start from different places, it turns out to be quite likely – though not certain – that the counting sequences we follow will overlap at some point. And once they do overlap, we are then following exactly the same sequence and so will arrive at the same final card.

Technically, the sequences of cards Matteo and I are both following are called Markov chains. These are sequences of random numbers such that, in order to understand what the next card might be, I only need to know the value of the current card, without knowing the past sequence that took me to the current state. In other words, when Matteo has to start counting 10 cards, it doesn’t matter how he got to that position, just that that’s where he currently is. And I also generate my own Markov chain. With an unlimited number of cards in a pack, the mathematical properties of Markov chains would guarantee that our sequences meet at some point, after which we would be following exactly the same sequence, leading me to have the same final card as Matteo. With just 52 cards in a pack, there’s no guarantee, which is why the trick won’t always work.

The fact that the trick might not work is a little undesirable, but you can increase the chances by counting picture cards as 1 rather than 10. This forces the sequences to change card more often, which increases the chances of our two sequences overlapping.
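If you want to check the ‘reasonably high probability’ claim, here’s a rough Monte Carlo sketch of the trick as I’ve described it. The details are my own assumptions: the spectator’s secret value is taken to be a number from 1 to 10, and picture cards count 10 by default, or 1 for the variant just mentioned:

```python
import random

def trick_works(picture_value=10):
    """Simulate one run: does the magician's counting chain end on the same
    final card as the spectator's?"""
    ranks = list(range(1, 11)) + [picture_value] * 3   # A, 2-10, then J, Q, K
    deck = ranks * 4
    random.shuffle(deck)

    def final_card(start_count):
        idx = start_count - 1                  # the start_count-th card, 0-indexed
        while idx + deck[idx] < len(deck):     # jump forward by the card's value
            idx += deck[idx]
        return idx

    secret = random.randint(1, 10)             # spectator's hidden starting value
    return final_card(secret) == final_card(deck[0])

for pv in (10, 1):
    trials = 20_000
    wins = sum(trick_works(pv) for _ in range(trials))
    print(f"picture cards count {pv}: success rate ~ {wins / trials:.2f}")
```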


Markov chains are actually really important building blocks for modelling in many areas of Statistics, which is one reason why I posted the card trick to the blog. I’ll use a future post to explain this though.

 

Getting high

Speed climbing is what it says on the tin: climbing at speed. The objective is to climb a standard wall with a height of 15 metres as quickly as possible. Speed climbing is actually one of three disciplines – the others being ‘bouldering’ and ‘lead’ – that together comprise Sport Climbing. This combined category will be included as an Olympic sport for the first time in Tokyo, 2020.

The history of Sport Climbing is relatively brief. It seems to have developed from Sportroccia, which was the first international competition for climbers held in different locations in Italy from 1985 to 1989. This led to the first World Championships in Frankfurt in 1991, since which there has been a Sport Climbing World Championship event held every two years.

The inclusion of speed climbing as one of the disciplines in Sport Climbing has always been controversial. Many climbers regard the techniques required to climb at speed to be at odds with the skills that are needed for genuine outdoor climbs. Like in the picture at the header of this post.

The controversy is such that even though Sport Climbing will be in the Olympics for the first time in 2020, a new format is being proposed for the 2024 Olympics in which Speed Climbing is separated as a discipline from the other two categories.

Anyway, leaving the controversy aside, climbing 15 metres doesn’t sound too daunting until you look at a picture of what it entails…

For experienced climbers a wall like this isn’t particularly challenging, but speed climbers have the additional task of competing against both an opponent – who is simultaneously completing an identical course – and the clock. The current world records are 5.48 seconds for men and 6.995 seconds for women. Just to put that in perspective: the men’s record corresponds to a speed of almost 10 km per hour. Vertically. With not much to hold onto.

The women’s world record was actually set very recently by the Indonesian female climber Aries Susanti Rahayu – nicknamed Spiderwoman. You can see her record breaking climb here.

The men’s world record was set by the Iranian climber Reza Alipourshenazandifar in 2017. (Performance here.)

Like my recent discussion about marathon times, what’s interesting about speed climbing from a statistical point of view is trying to assess what the fastest possible climb time might be.

The following graphs show how the records have fallen over time for both men and women.

Though irregular, you could convince yourself that the pattern for women’s records is approximately following a straight line. On the other hand, notwithstanding the lack of data, the pattern for men seems more like a curve that could be levelling off. These two observations aren’t mutually consistent though, as they would suggest that not too far into the future the women’s record will be faster than the men’s, which is implausible – though not impossible – for biological reasons.

This illustrates a number of difficulties with statistical modelling in this type of context:

  1. We have very few data to work with;
  2. To predict forwards we need to assume some basic pattern for the data, but the choice of pattern – say linear or curved – is likely itself to affect how results extrapolate into the future (see the sketch after this list);
  3. Separate extrapolations for women and men might lead to incompatible results;
  4. As also discussed in the context of predicting ultimate marathon times, an extrapolation based just on numbers ignores the underlying physics and biology which ultimately determines what the limits of human capacity are.
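To make point 2 in the list above concrete, here’s a sketch of the kind of comparison involved, using numpy and scipy. The numbers are invented placeholders rather than the real record progressions, and the ‘levelling-off’ model is just one arbitrary choice of curve:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical record progression (year, time in seconds) - illustrative only.
years = np.array([2011.0, 2012, 2014, 2016, 2017, 2019])
times = np.array([6.60, 6.45, 6.20, 6.00, 5.90, 5.85])

# Model 1: a straight line, which keeps falling indefinitely.
slope, intercept = np.polyfit(years, times, 1)

# Model 2: exponential decay towards a floor c, which levels off at c.
def levelling_off(year, a, b, c):
    return c + a * np.exp(-b * (year - years[0]))

(a, b, c), _ = curve_fit(levelling_off, years, times, p0=(0.8, 0.2, 5.7))

for future in (2025.0, 2035.0):
    linear = slope * future + intercept
    curved = levelling_off(future, a, b, c)
    print(f"{future:.0f}: linear {linear:.2f}s, levelling-off {curved:.2f}s")
```

The two models agree reasonably well over the range of the data, but give increasingly different extrapolations, which is exactly the problem when trying to predict an ‘ultimate’ time.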

Maybe have a look at the data yourselves and write to me if you have ideas about what the ultimate times for both men and women might be. I’ll post any suggestions and perhaps even add ideas of my own in a future post.

The wonderful and frightening world of Probability

A while back I posted a puzzle based on a suggestion by Fabian.Thut@smartodds.co.uk in which Smartodds employees are – hypothetically – given the opportunity to increase their bonus by a factor of 10. See the original post for the rules of the game.

As I wrote at the time, the solution is not at all obvious – I don’t think I could have found it myself – but it includes some important ideas from Statistics. It goes as follows…

Each individual employee has a probability equal to 1/2 of finding their number. This is because they can open 50 of the 100 boxes, leaving another 50 unopened. It’s then obvious by symmetry that they must have a 50-50 chance of finding their number, since all numbers are randomly distributed among the boxes.

But recall, the players’ bonus is multiplied by 10 only if all players find their own number.

To begin with, let’s assume that the employees play the game without any strategy at all. In that case they are playing the game independently, and the standard rules of probability mean that we must multiply the individual probabilities to get the overall win probability. So, the probability that the first 2 players both win is 1/2 * 1/2. The probability that the first 3 players all win is 1/2 * 1/2 * 1/2. And the probability that all 100 players win is 1/2 multiplied by itself 100 times, which is roughly

0.000000000000000000000000000000789.

In other words, practically zero. So, the chance of the bonuses being multiplied by 10 is vanishingly small; it’s therefore almost certain that everyone will lose their bonus if they choose to play the game this way. As Ian.Rutherford@Smartbapps.co.uk wrote to me, it would be ‘one of the worst bets of all time’. No arguments there.

But the amazing thing is that with a planned strategy the probability that all players find their number, and therefore win the bet, can be increased to around 31%. The strategy which yields this probability goes like this…

Recall that the boxes themselves are numbered. Each player starts by opening the box corresponding to their own number. So, Player 1 opens box 1. If it contains their number they’ve won and they stop. Otherwise, whichever number they find in that box becomes the number of the box they look in next. And they carry on this way until either they find their number and stop, or they have opened 50 boxes without finding it. In the first case, that individual player has won, and it is the next player’s turn to enter the room (and play according to the same strategy); in the second case, they – and, by the rules of the game, the whole set of players – have lost.

So, Player 1 first opens box 1. If that box contains, say, the number 22, they next open box 22. If that contains the number 87, they next open box number 87. And so on, until they find their number or they reach the limit of 50 boxes. Similarly, Player 2 starts with box 2, which might contain the number 90; they next open box number 90; if that contains number 49 they then open box number 49, etc etc. Remarkably with this strategy the players will all find their own numbers – within the limit of 50 boxes each – with a probability of around 31%.
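It’s easy to check this claim by simulation before looking at the proof. Here’s a minimal Python sketch of the strategy (my own code, with players and numbers labelled 0 to 99 for convenience); the ~31% figure should reappear, give or take simulation error:

```python
import random

def all_players_win(n=100):
    """One play of the game: boxes[i] is the number hidden in box i. Each player
    starts at the box with their own label and keeps opening the box labelled with
    the number they have just found, giving up after n // 2 boxes."""
    boxes = list(range(n))
    random.shuffle(boxes)
    for player in range(n):
        box, found = player, False
        for _ in range(n // 2):
            if boxes[box] == player:
                found = True
                break
            box = boxes[box]
        if not found:
            return False
    return True

trials = 10_000
print(sum(all_players_win() for _ in range(trials)) / trials)   # roughly 0.31
```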

I find this amazing for 2 reasons:

  1. That fairly basic probability techniques can be used to show that the strategy leads to a win probability of around 31%;
  2. That the strategy results in such a massive increase in the win probability from virtually 0 to almost 1/3.

Unfortunately, though the calculations are all elementary, the argument is a touch too elaborate for me to reasonably include here. The Wikipedia entry for the puzzle – albeit with the cosmetic change of Smartodds employees being replaced by prisoners – does give the solution though, and it’s extremely elegant. If you feel like stretching your maths and probability skills just a little, it’s well worth a read.

In any case it’s instructive to look at a simpler case included in the Wikipedia article. Things are simplified there to just 8 employees/prisoners who have to find their number by opening at most 4 boxes (i.e. 50% of the 8 boxes, as opposed to 50 out of 100 in the original problem).

In this case suppose the numbers have been randomised into the boxes according to the following table…

Box number:  1 2 3 4 5 6 7 8
Card number: 7 4 6 8 1 3 5 2

Now suppose the players play according to the strategy described above except that they keep playing until they find their number, without stopping at the 4th box, even though they will have lost if they open more than 4 boxes. With the numbers randomised as above you can easily check that the sequence of boxes each player opens is as follows:

Player 1: (1, 7, 5)

Player 2: (2, 4, 8)

Player 3: (3, 6)

Player 4: (4, 8, 2)

Player 5: (5, 1, 7)

Player 6: (6, 3)

Player 7: (7, 5, 1)

Player 8: (8, 2, 4)

With these numbers, since each player opens at most 3 boxes, everyone wins and the employees/prisoners get their increased bonus.

However, had the cards been randomised slightly differently among the boxes, as per the following table…

Box number:  1 2 3 4 5 6 7 8
Card number: 7 4 6 8 2 3 5 1

… then Player 1 (for example) would have followed the sequence (1, 7, 5, 2, 4, 8) and would therefore have lost, having opened more than 4 boxes.

Now observe:

  1. Several players follow the same sequence, albeit in a different order. For example, with the first randomisation, Players 1, 5 and 7 are each following the sequence (1, 7, 5) in some order;
  2. The complete set of different sequences – ignoring changes of order – are in the first case (1, 7, 5), (2, 4, 8) and (3, 6). They are bound not to overlap and are also bound to contain the complete set of numbers 1-8 between them.
  3. The fact that none of the sequences is longer than 4 with this randomisation means that the game has been won by all players. With the second randomisation, the fact that at least one of the sequences – (1, 7, 5, 2, 4, 8) – is longer than 4 means that the game has been lost.
  4. It follows that the probability the game is won is equal to the probability that the longest sequence has length at most 4.
  5. This argument applies more generally, so that with 100 players the game is won if the longest sequence of boxes opened has length at most 50.

Remarkably, it turns out to be not too difficult to calculate the probability that the longest sequence has length at most 50% of the number of players. And with 100 players it turns out to be approximately 31%. And as if that’s not remarkable enough, the same proof shows that with an unlimited number of players the above strategy leads to a win probability of around 30.7%. In other words, in replacing 100 players with 1 million players, the win probability only drops from around 31% to around 30.7%.
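In fact the argument gives a closed form: the game is lost exactly when the random arrangement contains a sequence (cycle) of boxes longer than 50, and the probability of a cycle of length k > 50 turns out to be 1/k, so the win probability is 1 − (1/51 + 1/52 + … + 1/100). A two-line numerical check:

```python
import math
from fractions import Fraction

win_100 = 1 - sum(Fraction(1, k) for k in range(51, 101))
print(float(win_100))        # about 0.3118 - the '31%' quoted above
print(1 - math.log(2))       # limit for a very large number of players: about 0.3069
```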

All quite incredible. But even without studying the detailed proof you can maybe get an idea from the 8-player example of why the strategy works. By playing this way, even though each individual player wins with probability 1/2, they no longer win or lose independently of one another. If Player 1 wins, every other player in their sequence of searches – Players 5 and 7 in the first example above – also wins. So, the suggested strategy induces dependence in the win/lose events of the individual players, and this leads to a change in win probability from something close to 0 to something close to 1/3.

Something similar actually came up earlier in the blog in the context of accumulator bets. I mentioned that betting on Mark Kermode’s Oscar predictions might be a good accumulator bet since the success of his predictions might not be independent events, and this had the potential to generate value against bookmakers who assume independence when setting prices for accumulators.

Finally, to answer the question: should the employees accept the challenge? If their original bonus is, say, £1000, then that becomes £10000 if they win, but £0 if they lose. So, with probability 31% they gain £9000, but with probability 69% they lose £1000. It follows that their expected gain if they play is

31\% \times \pounds 9000 - 69\% \times \pounds 1000 = \pounds 2100,

which is a comfortably positive expected profit for an outlay of £1000. So, they should definitely play, as long as they follow the strategy described above.


Two quick footnotes:

  1. It’s more difficult to prove, but it turns out that the strategy described above is optimal – there’s no other strategy that would lead to a bigger win probability than 31%;
  2. All of the above assumes that everyone follows the described strategy correctly. It would just take a couple of players not following the rules for all of the value of the bet to be lost. So, if the employees thought there might be a couple of, let’s say, ‘slow learners’ in the company, it might be safer for them not to play and just take the £1000 and run.

Relatively speaking

Last week, when discussing Kipchoge’s recent sub 2-hour marathon run, I showed the following figure which compares histograms of marathon race times in a large database of male and female runners.

I mentioned then that I’d update the post to discuss the other unusual shape of the histograms. The point I intended to make concerns the irregularity of the graphs. In particular, there are many spikes, especially before the 3, 3.5 and 4 hour marks. Moreover, there is a very large drop in the histograms – most noticeably for men – after the 4 hour mark.

This type of behaviour is unusual in random processes: frequency diagrams of this type, especially those based on human characteristics, are generally much smoother. Naturally, with any sample data, some degree of irregularity in frequency data is inevitable, but:

  1. These graphs are based on a very large sample of more than 3 million runners, so random variations are likely to be very small;
  2. Though irregular in shape, the timings of the irregularities are themselves regular.

So, what’s going on?

The irregularities are actually a consequence of the psychology of marathon runners attempting to achieve personal targets. For example, many ‘average’ runners will set a race time target of 4 hours. Then, either through a programmed training regime or sheer force of will on the day of the race, they will push themselves to achieve this target. Most likely not beating it by much, but by enough to be on the left side of the 4-hour mark.

The net effect of many runners behaving similarly is to cause a surge of race times just before the 4-hour mark and a dip thereafter. There’s a similar effect at 3 and 3.5 hours – albeit of a slightly smaller magnitude – and smaller effects still at what seem to be around 10 minute intervals. So, the spikes in the histograms are due to runners consciously adapting their running pace to meet self-set objectives which are typically at regular times like 3, 3.5, 4 hours and so on.

Thanks to those of you that wrote to me to explain this effect.

Actually though, since writing the original post, something else occurred to me about this figure, which is why I decided to write this separate post instead of just updating the original one. Take a look at the right hand side of the plot – perhaps from a finish time of around 5 hours onwards. The values of the histograms are pretty much the same for men and women in this region. This contrasts sharply with the left side of the diagram where there are many more men than women finishing the race in, say, less than 3 hours. So, does this mean that although at faster race times there are many more men than women, at slow race times there are just as many women as men?

Well, yes and no. In absolute terms, yes: there are pretty much the same number of men as women completing the race with a time of around 6 hours. But… this ignores the fact that there are many more men than women overall – one of the other graphics on the page from which I copied the histograms states that the male:female split in the database is 61.8% to 31.2%. So, although the absolute number of men with race times of around 6 hours is similar to that of women, as a proportion of all male runners it is considerably lower than the corresponding proportion of female runners.

Arguably, comparing histograms of counts gives a misleading representation of the data. It makes it look as though men and women are equally likely to have a race time of around 6 hours. The counts are indeed similar, but only because many more men than women run the marathon: the proportion of men completing the race with a time of around 6 hours is considerably smaller than the proportion of women.

The same principle holds at all race times, but is less of an issue when interpreting the graph. For example, the difference between the proportions of men and women with race times of around 4 hours is smaller than the difference between the raw frequencies in the histograms above, but it is still a big difference. It’s really where the absolute frequencies are similar that the picture above can be misleading.

In summary: there is a choice when drawing histograms of using absolute or relative frequencies. (Or counts and percentages). When looking at a single histogram it makes little difference – the shape of the histogram will be identical in both cases. When comparing two or more sets of results, histograms based on relative frequencies are generally easier to interpret. But in any case, when interpreting any statistical diagram, always look at the fine detail provided in the descriptions on the axes so as to be sure what you’re looking at.
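As a small illustration of that choice in code, here’s a matplotlib sketch with two invented samples of very different sizes (nothing here comes from the real marathon data); setting density=True rescales each histogram so that it shows relative rather than absolute frequencies:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Invented finish times (hours), purely to illustrate counts versus proportions.
men = rng.normal(4.2, 0.7, size=20_000)
women = rng.normal(4.6, 0.7, size=10_000)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
bins = np.arange(2, 7.25, 0.25)
for ax, density, title in zip(axes, (False, True), ("absolute counts", "relative frequencies")):
    ax.hist(men, bins=bins, alpha=0.5, density=density, label="men")
    ax.hist(women, bins=bins, alpha=0.5, density=density, label="women")
    ax.set_xlabel("finish time (hours)")
    ax.set_title(title)
    ax.legend()
plt.tight_layout()
plt.show()
```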


Footnote:

Some general discussion and advice on drawing histograms can be found here.

It’s official: Brits get drunk more often than anywhere else in the WORLD

A while back the Global Drug Survey (GDS) produced its annual report. Here are some of the newspaper headlines following its publication:

It’s official: Brits get drunk more often than anywhere else in the WORLD. (The Mirror)

Britons get drunk more often than 35 other nations, survey finds. (The Guardian)

Brits are world’s biggest boozers and we get hammered once a week, study says. (The Sun)

And reading some of these articles in detail we find:

  • Of the 31 countries included in the study, Britons get drunk most regularly (51.1 times per year, on average).
  • Britain has the highest rate of cocaine usage (74% of participants in the survey say they have used it at some point).
  • 64% of English participants in the survey claim to have used cocaine in the last year.

Really? On average Brits are getting drunk once a week? And 64% of the population have used cocaine in the last year? 64%!

Prof Adam Winstock, founder of the survey, summarises things thus:

In the UK we don’t tend to do moderation, we end up getting drunk as the point of the evening.

At which point it’s important to take a step back and understand how the GDS works. If you want a snapshot of a population as a whole, you have to sample in such a way that every person in the population is equally likely to be sampled. Or at least ensure by some other mechanism that the sample is truly representative of the population. But the Global Drug Survey is different: it’s an online survey targeted at people whose demographics coincide with people who are more likely to be regular drinkers and/or drug users.

Consequently, it’s safe to conclude that the Brits who chose to take this survey are likely to get drunk more often than people from other countries who also completed the survey. And that 64% of British participants in the survey have used cocaine in the last year. But since this sample is neither random nor designed to be representative, it really tells us nothing about the population as a whole. And even comparisons of the respondents across countries should be treated cautiously: perhaps the differences are not due to variations in drink/drug usage but instead due to variations in the composition of the survey respondents across countries.

Here’s what the GDS say themselves about this…

Don’t look to GDS for national estimates. GDS is designed to answer comparison questions that are not dependent on probability samples. The GDS database is huge, but its non-probability sample means analyses are best suited to highlight differences among user populations. GDS recruits younger, more experienced drug using populations. We spot emerging drugs trends before they enter into the general population.

In other words, by design the survey samples people who are more likely to drink regularly or to have used drugs, and the GDS itself therefore warns against headline use of the numbers. It’s not really that 64% of the UK population has used cocaine in the last year; it’s that 64% of a self-selected group – people in a demographic that is more likely to have used cocaine, and who responded to an online survey – have.

To emphasise this point the GDS information page identifies the following summary characteristics of respondents to the survey:

  • a 2:1 ratio of male:female;
  • 60% of participants with at least a university degree;
  • an average age of 25 years;
  • more than 50% of participants reporting to have regular involvement in nightlife and clubbing.

Clearly these characteristics are quite different from those of the population as a whole and, as intended by the study, orientated towards people that are more likely to have a drinking or drug habit. At which point the newspaper headlines become much less surprising.

Now, there’s nothing wrong with carrying out surveys in this way. If you’re interested in attitudes and behaviours among drinkers and drug users, there’s not much point in wasting time on people who indulge in neither. But… what you get out of this is a snapshot of people whose characteristics match those of the survey respondents, not of the population as a whole. And sure, this is all spelt out very clearly in the GDS report itself, but that doesn’t stop the tabloids (and even the Guardian) from headlines that make it seem like Britain is drink/drug capital of the world.
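As a toy illustration of why a self-selected sample can’t simply be read as a national figure, here’s a small simulation with entirely invented numbers (nothing here comes from the GDS itself):

```python
import random

random.seed(0)
# Invented population: 15% drink heavily (~50 drunk nights/year), 85% rarely (~5).
population = [50] * 15_000 + [5] * 85_000

# A random sample: every member equally likely to respond.
random_sample = random.sample(population, 2_000)

# A self-selected online survey: heavy drinkers far more likely to respond.
survey = [x for x in population if random.random() < (0.10 if x == 50 else 0.005)]

mean = lambda xs: sum(xs) / len(xs)
print(mean(population), mean(random_sample), mean(survey))
# population ~11.75, random sample close to it, survey sample far higher
```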

In summary:

  • You can extrapolate the results of a sample to a wider population only if the sample is genuinely representative of the whole population;
  • The best way of ensuring this is to do random sampling, where each member of the population is equally likely to be included in the sample;
  • The media aren’t going to let niceties of this type get in the way of a good headline, so you need to be extremely wary when reading media reports based on statistical surveys.

What seems to be a more scientific approach to studies of the variation in alcohol consumption across countries is available here. On this basis, at least in 2014, average alcohol consumption in the UK was considerably lower than that in, say, France or Germany. That’s not to say Brits got drunk less: it might still be that a proportion of people drink excessively – to the point of getting drunk – while the overall average stays relatively low.

However, if you look down the page there’s this graph…

…which can be interpreted as giving the proportion of each country’s population – admittedly in 2010 – who had at least one heavy night out in a period of 30 days. France and the UK are pretty much level on this basis, and not particularly extreme. Lithuania seems to be the most excessive European country in these terms, while king of the world is apparently Madagascar, where 64.8% of the population reported a heavy drinking session over the 30 day period. So…

It’s official: Madagascans get drunk more often than anywhere else in the WORLD