Stick a monkey on a typewriter, let him hit keys all day, and what will you get? Gibberish, probably. But what if you’re prepared to wait longer than a day? Much longer than a day. Infinitely long, say. In that case, the monkey will produce the complete works of Shakespeare. And indeed any and every other work of literature that’s ever been written.

This is from Wikipedia:

The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type any given text, such as the complete works of William Shakespeare.

Infinity is a tricky but important concept in mathematics generally. We saw the appearance of infinity in a recent post, where we looked at the infinite sequence of numbers

1, 1/2, 1/4, 1/8,….

and asked what their sum would be. And it turned out to be 2. In practice, you can never really add infinitely many numbers, but you can add more and more terms in the sequence, and the more you add the closer you will get to 2. Moreover, you can get as close to 2 as you like by adding sufficiently many terms in the sequence. It’s in this sense that the sum of the infinite sequence is 2.

In Statistics the concept of infinity and infinite sums is equally important, as we’ll discuss in a future post. But meantime… the infinite monkey theorem. What this basically says is that if something can happen in an experiment, and you repeat that experiment often enough, then eventually it will happen.

Sort of. There’s still a possibility that it won’t – the monkey could, by chance, just keep hitting the letter ‘a’ totally at random forever, for example – but that possibility has zero probability. That’s the ‘almost surely’ bit in the Wikipedia definition. On the other hand, with probability 1 – which is to say complete certainty – the monkey will eventually produce the complete works of Shakespeare.

Let’s look at the calculations, which are very similar to those in another recent post.

There are roughly 50 keys on a keyboard, so assuming the monkey is just hitting keys at random, the probability that the first key stroke matches the first letter of Shakespeare’s works is 1/50. Similarly, the probability the second letter matches is also 1/50. So to get the first two matching it’s

$1/50 \times 1/50$

Our monkey keeps hitting keys and at each new key stroke, the probability that the match-up continues is multiplied by 1/50. This probability gets small very, very quickly. But it never gets to zero.

Now, if the monkey has to hit N keys to have produced a text as long as the works of Shakespeare, by this argument he’ll get a perfect match with probability

$p=(1/50)^N$

This will be a phenomenally small number. Virtually zero. But, crucially, not zero. Because if our tireless monkey repeats that exercise a large number of times, let’s say M times, then the probability he’ll produce Shakespeare’s works at least once is

$Q = 1-(1-p)^M$

And since p is bigger than zero – albeit only slightly bigger than zero – then Q gets bigger with M. And just as the sum of the numbers 1, 1/2, 1/4, … gets closer and closer to 2 as the number of terms increases, so Q can be made as close to 1 as we like by choosing M large enough.

Loosely speaking, when M is infinity, the probability is 1. And even more loosely: given an infinite amount of time our monkey is bound to produce the complete works of Shakespeare.
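To see how Q creeps towards 1, here is a minimal Python sketch. It is only an illustration under assumptions that are not in the argument above: the target text is a hypothetical 5 characters long rather than the works of Shakespeare, and the function name `prob_at_least_once` is made up for this example.

```python
import math

def prob_at_least_once(p: float, m: int) -> float:
    """Q = 1 - (1 - p)^M: the chance of at least one success in M attempts."""
    # log1p and expm1 keep the calculation accurate when p is tiny,
    # where a naive 1 - (1 - p)**m would lose precision.
    return -math.expm1(m * math.log1p(-p))

KEYS = 50  # roughly 50 keys on the keyboard, as in the text
N = 5      # assumed length of a toy target text (hypothetical)
p = (1 / KEYS) ** N  # probability that one attempt matches perfectly

for m in (10**6, 10**8, 10**9, 10**10):
    print(f"M = {m:>14,}   Q = {prob_at_least_once(p, m):.6f}")
```

With a realistic N the same thing happens, just for values of M that are unimaginably larger.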

Obviously, both the monkey and the works of Shakespeare are just metaphors, and the idea has been expressed in many different forms in popular culture.  Here’s Eminem’s take on it, for example:

# At The Intersection

You’ll remember Venn diagrams from school. They’re essentially a mathematical tool for laying out the information in partially overlapping sets. And in statistics they are often used in the same way for showing the possible outcomes in events which might overlap.

For example, here’s a Venn diagram showing the relationship between whales and fish:

Whales and fish have some properties that are unique, but they also have some features in common. These are all shown in the appropriate parts of the diagram, with the common elements falling in the part of the sets that overlap – the so-called intersection.

With this in mind, I recently came across the following Venn poem titled ‘At the Intersection’ written by Brian Bilston:

You can probably work it out. There are three poems in total:  separate ones for ‘him’ and ‘her’ and their intersection. Life seen from two different perspectives, the result of which is contained in the intersection.

Genius.

# One-in-a-million

Suppose you can play on either of 2 slot machines:

1. Slot machine A pays out with probability one in a million.
2. Slot machine B pays out with probability one in 10.

Are you more likely to get a payout with one million attempts with slot machine A or with 10 attempts on slot machine B?

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

So, there’s a bigger probability (0.65) that you’ll get a payout from 10 spins of slot machine B than from a million spins of slot machine A (probability 0.63).

Hopefully, the calculations above are self-explanatory. But just in case, here’s the detail. Suppose you have N attempts to win with a slot machine that pays out with probability 1/N.

1. First we’ll calculate the probability of zero payouts in the N spins.

2. This means we get a zero payout on every spin.

3. The probability of a zero payout on one spin is one minus the probability of a win: 1 – 1/N.

4. So the probability of no payout on all the spins is

$(1-1/N)^N$

5. And the probability of at least one payout is

$1- (1-1/N)^N$

As explained in the tweet, with N=10 this gives 0.65 and with N=1,000,000 it gives 0.63. The tweet’s author explains in a follow-up tweet that he was expecting the same answer both ways.

But as someone in the discussion pointed out, that logic can’t be right. Suppose you had one attempt with slot machine C which paid out with probability 1. In other words, N=1 in my example above. Then, of course, you’d be bound to get a payout, so the probability of at least one payout is 1. So, although it’s initially perhaps surprising that you’re more likely to get a payout with 10 shots at slot machine B than with a million shots at slot machine A, the dependence on N becomes obvious when you look at the extreme case of slot machine C.

Footnote: What does stay the same in each case however is the average number of times you will win. With N shots at a slot machine with win probability 1/N, you will win on average once for any choice of N. Sometimes you’ll win more often, and sometimes you may not win at all (except when N=1). But the average number of wins if you play many times will always be 1.
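For anyone who wants to check the numbers, the calculation can be sketched in a few lines of Python. The function name is just for illustration, and the limiting value 1 – 1/e is a standard mathematical fact rather than something taken from the tweet.

```python
import math

def prob_at_least_one_payout(n: int) -> float:
    """Chance of at least one win in n spins, each winning with probability 1/n."""
    return 1 - (1 - 1 / n) ** n

for n in (1, 10, 1_000_000):
    p = prob_at_least_one_payout(n)
    expected_wins = n * (1 / n)  # average number of wins is 1 for every n
    print(f"N = {n:>9,}: P(at least one payout) = {p:.4f}, average wins = {expected_wins:.0f}")

# As N grows, the probability settles down towards 1 - 1/e.
print(f"Large-N limit: {1 - 1 / math.e:.4f}")
```

So the probability of at least one payout falls as N grows, while the average number of wins stays fixed at 1.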

# Juvenile dinosaurs

This blog is mostly about Statistics as a science rather than statistics as numbers. But just occasionally the statistics themselves are so shocking, they’re worthy of a mention.

With this in mind I was struck by two statistics of a similar theme in the following tweet from Ben Goldacre (author of the Bad Science website and book):

Moreover, in the discussion following Ben’s tweet, someone linked to the following cartoon figure:

This shows that even if you change the way of measuring distance from time to either phylogenetic distance or physical similarity, the following holds: the distance between a sparrow and T-Rex is smaller than that between T-Rex and Stegosaurus.

Footnote 1: this is more than a joke. Recent research makes the case that there is a strong evolutionary link between birds and dinosaurs. As one of the authors writes:

We now understand the relationship between birds and dinosaurs that much better, and we can say that, when we look at birds, we are actually looking at juvenile dinosaurs.

Footnote 2. Continuing the series (also taken from the discussion of Ben’s tweet)… Cleopatra is closer in time to the construction of the space shuttle than the pyramids.

Footnote 3. Ben Goldacre’s book, Bad Science, is a great read. It includes many examples of the way science – and Statistics – can be misused.

# Problem solved

A while back I set a puzzle asking you to try to remove three coins from a red square region as shown in the following diagram.

The only rule of the game is that when a coin is removed it is replaced with 2 coins – 1 immediately to the right of and one immediately below the coin that is removed. If there is no space for adding these replacement coins, the coin cannot be removed.

The puzzle actually appeared in a recent edition of Alex Bellos’ Guardian mathematics puzzles, though it was created by the Argentinian mathematician Carlos Sarraute. This is his solution which is astonishing for its breathtaking ingenuity.

The solution starts by giving a value to every square in the grid as follows:

Remember, the grid goes on forever both to the right and downwards. The top left hand box has value 1. Going right from there, every subsequent square has value equal to 1/2 of the previous one. So: 1, 1/2, 1/4, 1/8 and so on. The first column is identical to the first row. To complete the second row, we start with the first value, 1/2, and again just keep multiplying by 1/2. The second column is the same as the second row. And we fill the entire grid this same way. Technically, every row and column is a geometric sequence: each number is the previous one multiplied by a common ratio, which in this case is 1/2.

Let’s define the value of a coin to be the value of the square it’s on. Then the total value of the coins at the start of the game is

$1 + \frac{1}{2} + \frac{1}{2}= 2$

Now…

• When we remove a coin we replace it with two coins, one immediately to the right and one immediately below. But if you look at the value of any square on the grid, it is equal to the sum of the values of the squares immediately below and to the right. So when we remove a coin we replace it with two coins whose total value is the same. It follows that the total value of the coins stays unchanged, at 2, however many moves we make.
• This is the only tricky mathematical part. Look at the first row of numbers. It consists of 1, 1/2, 1/4, 1/8… and goes on forever. But even though this is an infinite sequence it has a finite sum of 2. Obviously, we can never really add infinitely many numbers in practice, but by adding more and more terms in the series we will get closer and closer to the value of 2. Try it on a calculator. In summary:

$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} +\ldots = 2.$

• Working down the rows, the second row is the same as the first with the first term removed. So its sum must be 1. The third is the same as the second with the first term of 1/2 removed, so its sum is 1/2. By the same reasoning, the sum of the fourth row will be 1/4, the fifth row 1/8 and so on.
• So, the row sums are respectively 2, 1, 1/2, 1/4, …. This is the same as the values of the first row with the additional first term of 2. It follows that the sum of the row sums, and therefore the sum of all numbers in the grid, is 2+2=4. Again, we can’t add all the numbers in practice, but we will get closer and closer to the value of 4 by adding more and more squares.
• The total value of the squares inside the red square is 1 + 1/2 + 1/2 + 1/4 = 9/4. The total value outside this region must therefore be 4 - 9/4 = 7/4.
• Putting all this together, the initial value of the coins was 2. After any number of moves, the total value of all coins will always remain 2. But the total value of all squares outside the red square is only 7/4. It must therefore be impossible to remove the three coins from the red square because to do so would require the coins outside of this area to have a value of 2, which is greater than the total value of 7/4 available outside the red square.
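The bookkeeping in this argument can be checked with exact fractions. The sketch below is my own illustration, not part of the original solution: it assumes rows and columns are counted from 0, so the square in row r and column c has value 1/2^(r+c), and the function names are invented for the example.

```python
from fractions import Fraction

def square_value(row: int, col: int) -> Fraction:
    """Value of a grid square; rows and columns are counted from 0."""
    return Fraction(1, 2 ** (row + col))

def corner_total(size: int) -> Fraction:
    """Total value of the size-by-size block in the top left corner."""
    return sum(square_value(r, c) for r in range(size) for c in range(size))

# A move swaps one coin for the coins to its right and below: same total value.
move_preserves_value = all(
    square_value(r, c) == square_value(r, c + 1) + square_value(r + 1, c)
    for r in range(10) for c in range(10)
)
print("Moves preserve total value:", move_preserves_value)

# Starting coins sit on the squares worth 1, 1/2 and 1/2.
start_value = square_value(0, 0) + square_value(0, 1) + square_value(1, 0)
red_total = corner_total(2)          # the 2-by-2 red square
outside_total = 4 - red_total        # 4 is the limit of the whole-grid sum
print("Coins:", start_value, " Red square:", red_total, " Outside:", outside_total)

# Bigger and bigger corners of the grid get closer and closer to 4.
for size in (5, 10, 20):
    print(size, float(corner_total(size)))
```

Since the coins are always worth 2 and the squares outside the red square are worth only 7/4 in total, the coins can never all sit outside it.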

I find this argument quite brilliant. My instincts were to try to solve the puzzle using arguments from geometry. I failed. It would never have occurred to me to try to develop a solution based on the properties of numbers.

As I wrote in the original post, this puzzle doesn’t really have any direct relevance to Statistics except in so much as it shows the power and beauty of mathematical proof, which is an essential part of statistical theory. Having said that, the idea of infinite limits is important in Statistics, and I’ll discuss this in a further post. Let me just mention though that summing infinite series as in the solution above is a delicate issue for at least two reasons:

1. Although the sum 1 + 1/2 + 1/4 + 1/8 + …. has a finite sum of 2, this series 1 + 1/2 + 1/3 + 1/4 + 1/5 + …. has no finite sum. The sum grows very slowly, but as I take more and more numbers in the series, the sum grows without any limit. That’s to say, if you give me any number – say 1 million – I can always find enough terms in the series for the sum to be greater than that number.
2. To get the total value of the grid, we first added the rows and then added these row sums across the columns. We could alternatively have first added the columns, and then added these column sums across the rows and we’d have got the same answer. For this example both these alternatives are valid. But in general this interchange of row and column sums to get the total sum is not valid. Consider, for example, this infinite grid:

The first row sums to 2, after which all other rows sum to zero. So, the sum of the row sums is 2. But looking at the columns, every column sums to zero. So, if we sum the columns and then sum these sums we get 0. This couldn’t possibly happen with finite grids, but infinite grids require a lot more care.
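To get a feel for the first of these two points, here is a small Python sketch (the function names are just for illustration) that counts how many terms of 1 + 1/2 + 1/3 + … are needed before the sum passes a given target, and compares this with the sum 1 + 1/2 + 1/4 + …, which never gets past 2.

```python
def terms_needed(target: float) -> int:
    """Number of terms of 1 + 1/2 + 1/3 + ... needed for the sum to exceed target."""
    total, n = 0.0, 0
    while total <= target:
        n += 1
        total += 1 / n
    return n

def halving_sum(terms: int) -> float:
    """Sum of the first `terms` numbers in 1 + 1/2 + 1/4 + ..."""
    return sum(1 / 2**k for k in range(terms))

for target in (2, 5, 10):
    print(f"To exceed {target:>2}: {terms_needed(target):,} terms of 1 + 1/2 + 1/3 + ...")

print("50 terms of 1 + 1/2 + 1/4 + ... give", halving_sum(50))
```

The number of terms needed grows very rapidly with the target, which is what "grows very slowly, but without limit" means in practice.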

In a follow-up post we’ll consider limits of sums in the context of Statistics.

Finally, I’m grateful to Fabian.Thut@smartodds.co.uk for some follow-up discussion on the original post. In particular, we ended up discussing the following variation on the original puzzle. The rules are exactly the same as before, but the starting configuration of the coins is now as per the following diagram:

In this case, can the puzzle be solved? Does the argument presented for the original problem help in any way?

If you have any thoughts about this, please do write to me. In any case, I’ll write another post with the solution to this version shortly.

# Coincidentally

Here we go again…

Happy birthday to me. And Harry.Hill@smartodds.co.uk and Rickie.Reynolds@smartodds.co.uk. And willfletcher1111@gmail.com who also used to be in the quant team. What a remarkable coincidence that 3 of us currently in the quant team – together with another quant who has since left – each have our birthday on the 11th November. But as I discussed in a post around this time last year as well as at a previous offsite, there are so many possible combinations of three or four people in the company that could have a shared birthday, that it’s not that surprising that one combination does. It just happened to be me, Harry, Rickie and Will on 11/11.
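A rough simulation gives a feel for this. The sketch below is purely illustrative: the company size of 100 is a made-up figure, birthdays are assumed to be equally likely on each of 365 days, and the function name is invented for the example.

```python
import random
from collections import Counter

def prob_shared_birthday(people: int, group: int,
                         trials: int = 2_000, seed: int = 0) -> float:
    """Estimate the chance that at least `group` of `people` share a birthday."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Count how many people land on each of the 365 days.
        counts = Counter(rng.randrange(365) for _ in range(people))
        if max(counts.values()) >= group:
            hits += 1
    return hits / trials

COMPANY_SIZE = 100  # hypothetical, not the real headcount
for group in (3, 4):
    estimate = prob_shared_birthday(COMPANY_SIZE, group)
    print(f"At least {group} sharing a birthday among {COMPANY_SIZE}: about {estimate:.2f}")
```

The point is not the exact figures, which depend on the assumed headcount, but that a shared birthday among some group of three is far from a rare event once there are enough people.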

And on the subject of coincidences…

You may have heard of an app called what3words. This is a location app for iOS or Android which divides the entire globe into 3 x 3 metre squares and assigns 3 words to each square. For example, currently sitting at my desk in the Smartodds office, the 3 allocated words are “insert”, “falls”, “opens”. The idea is that in an emergency I can identify and communicate my unique position to the relevant emergency services by means of just these 3 words. Of course, I could do the same thing with my GPS coordinates, but the point is that standard words are easier to read and communicate in a hurry. And there are already a number of instances in which lives have potentially been saved through use of the app.

And the coincidence? Well, I opened the app in my house the other day with this result:

… and, here in a recent Halloween pic with my Grandson, is my Granddaughter…

… the charmingly awesome Poppy!

Footnote: writing this now, I’m reminded that some time ago, as a follow-up to a post  which also discussed coincidences, Richard.Greene@smartodds.co.uk mailed me about an experience he’d recently had. He described it as follows:

I was listening to the radio one morning, and the presenter mentioned “French windows”. I wasn’t sure at the time what they were, and remember amusing myself as to what made them French exactly – perhaps they come with a beret on top etc…anyway, an hour later, I was watching Frasier over my cornflakes and there was a joke/reference to French windows!

Like the shared birthdays, if you tried to calculate the chance of the mentioning of French windows on both the radio and an episode of Frasier within a short time of one another, the probability would be incredibly remote. But again, we experience so many opportunities for coincidences every day, that although the vast majority don’t happen, one or two inevitably do. And they’re the ones we remember and sometimes ascribe to ‘fate’, ‘destiny’, ‘karma’ etc etc. When in fact it’s just the laws of probability playing out in our daily lives.

Anyway, Richard suggested that an idea for a blog post would be to collect and collate ‘coincidences’ of the kind I’ve described here – my experience with what3words; Richard’s with French windows. So, if you’ve recently had, or have in the near future, a coincidental experience of some sort, please send it to me and I’ll include it in a future post.

# Sleight of hand

A while ago I sent a post with a card trick which I explained had a statistical element to it, and asked you to try to work out how it was done. Thanks to those of you who wrote to me with variants on the correct answer.

The rules of the game were that my assistant, Matteo, chose a card at random hidden from me. It happened to be a 5 in the video. I then turned the cards over one at a time and Matteo had to play a counting game. Once he reached the 5th card, he noted its value, which was a 10. So he then counted another 10 cards in the sequence, noted the value of that card, and so on until we ran out of cards. Matteo had to remember the final card in his sequence before the cards ran out, which turned out to be the eight of diamonds. My task as the magician was to predict what Matteo’s final card was, which I did successfully.

Now, there are 2 reasons why this is a statistical card trick.

1. It doesn’t always work. It does so with a reasonably high probability, but depending on the configuration of the cards once they are shuffled, it won’t always. I’ll be honest: we had to remake the video several times, but that was always due to my incompetence in explaining the trick and not because it ever failed. Still, it won’t always work.
2. The second reason it’s a statistical trick is in its execution. The way it works is that I also play the same counting game as Matteo, but starting with the value of the first card I turn over, which happened to be a 10. So, we’re both playing the same counting game but from different starting points. Matteo’s starting point is 5, mine is 10. Although we start from different places, it turns out to be quite likely – though not certain – that the counting sequences we follow will overlap at some point. And once they do overlap, we are then following exactly the same sequence and so will arrive at the same final card.

Technically, the sequences of cards Matteo and I are both following are called Markov chains. These are sequences of random numbers such that in order to understand what the next card might be I only need to know the value of the current card, without knowing the past sequence that took me to the current state. In other words, when Matteo has to start counting 10 cards, it doesn’t matter how he got to that position, just that that’s where he currently is. And I also generate my own Markov chain. With an unlimited number of cards in a pack, the mathematical properties of Markov chains would guarantee that our sequences  meet at some point, after which we would be following exactly the same sequence, leading me to have the same final card as Matteo. With just 52 cards in a pack, there’s no guarantee, which is why the trick won’t always work.

The fact that the trick might not work is a little undesirable, but you can increase the chances by counting picture cards as 1 rather than 10. This forces the sequences to change card more often, which increases the chances of our two sequences overlapping.
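The effect of the picture-card rule can be explored with a simulation. This is only a sketch under simplifying assumptions of my own: Matteo’s hidden card is drawn independently of the shuffled 52-card deck, aces count as 1, my own sequence starts on the first card turned over, and the function names are invented for the example.

```python
import random

def make_deck(picture_value: int) -> list[int]:
    """Card values for a 52-card pack: ace to 10, plus three picture cards per suit."""
    return (list(range(1, 11)) + [picture_value] * 3) * 4

def final_position(deck: list[int], pos: int) -> int:
    """Play the counting game from index `pos`; return the index of the last card reached."""
    while pos + deck[pos] < len(deck):
        pos += deck[pos]
    return pos

def success_rate(picture_value: int, trials: int = 10_000, seed: int = 1) -> float:
    """Estimate how often both counting sequences end on the same card."""
    rng = random.Random(seed)
    deck = make_deck(picture_value)
    wins = 0
    for _ in range(trials):
        rng.shuffle(deck)
        hidden_card = rng.choice(deck)              # Matteo's secret starting value
        matteo = final_position(deck, hidden_card - 1)
        magician = final_position(deck, 0)          # I start on the first card
        wins += matteo == magician
    return wins / trials

for picture_value in (10, 1):
    print(f"Picture cards count as {picture_value:>2}: success rate about {success_rate(picture_value):.2f}")
```

Once the two sequences land on the same card they stay together, so the success rate is just the chance that they meet somewhere before the cards run out.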

Markov chains are actually really important building blocks for modelling in many areas of Statistics, which is one reason why I posted the card trick to the blog. I’ll use a future post to explain this though.

# The wonderful and frightening world of Probability

A while back I posted a puzzle based on a suggestion by Fabian.Thut@smartodds.co.uk in which Smartodds employees are – hypothetically – given the opportunity to increase their bonus by a factor of 10. See the original post for the rules of the game.

As I wrote at the time, the solution is not at all obvious – I don’t think I could have found it myself – but it includes some important ideas from Statistics. It goes as follows…

Each individual employee has a probability equal to 1/2 of finding their number. This is because they can open 50 of the 100 boxes, leaving another 50 unopened. It’s then obvious by symmetry that they must have a 50-50 chance of finding their number, since all numbers are randomly distributed among the boxes.

But recall, the players’ bonus is multiplied by 10 only if all players find their own number.

To begin with, let’s assume that the employees play the game without any strategy at all. In that case they are playing the game independently, and the standard rules of probability mean that we must multiply the individual probabilities to get the overall win probability. So, the probability that the first 2 players both win is 1/2 * 1/2. The probability that the first 3 players all win is 1/2 * 1/2 * 1/2. And the probability that all 100 players win is 1/2 multiplied by itself 100 times, which is roughly

0.000000000000000000000000000000789.

In other words, practically zero. So, the chance of the bonuses being multiplied by 10 is vanishingly small; it’s therefore almost certain that everyone will lose their bonus if they choose to play the game. As Ian.Rutherford@Smartbapps.co.uk wrote to me, it would be ‘one of the worst bets of all time’. No arguments there.

But the amazing thing is that with a planned strategy the probability that all players find their number, and therefore win the bet, can be increased to around 31%. The strategy which yields this probability goes like this…

Recall that the boxes themselves are numbered. Each player starts by opening the box corresponding to their own number. So, Player 1 opens box 1. If it contains their number they’ve won and they stop. Otherwise, whichever number they find in that box is chosen as the number of the box they will next look in. And they carry on this way, till either they find their number and stop; or they open 50 boxes without having found their number. In the first case, that individual player has won, and it is the next player’s turn to enter the room (and play according to the same strategy); in the second case, they – and by the rules of the game the whole set of players – have lost.

So, Player 1 first opens box 1. If that box contains, say, the number 22, they next open box 22. If that contains the number 87, they next open box number 87. And so on, until they find their number or they reach the limit of 50 boxes. Similarly, Player 2 starts with box 2, which might contain the number 90; they next open box number 90; if that contains number 49 they then open box number 49, etc etc. Remarkably with this strategy the players will all find their own numbers – within the limit of 50 boxes each – with a probability of around 31%.

I find this amazing for 2 reasons:

1. That fairly basic probability techniques can be used to show that the strategy leads to a win probability of around 31%;
2. That the strategy results in such a massive increase in the win probability from virtually 0 to almost 1/3.
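If you’d rather see the 31% than take it on trust, the strategy is easy to simulate. The sketch below is an illustration, not a proof; for convenience it numbers players, boxes and cards from 0 rather than 1, and the function names are my own.

```python
import random

def all_players_win(boxes: list[int], limit: int) -> bool:
    """boxes[i] is the card inside box i. True if every player finds their own
    number within `limit` boxes by following the cards from box to box."""
    for player in range(len(boxes)):
        box = player                      # start at the box with your own number
        for _ in range(limit):
            if boxes[box] == player:      # found it: this player wins
                break
            box = boxes[box]              # otherwise go to the box named on the card
        else:
            return False                  # limit reached: everybody loses
    return True

def estimate_win_probability(players: int = 100, trials: int = 2_000, seed: int = 0) -> float:
    """Estimate the chance that all players win under the strategy."""
    rng = random.Random(seed)
    boxes = list(range(players))
    wins = 0
    for _ in range(trials):
        rng.shuffle(boxes)                # a fresh random allocation of cards
        wins += all_players_win(boxes, players // 2)
    return wins / trials

print(f"Estimated win probability with 100 players: {estimate_win_probability():.3f}")
```

The estimate will wobble a little depending on the number of trials and the random seed, but it should sit close to the figure quoted above.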

Unfortunately, though the calculations are all elementary, the argument is a touch too elaborate for me to reasonably include here. The Wikipedia entry for the puzzle – albeit with the cosmetic change of Smartodds employees being replaced by prisoners- does give the solution though, and it’s extremely elegant. If you feel like stretching your maths and probability skills just a little, it’s well worth a read.

In any case it’s instructive to look at a simpler case included in the Wikipedia article. Things are simplified there to just 8 employee/prisoners who have to find their number by opening at most 4 boxes (i.e. 50% of 8 boxes as opposed to 50% of 100 boxes in the original problem).

In this case suppose the numbers have been randomised into the boxes according to the following table…

| Box number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Card number | 7 | 4 | 6 | 8 | 1 | 3 | 5 | 2 |

Now suppose the players play according to the strategy described above except that they keep playing until they find their number, without stopping at the 4th box, even though they will have lost if they open more than 4 boxes. With the numbers randomised as above you can easily check that the sequence of boxes each player opens is as follows:

Player 1: (1, 7, 5)

Player 2: (2, 4, 8)

Player 3: (3, 6)

Player 4: (4, 8, 2)

Player 5: (5, 1, 7)

Player 6: (6, 3)

Player 7: (7, 5, 1)

Player 8: (8, 2, 4)

With these numbers, since each player opens at most 3 boxes, everyone wins and the employees/prisoners get their increased bonus.

However, had the cards been randomised slightly differently among the boxes, as per the following table…

| Box number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Card number | 7 | 4 | 6 | 8 | 2 | 3 | 5 | 1 |

… then Player 1 (for example) would have followed the sequence (1, 7, 5, 2, 4, 8) and would therefore have lost, having opened more than 4 boxes.

Now observe:

1. Several players follow the same sequence, albeit in a different order. For example, with the first randomisation, Players 1, 5 and 7 are each following the sequence (1, 7, 5) in some order;
2. The complete set of different sequences – ignoring changes of order – are in the first case (1, 7, 5), (2, 4, 8) and (3, 6). They are bound not to overlap and are also bound to contain the complete set of numbers 1-8 between them.
3. The fact that none of the sequences is longer than 4 with this randomisation means that the game has been won by all players. With the second randomisation, the fact that at least one of the sequences – (1, 7, 5, 2, 4, 8) – is longer than 4 means that the game has been lost.
4. It follows that the probability the game is won is equal to the probability that the longest sequence has length at most 4.
5. This argument applies more generally, so that with 100 players the game is won if the longest sequence of boxes opened has length at most 50.
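The observations above can be checked mechanically. The following sketch (the function name `find_sequences` is my own) extracts the distinct sequences from each of the two 8-box allocations and tests whether the longest is within the limit of 4.

```python
def find_sequences(cards: list[int]) -> list[tuple[int, ...]]:
    """cards[i - 1] is the card in box i. Returns the distinct box sequences,
    each listed starting from its lowest-numbered box."""
    seen: set[int] = set()
    sequences = []
    for start in range(1, len(cards) + 1):
        if start in seen:
            continue                      # already part of an earlier sequence
        sequence = []
        box = start
        while box not in seen:
            seen.add(box)
            sequence.append(box)
            box = cards[box - 1]          # follow the card to the next box
        sequences.append(tuple(sequence))
    return sequences

first = [7, 4, 6, 8, 1, 3, 5, 2]    # first randomisation
second = [7, 4, 6, 8, 2, 3, 5, 1]   # second randomisation

for name, cards in (("first", first), ("second", second)):
    sequences = find_sequences(cards)
    longest = max(len(s) for s in sequences)
    outcome = "win" if longest <= len(cards) // 2 else "lose"
    print(f"{name}: {sequences}, longest = {longest}, players {outcome}")
```

Every box appears in exactly one sequence, which is why the whole game comes down to the length of the longest one.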

Remarkably, it turns out to be not too difficult to calculate this probability that the longest sequence has length at most 50% of the number of players. And with 100 players it turns out to be approximately 31%. And as if that’s not remarkable enough, the same proof shows that with an unlimited number of players the above strategy leads to a win probability of around 30.7%. In other words, in replacing 100 players with 1 million players, the win probability only drops from around 31% to 30.7%.
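For the record, the standard result for this puzzle, which I haven’t derived here, is that with n players the win probability is 1 – (1/(n/2+1) + 1/(n/2+2) + … + 1/n), which tends to 1 – ln 2 as n grows. The sketch below simply evaluates that formula; the function name is my own and n is assumed to be even.

```python
import math

def win_probability(players: int) -> float:
    """Chance that the longest sequence has length at most half the number of
    players, using the standard formula 1 - sum of 1/k for k from n/2 + 1 to n."""
    half = players // 2
    return 1 - sum(1 / k for k in range(half + 1, players + 1))

for n in (8, 100, 1_000_000):
    print(f"{n:>9,} players: win probability = {win_probability(n):.4f}")

print(f"Limit for unlimited players: 1 - ln 2 = {1 - math.log(2):.4f}")
```

So the win probability barely moves between 100 players and a million.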

All quite incredible. But even without studying the detailed proof you can maybe get an idea from the 8-player example of why the strategy works. By playing this way, even though each individual player wins with probability 1/2, they no longer win or lose independently of one another. If Player 1 wins, every other player in their sequence of searches – Players 5 and 7 in the first example above – also wins. So, the suggested strategy induces dependence in the win/lose events of the individual players, and this leads to a change in win probability from something close to 0 to something close to 1/3.

Something similar actually came up earlier in the blog in the context of accumulator bets. I mentioned that betting on Mark Kermode’s Oscar predictions might be a good accumulator bet since the success of his predictions might not be independent events, and this had the potential to generate value against bookmakers who assume independence when setting prices for accumulators.

Finally, to answer the question: should the employees accept the challenge? If their original bonus is, say, £1000, then that becomes £10000 if they win, but £0 if they lose. So, with probability 31% they gain £9000, but with probability 69% they lose £1000. It follows that their expected gain if they play is

$31\% \times \pounds 9000 - 69\% \times \pounds 1000 = \pounds 2100$,

which is a comfortably positive expected profit for an outlay of £1000. So, they should definitely play, as long as they follow the strategy described above.

Two quick footnotes:

1. It’s more difficult to prove, but it turns out that the strategy described above is optimal – there’s no other strategy that would lead to a bigger win probability than 31%;
2. All of the above assumes that everyone follows the described strategy correctly. It would just take a couple of players to not follow the rules for all of the value of the bet to be lost. So, if the employees thought there might be a couple of, let’s say, ‘slow-learners’ in the company, it might be safer for them not to play and just take the £1000 and run.

# Magic

Here’s a statistical card trick. As I try to explain in the video, admittedly not very clearly, the rules of the trick are as follows:

1. Matteo picks a card at random from the pack. This card is unknown to me.
2. I shuffle the cards and turn them over one at a time.
3. As I turn the cards over, Matteo counts them in his head until he reaches that number in the sequence. As you’ll see, his card was a 5, so he counts the cards until he reaches the 5th one.
4. He then repeats that process, starting with the value of the 5th card, which happened to be a 10. So, he counts – again silently – a further 10 cards. He remembers the value of that card, and counts again that many cards.
5. And so on until we run out of cards.
6. (Picture cards count as 10.)
7. Matteo has to remember the last card in his sequence before all of the cards ran out.
8. And I – the magician – have to predict what that card was.

Now take a look at the video….

How did I do it? And what’s it got to do with Statistics? I’ll explain in a future post, but as usual if you’d like to write to me with your ideas I’ll be very happy to hear from you.

In one of the earliest posts to the blog last year I set a puzzle where I suggested Smartodds were offering employees the chance of increasing their bonus, and you had to decide whether it was in their interests to accept the offer or not.

<They weren’t, and they still aren’t, but let’s play along>.

Same thing this year, but the rules are different. Eligible employees are invited to gamble their bonus at odds of 10-1 based on the outcome of a game. It works like this…

For argument’s sake, let’s suppose there are 100 employees that are entitled to a bonus. They are told they each have the opportunity to increase their bonus by a factor of 10 by playing the following game:

• Each of the employees is randomly assigned a number between 1 and 100.
• Inside a room there are 100 boxes, also labelled 1 to 100.
• 100 cards, numbered individually from 1 to 100, have been randomly placed inside the boxes, so each numbered box contains a card with a unique random number from 1 to 100. For example, box number 1 might contain the card with number 62; box number 2 might contain the card with number 25; and so on.
• Each employee must enter the room, one at a time, and can choose any 50 of the boxes to open. If they find the card with their own number in one of those boxes, they win. Otherwise they lose.
• Though the employees may discuss the game and decide how they will play before they enter the room, they must not convey any information to the other employees after taking their turn.
• The employees cannot rearrange any of the boxes or the cards – so everyone finds the room in the same state when they enter.
• The employees will have their bonus multiplied by 10 if all 100 of them are winners. If there is a single loser, they all end up with zero bonus.

Should the employees accept this game, or should they refuse it and keep their original bonuses? And if they agree to play, should they adopt any particular strategy for playing the game?

Give it some thought and then scroll down for some discussion.

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

A good place to start is to calculate the probability that any one employee is a winner. This happens if one of the 50 boxes they open, out of the 100 available, contains the card with their number. Each box is equally likely to contain their number, so you can easily write down the probability that they win. Scroll down again for the answer to this part:

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

There are 100 boxes, and the employee selects 50 of them. Each box is equally likely to contain their number, so the probability they find their number in one of the boxes is 50/100 or 1/2.

So that’s the probability that any one employee wins. We now need to calculate the probability that they all win – bearing in mind the rules of the game – and then decide whether the bet is worth taking.

In summary:

• There are 100 employees;
• The probability that any one employee wins their game is 1/2;
• If they all win, their bonuses will all be multiplied by 10;
• If any one of them loses, they all get zero bonus.

Should the employees choose to play or to keep their original bonus? And if they play, is there any particular strategy they should adopt?

If you’d like to send me your answers I’d be really happy to hear from you. If you prefer just to send me a yes/no answer, perhaps just based on your own intuition, I’d be equally happy to get your response, and you can use this form to send the answer in that case.

This is a variant on a puzzle pointed out to me by Fabian.Thut@smartodds.co.uk. I think it’s a little more tricky than previous puzzles I’ve posted, but it illustrates a specific important statistical issue that I’ll discuss when giving the solution.