In a previous post I set a variation of the classic birthday problem:
What’s the least number of people you need in a room for there to be a 50% chance or more that everyone in the room has the same birthday as someone else in the room?
I mentioned that the problem is difficult to solve, but thought it might be interesting to see how good we are collectively at guessing the answer.
The actual value turns out to be 3064.
It’s not for the faint-hearted, but there’s an academic paper which contains a formula to calculate this result, although the formula as written seems to contain a misprint. Moreover, trying to implement the formula in a simplistic way leads to numerical instabilities resulting in both negative probabilities and probabilities greater than several million (!) for some choices of the number of people in the room. However, the corrected version of the formula seems to work over a reasonable range of numbers. (I checked with a simple simulation routine.)
Anyway, using the corrected formula results in the above graph, which shows the probability that everyone shares a birthday for numbers of people between 2000 and 5000. Below 2000 the probability is essentially zero; above 5000 it's essentially one. But between 2000 and 5000 the probability behaves as shown in the graph. You can see that to get a probability of at least 0.5 you need just over 3000 people, and actually the smallest number which takes the probability over the 0.5 threshold is 3064. If you guessed anywhere near that value, or indeed anywhere between 2000 and 5000, you did amazingly well.
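For anyone who wants to try a simulation check of the kind mentioned above, here is a minimal sketch. It is not the routine I actually used, and the function names, the default number of trials and the seed are all just illustrative choices. It assumes 365 equally likely birthdays and ignores leap years.

```python
import random

DAYS = 365  # assumption: 365 equally likely birthdays, no leap years


def everyone_shares(n_people, rng):
    """Simulate one room; True if nobody has a birthday all to themselves."""
    counts = [0] * DAYS
    for _ in range(n_people):
        counts[rng.randrange(DAYS)] += 1
    # The event fails if any day is the birthday of exactly one person.
    return all(c != 1 for c in counts)


def estimate_probability(n_people, n_trials=2000, seed=0):
    """Monte Carlo estimate of the probability that everyone shares a birthday."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    hits = sum(everyone_shares(n_people, rng) for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    # A small number of trials keeps this quick; the estimate is correspondingly noisy.
    print(3064, estimate_probability(3064, n_trials=300))
```

With only a few hundred or a few thousand trials the estimate will wobble around the true value, which is fine for checking the overall shape of the graph but not for pinning down exactly where it crosses 0.5.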
One interesting thing about this problem is that the graph suggests that the fewer people there are in the room, the smaller the probability that everyone shares a birthday with someone else. Certainly for numbers within the range 2000 – 5000 we can see from the graph that that's true. It's also true well outside of the range 2000 – 5000.
However, there’s one simple case where the probability is easy to calculate. Suppose there are just two people in the room. In this case the probability that everyone in the room shares a birthday is 1/365. To see this, suppose the first person’s birthday is D. Then everyone – i.e. both people – in the room will share a birthday if the second person’s birthday is also D. Under the usual assumptions (365 equally likely birthdays) the probability of this is simply 1/365. So, although the graph above decreases as the number of people decreases (i.e. moving along the graph from right to left), there must be a point at which it starts to increase again, in order that when there are 2 people the probability goes up as far as 1/365.
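For very small rooms the probability can also be computed exactly by straightforward counting. The sketch below is my own illustration, not the formula from the paper: it counts the ways of assigning people to days so that no day has exactly one person, building the count up one day at a time, and it uses exact fractions to avoid rounding problems. The function name is hypothetical, and the approach is only practical for small numbers of people.

```python
from fractions import Fraction
from math import comb

DAYS = 365  # assumption: 365 equally likely birthdays


def exact_probability(n_people, days=DAYS):
    """Exact probability that nobody in the room has a birthday to themselves."""
    # ways[m] = number of ways to place m labelled people on the days handled
    # so far such that no day holds exactly one person.
    ways = [1] + [0] * n_people
    for _ in range(days):
        new_ways = [0] * (n_people + 1)
        for m in range(n_people + 1):
            # Put k of the m people on the new day (any k except 1),
            # and the remaining m - k people on the earlier days.
            new_ways[m] = sum(
                comb(m, k) * ways[m - k] for k in range(m + 1) if k != 1
            )
        ways = new_ways
    return Fraction(ways[n_people], days ** n_people)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, float(exact_probability(n)))
```

For two people this reproduces the 1/365 argued above. For three people it gives 1/365², since with three in the room the only way for nobody to be left out is for all three to share the same day.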
As I wrote in the original post, one reason for setting this problem is to see how well we are able collectively to make a judgement on a problem like this, for which the true answer is very difficult to obtain. Your collective results are summarised in the following figure, with guesses shown as dots and the true answer shown as a red line:
The guesses varied from 184 to 50,000, with most of the guesses towards the lower end of that range. So, to show the values in a reasonable way, I’ve had to use a logarithmic scale for the graph. Each dot on the graph represents somebody’s guess, and I’ve had to jiggle the points a bit where there were two identical or near-identical guesses.
I’d summarise things as follows:
- If you count the dots you’ll reach a total of 12. So thanks to all 12 of you who replied, and I’m happy to buy each of you a drink at the Christmas dinner.
- Before you get too impressed by the fact that two people seem to have guessed the right answer, neither of these ‘guesses’ was perfect. One was 3061, the other was 3065. The fact that they are wrong implies that these respondents didn’t develop, or even google, the exact formula. And don’t be too impressed that they were so close to the true answer either: the guesses are so good that they are almost certainly not just guesses. Chances are that both these attempts derive from a simple simulation of the exercise, similar to the one I used myself to check the formula. It’s easy to get very close to the answer this way, but the inherent randomness of simulations means you need a very large number of simulations to get an accurate estimate of the probability. And deciding, for example, whether 3064 people leads to a probability slightly below or slightly above 0.5 is likely to be very time-consuming. (Time-consuming for the computer, that is. I’m not very good at programming, but my version took about 5 minutes to code.)
- Excluding the two clever people who almost certainly used simulation to solve the problem, most respondents underestimated the number of people needed. Remember that until you get to around 2000 people, the probability is essentially zero. Only two of the remaining respondents overestimated the number. And the respondent who guessed 5000 was the person with a genuine guess who came closest to the true answer. Indeed, their guess of 5000 just about made it onto the previous graph showing all the probabilities that were greater than 0, but smaller than 1.
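To give a feel for why a simulation needs so many runs to settle whether a probability is just below or just above 0.5, here is a rough sketch based on the usual standard error of an estimated proportion, sqrt(p(1 - p)/N). The function names, the margins and the choice of two standard errors are my own assumptions for illustration; they are not figures from the formula or from anyone's actual simulation.

```python
from math import ceil, sqrt


def standard_error(p, n_trials):
    """Standard error of a proportion estimated from n_trials independent runs."""
    return sqrt(p * (1 - p) / n_trials)


def trials_needed(margin, p=0.5, z=2.0):
    """Rough number of runs so that z standard errors fit inside the margin."""
    return ceil(p * (1 - p) * (z / margin) ** 2)


if __name__ == "__main__":
    # Hypothetical margins: how far the true probability might sit from 0.5.
    for margin in (0.05, 0.01, 0.001):
        print(margin, trials_needed(margin))
```

The number of runs grows with the inverse square of the margin, so halving the gap you want to resolve means four times as many simulated rooms, each of which involves around 3000 random birthdays.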
Conclusion: this was a very difficult problem to have much intuition about, even though the specification of the problem is very simple. Collectively we tended to underestimate the number of people needed, perhaps having been influenced by the fact that the number of people required to solve the classical birthday problem, 23, is surprisingly low. I actually think the distribution of values around the true number – albeit on a logarithmic scale – shows a reasonably good collective attempt at guessing the true answer. One way of seeing that is to use standard statistical techniques to create a probability distribution based on your guesses. This is shown in the following figure (again on a logarithmic scale):
As you can see, the true value sits reasonably well in the heart of the estimated distribution, albeit towards the upper end. Again this confirms that the collective answers were pretty good, showing the value of teamwork over individuality, even when it comes to guesswork. Remember to collect your free Christmas drink from me as a reward. (APPLIES TO RESPONDENTS ONLY)