Family problems

In an earlier post, I set the following problem:

You’re at a party and meet someone. After chatting for a bit, you work out that the girl you’re talking to has 2 kids and one of them is a boy. What are the chances she’s got 2 boys rather than a boy and a girl?

Following some feedback, I later updated that post to clarify that the question isn’t intended to be about possible slight differences in the birth rates of boys and girls. That’s an interesting biological and demographic issue, but wasn’t intended as the point of the question. For the purposes of the question I simply meant to assume that all children, regardless of a mother’s previous history of births, is equally likely to be a boy or a girl.

In that case, it’s very tempting to answer the question above as 1/2. Indeed, this was the point of the post. One child’s a boy. The other is just as likely to be a boy or a girl, so the answer must be 1/2.

Except it’s not.

The answer is 1/3, and here’s the reasoning…

Without any additional information, if a woman has a 2-child family the possibilities (with the oldest child listed first) are:

Boy- Boy, Boy-Girl, Girl-Boy, Girl-Girl

and because all children are equally likely to be male or female, these combinations are all equally likely. But we can rule out the Girl-Girl combination from the information in the problem, so the remaining possibilities are

Boy-Boy, Boy-Girl, Girl-Boy

with each being equally likely. But if you consider a boy in one of these pairs, it’s only in the first case that the other child is a boy. So, in just 1 of the 3 equally likely outcomes is the other child also a boy, so the probability is 1/3.

This simple problem illustrates the difficulty in calculating what is called a conditional probability – the probability of something conditional on some given information, in this case that one of the children is a boy. Whereas in general the chance of a child being a boy is 1/2, once you include the extra information that we’re looking at a 2-child family in which at least one of the children is a boy, the probability changes. At first it seems counter-intuitive, but once you’ve seen a few problems of this type, your intuition becomes sharper.

With that in mind, let me pose the same problem as above, but suppose you find out that one of the woman’s kids is a boy that was born on a Tuesday. Now what’s the probability that the other child is also a boy?

As usual, I’ll write a future post with the solution and discussion. But if you’d like to send me your own ideas, either by mail or via this survey form, I’d be really happy to hear from you.


Thanks to those of you who replied to the original question. Apart from some initial confusion about whether I was suggesting boys might be more or less common than girls in general, there was roughly a 50-50 split in answers between 1/2 and 1/3. As explained above, it’s easy to be misled into thinking the answer is 1/2, so there is no embarrassment in arriving at that answer. And like I say, that was the whole point of the post. Nonetheless, well done to those of you who got the correct answer of 1/3.

It’ll be interesting to see what replies I get to the revised problem above, so please do send me your answer or thoughts on the problem if you have time.

By coincidence

dna2

In an earlier post I suggested we play a game. You’d pick a sequence of three outcomes of a coin toss, like THT. Then I’d pick a different triplet, say TTT. I’d then toss a coin repeatedly and whoever’s chosen triplet showed up first in the sequence would be the winner.

In the post I gave the following example…

H T H H H T T H T H …..

… and with that outcome and the choices above you’d have won since your selection of THT shows up starting on the 7th coin toss, without my selection of TTT showing up before.

The question I asked was who this game favoured. Assuming we both play as well as possible, does it favour

  1. neither of us, because we both get to choose and the game is symmetric? Or;
  2. you, because you get to choose first and have the opportunity to do so optimally? Or;
  3. me, because I get to see your choice before I have to make mine?

The answer turns out to be 3. I have a big advantage over you, if I play smart. We’ll discuss what that means in a moment.

But in terms of these possible answers, it couldn’t have been 2. Whatever you choose I could have chosen the exact opposite and by symmetry, since H and T are equally likely, our two triplets would have been equally likely to occur first in the sequence. So, if you choose TTT, I choose HHH. If you choose HHT, I choose TTH and so on. In this way I don’t have an advantage over you, but neither do you over me. So we can rule out 2 as the possible answer.

But I can actually play better than that and have an advantage over you, whatever choice you make. I play as follows:

  1. My first choice in the sequence is the opposite of your second choice.
  2. My second and third choices are equal to your first and second choices.

So, if you chose TTT, I would choose HTT. If you chose THT, I would choose TTH. And so on. It’s not immediately obvious why this should give me an advantage, but it does. And it does so for every choice you can make.

The complete set of selections you can make, the selections I will make in response, and the corresponding odds in my favour are given in the following table.

Your Choice My Choice My win odds
HHH THH 7:1
HHT THH 3:1
HTH HHT 2:1
HTT HHT 2:1
THH TTH 2:1
THT TTH 2:1
TTH HTT 3:1
TTT HTT 7:1

As you can see, your best choice is to go for any of HTH, HTT, THH, THT, but even then the odds are 2:1 in my favour. That’s to say, I’m twice as likely to win as you in those circumstances. My odds increase to 3:1 – I’ll win three times as often as you – if you choose HHT or TTH; and my odds are a massive 7:1 – I’ll win seven times as often as you – if you choose HHH or TTT.

So, even if you play optimally, I’ll win twice as often as you. But why should this be so? The probabilities aren’t difficult to calculate, but most are a little more complicated than I can reasonably include here. Let’s take the easiest example though. Suppose you choose HHH, in which I case I choose THH. I then start tossing the coins. It’s possible that HHH will be the first 3 coins in the sequence. That will happen with probability 1/2 x 1/2 x 1/2 =1/8. But if that doesn’t happen, then there’s no way you can win. Because the first time HHH appears in the sequence it will have had to have been preceded by a T (otherwise HHH has occurred earlier). In which case my THH occurs before your HHH. So, you would have won with probability 1/8, and therefore I win with probability 7/8, and my odds of winning are 7:1.

Like I say, the other cases – except when you choose TTT, which is identical to HHH, modulo switching H’s and T’s – are a little more complicated, but the principle is essentially the same every time.

By coincidence, this game was invented by Walter Penney (Penney -> penny, geddit?), who published it in the Journal of Recreational Mathematics in 1969. It’s interesting from a mathematical/statistical/game-theory point of view because it’s an example of a non-transitive game. For example, looking at the table above, HHT is inferior to THH; THH is inferior to TTH; TTH is inferior to HTT; and HTT is inferior to HHT. Which brings us full circle. So, there’s no overall optimal selection. Each can be beaten by another, which in turn can be beaten by a different choice again. This is why the second player has an advantage: they can always find a selection that will beat the first player’s. It doesn’t matter that their choice can also be beaten, because the first player has already made their selection and it wasn’t that one.

The best known example of a non-transitive game is Rock-Paper-Scissors. Rock beats scissors; scissors beats paper; paper beats rock. But in that case it’s completely deterministic – rock will always beat paper, for example. In the Penney coin tossing game, HTT will usually beat TTT, but occasionally it won’t. So, it’s perhaps better defined as a random non-transitive game.

The game also has connections with genetics. Strands of DNA are long chains composed of sequences of individual molecules known as nucleotides. Only four different types of nucleotide are found in DNA strands, and these are usually labelled A, T, G and C. The precise ordering of these nucleotides in the DNA strand effectively define a code that will determine the characteristics of the individual having that DNA.

It’s not too much of a stretch of the imagination to see that a long sequence of nucleotides A, T, G and C is not so very different – albeit with 4 variants, rather than 2 – from a sequence of outcomes just like those from the tosses of a coin. Knowing which combinations of the nucleotides are more likely to occur than others, and other combinatorial aspects of their arrangements, proves to be an important contribution of Statistics to the understanding of genetics and the development of genetic intervention therapies. Admittedly, it’s not a direct application of Penney’s game, but the statistical objectives and techniques required are not so very different.


Thanks to those of you who wrote to me with solutions to this problem, all of which were at least partially correct.

Tennis puzzles

They’re not especially statistical, and not especially difficult, but I thought you might like to try some tennis-based puzzle questions. I’ve mentioned before that Alex Bellos has a fortnightly column in the Guardian where he presents mathematical puzzles of one sort or another. Well, to coincide with the opening of Wimbledon, today’s puzzles have a tennis-based theme. You can find them here.

I think they’re fairly straightforward, but in any case, Alex will be posting the solutions later today if you want to check your own answers.

I say they’re not especially statistical, but there is quite a lot of slightly intricate probability associated with tennis, since live tennis betting is a lucrative market these days. Deciding whether a bet is good value or not means taking the current score and an estimate of the players’ relative abilities, and converting that into a match win probability for either player, which can then be compared against the bookmakers’ odds. But how is that done? The calculations are reasonably elementary, but complicated by both the scoring system and the fact that players tend to be more likely to win a point on serve than return.

If you’re interested, the relevant calculations for all score situations are available in this academic paper, though this assumes players are equally strong on serve and return. It also assumes the outcome of each point is statistically independent from all other points – that’s to say, knowing the outcome of one point doesn’t affect the probability of who wins another point. So, to add to Alex’s 3 questions above, I might add:

Why might tennis points not be statistically independent in practice, and what is the likely effect on match probability calculations of assuming they are when they’re not?