By coincidence

dna2

In an earlier post I suggested we play a game. You’d pick a sequence of three outcomes of a coin toss, like THT. Then I’d pick a different triplet, say TTT. I’d then toss a coin repeatedly and whoever’s chosen triplet showed up first in the sequence would be the winner.

In the post I gave the following example…

H T H H H T T H T H …..

… and with that outcome and the choices above you’d have won since your selection of THT shows up starting on the 7th coin toss, without my selection of TTT showing up before.

The question I asked was who this game favoured. Assuming we both play as well as possible, does it favour

  1. neither of us, because we both get to choose and the game is symmetric? Or;
  2. you, because you get to choose first and have the opportunity to do so optimally? Or;
  3. me, because I get to see your choice before I have to make mine?

The answer turns out to be 3. I have a big advantage over you, if I play smart. We’ll discuss what that means in a moment.

But in terms of these possible answers, it couldn’t have been 2. Whatever you choose I could have chosen the exact opposite and by symmetry, since H and T are equally likely, our two triplets would have been equally likely to occur first in the sequence. So, if you choose TTT, I choose HHH. If you choose HHT, I choose TTH and so on. In this way I don’t have an advantage over you, but neither do you over me. So we can rule out 2 as the possible answer.

But I can actually play better than that and have an advantage over you, whatever choice you make. I play as follows:

  1. My first choice in the sequence is the opposite of your second choice.
  2. My second and third choices are equal to your first and second choices.

So, if you chose TTT, I would choose HTT. If you chose THT, I would choose TTH. And so on. It’s not immediately obvious why this should give me an advantage, but it does. And it does so for every choice you can make.

The complete set of selections you can make, the selections I will make in response, and the corresponding odds in my favour are given in the following table.

Your Choice My Choice My win odds
HHH THH 7:1
HHT THH 3:1
HTH HHT 2:1
HTT HHT 2:1
THH TTH 2:1
THT TTH 2:1
TTH HTT 3:1
TTT HTT 7:1

As you can see, your best choice is to go for any of HTH, HTT, THH, THT, but even then the odds are 2:1 in my favour. That’s to say, I’m twice as likely to win as you in those circumstances. My odds increase to 3:1 – I’ll win three times as often as you – if you choose HHT or TTH; and my odds are a massive 7:1 – I’ll win seven times as often as you – if you choose HHH or TTT.

So, even if you play optimally, I’ll win twice as often as you. But why should this be so? The probabilities aren’t difficult to calculate, but most are a little more complicated than I can reasonably include here. Let’s take the easiest example though. Suppose you choose HHH, in which I case I choose THH. I then start tossing the coins. It’s possible that HHH will be the first 3 coins in the sequence. That will happen with probability 1/2 x 1/2 x 1/2 =1/8. But if that doesn’t happen, then there’s no way you can win. Because the first time HHH appears in the sequence it will have had to have been preceded by a T (otherwise HHH has occurred earlier). In which case my THH occurs before your HHH. So, you would have won with probability 1/8, and therefore I win with probability 7/8, and my odds of winning are 7:1.

Like I say, the other cases – except when you choose TTT, which is identical to HHH, modulo switching H’s and T’s – are a little more complicated, but the principle is essentially the same every time.

By coincidence, this game was invented by Walter Penney (Penney -> penny, geddit?), who published it in the Journal of Recreational Mathematics in 1969. It’s interesting from a mathematical/statistical/game-theory point of view because it’s an example of a non-transitive game. For example, looking at the table above, HHT is inferior to THH; THH is inferior to TTH; TTH is inferior to HTT; and HTT is inferior to HHT. Which brings us full circle. So, there’s no overall optimal selection. Each can be beaten by another, which in turn can be beaten by a different choice again. This is why the second player has an advantage: they can always find a selection that will beat the first player’s. It doesn’t matter that their choice can also be beaten, because the first player has already made their selection and it wasn’t that one.

The best known example of a non-transitive game is Rock-Paper-Scissors. Rock beats scissors; scissors beats paper; paper beats rock. But in that case it’s completely deterministic – rock will always beat paper, for example. In the Penney coin tossing game, HTT will usually beat TTT, but occasionally it won’t. So, it’s perhaps better defined as a random non-transitive game.

The game also has connections with genetics. Strands of DNA are long chains composed of sequences of individual molecules known as nucleotides. Only four different types of nucleotide are found in DNA strands, and these are usually labelled A, T, G and C. The precise ordering of these nucleotides in the DNA strand effectively define a code that will determine the characteristics of the individual having that DNA.

It’s not too much of a stretch of the imagination to see that a long sequence of nucleotides A, T, G and C is not so very different – albeit with 4 variants, rather than 2 – from a sequence of outcomes just like those from the tosses of a coin. Knowing which combinations of the nucleotides are more likely to occur than others, and other combinatorial aspects of their arrangements, proves to be an important contribution of Statistics to the understanding of genetics and the development of genetic intervention therapies. Admittedly, it’s not a direct application of Penney’s game, but the statistical objectives and techniques required are not so very different.


Thanks to those of you who wrote to me with solutions to this problem, all of which were at least partially correct.

Leave a Reply