Cube-shaped poo

Do you like pizza? If so, I’ve got good and bad news for you.

The good news is that the 2019 Ig Nobel prize winner in the category of medicine is Silvano Gallus, who received the award for…

… collecting evidence that pizza might protect against illness and death…

The bad news, for most of you, is that this applies…

…if the pizza is made and eaten in Italy.

Obviously, it’s a bit surprising that pizza can be considered a health food. But if you accept that, it’s also a bit surprising that it has to be Italian pizza. So, what’s going on?

The Ig Nobel prizes are a satirical version of the Nobel prizes. Here’s the Wikipedia description:

The Ig Nobel Prize (/ˌɪɡnoʊˈbɛl/ IG-noh-BEL) is a satiric prize awarded annually since 1991 to celebrate ten unusual or trivial achievements in scientific research, its stated aim being to “honor achievements that first make people laugh, and then make them think.” The name of the award is a pun on the Nobel Prize, which it parodies, and the word ignoble.

As such, the prize is awarded for genuine scientific research, but for areas of research that are largely incidental to human progress and understanding of the universe. For example, this year’s prize in the field of physics went to a group of scientists for…

…studying how, and why, wombats make cube-shaped poo.

It’s in this context that Silvano Gallus won his award. But although the Ig Nobel award says something about the irrelevance of the subject matter, it’s not intended as a criticism of the quality of the underlying research. Gallus’s work with various co-authors (all Italian) was published as an academic paper ‘Does Pizza Protect Against Cancer?’ in the International Journal of Cancer. This wouldn’t happen if the work didn’t have scientific merit.

Despite this, there are reasons to be cautious about the conclusions of the study. The research is based on a type of statistical experimental design known as a case-control study. This works as follows. Suppose, for argument’s sake, you’re interested in testing the effect of pizzas on the prevention of certain types of disease. You first identify a group of patients having the disease and ask them about their pizza-eating habits. You then also find a group of people who don’t have the disease and ask them about their pizza-eating habits. You then check whether the pizza habits are different in the two groups.

Actually, it’s a little more complicated than that. It might be that age or gender or something else is also different in the two groups, so you also need to correct for these effects as well. But the principle is essentially just to see whether the tendency to eat pizza is greater in the control group – if so, you conclude that pizza is beneficial for the prevention of the specified disease. And on this basis, for a number of different cancer-types, Silvano Gallus and his co-authors found the proportion of people eating pizzas occasionally or regularly to be higher in the control group than in the case group.
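In practice, the ‘correction’ for age, gender and so on is usually handled with a logistic regression, in which the coefficient of the pizza variable becomes an adjusted (log) odds ratio. Here’s a minimal sketch in R with simulated data – the variable names, effect sizes and sample size are all invented for illustration, it ignores the fact that a real case-control study samples cases and controls separately, and it certainly isn’t the analysis from the actual paper:

set.seed(1)
n <- 1000
age <- rnorm(n, 60, 10)      # hypothetical ages
sex <- rbinom(n, 1, 0.5)     # hypothetical 0/1 sex indicator
pizza <- rbinom(n, 1, 0.3)   # regular pizza eater? (assumed rate)
# Assumed model: disease risk rises with age and falls with pizza eating
case <- rbinom(n, 1, plogis(-3 + 0.04 * age - 0.5 * pizza))
# The pizza coefficient estimates a log odds ratio, adjusted for age and sex
summary(glm(case ~ pizza + age + sex, family = binomial))

A negative and significant pizza coefficient would then be reported as pizza eating being associated with lower odds of disease – though, as discussed below, that’s still only an association.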

Case-control studies are widely used in medical and epidemiological research because they are quick and easy to implement. The more rigorous ‘randomised controlled study’ would work as follows:

  1. You recruit a number of people for the study, none of whom have the disease of interest;
  2. You randomise them into two groups. One of the groups will be required to eat pizza on a regular basis; the other will not be allowed to eat pizza;
  3. You follow the 2 groups over a number of years and identify whether the rate of disease turns out to be lower in the pizza-eating group than in the non-pizza-eating group;
  4. Again, you may want to correct for other differences in the 2 groups (though the need for this is largely eliminated by the randomisation process).
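The randomisation in step 2 is straightforward to carry out. Here’s a minimal sketch in R, with the study size and group labels invented for illustration:

set.seed(2)
recruits <- paste0('person_', 1:200)                      # hypothetical participants
group <- sample(rep(c('pizza', 'no pizza'), each = 100))  # random 50/50 allocation
table(group)

Because the allocation is random, the two groups will, on average, be balanced with respect to age, gender and everything else – which is why step 4 is largely unnecessary.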

Clearly, for both logistical and time reasons, a randomised controlled study is completely unrealistic for studying the effects of pizza on disease prevention. However, in terms of reliability of results, case-control studies are generally inferior to randomised controlled studies because of the potential for bias.

In case-control studies the selection of the control group is extremely important, and it might be very easy to fall into the trap of inadvertently selecting people with an unusually high rate of eating pizzas (if, for example, you surveyed people standing outside a pizzeria). It’s also easy – by accident or design – for the researcher to get the answer they might want when asking a question. For example: “you eat a lot of pizza, don’t you?” might get a different response from “would you describe yourself as a regular pizza eater?”. Moreover, people simply might not have an accurate recollection of their long-term eating habits. But most importantly, you are asking people with, for example, cancer of the colon whether they are regular pizza eaters. Quite plausibly this type of disease has a big effect on diet, and one can well imagine that pizzas are not advised by doctors. So although the pizza-eating question is probably intended to relate to the period prior to getting the disease, it’s possible that people with the disease are no longer tending to eat pizza, and respond accordingly.

Finally, even if biases are eliminated by careful execution of the study, there’s the possibility that the result is anyway misleading. It may be that although pizzas seem to give disease protection, it’s not the pizza itself that’s providing the protection, but something else that is associated with pizza eating. For example, regular pizza eating might just be an indicator of someone who simply has regular meals, which may be the genuine source of disease protection. There’s also the possibility that while the rates of pizza eating are lower among the individuals with the specified diseases, they are much higher among individuals with other diseases (heart problems, for example). This could have been identified in a randomised control study, but flies completely under the radar in a case-control study.

So, case-control studies are a bit of a minefield, with various potential sources of misleading results, and I would remain cautious about the life-saving effects of eating pizza.

And finally… like all statistical analysis, any conclusions made on the basis of sample results are only relevant to the wider population from which that sample was drawn. And since this study was based on Italians eating Italian pizzas, the authors conclude…

Extension of the apparently favorable effect of pizza on cancer risk in Italy to other types of diets and populations is therefore not warranted.

So, fill your boots at Domino’s Pizza, but don’t count on it doing much in the way of disease prevention.

No smoke without fire

No one now seriously doubts that cigarette smoking increases your risk of lung cancer and many other diseases, but when the evidence for a relationship between smoking and cancer was first presented in the 1950s, it was strongly challenged by the tobacco industry.

The history of the scientific fight to demonstrate the harmful effects of smoking is summarised in this article. One difficulty from a statistical point of view was that the primary evidence based on retrospective studies was shaky, because smokers tend to give unreliable reports on how much they smoke. Smokers with illnesses tend to overstate how much they smoke; those who are healthy tend to understate their cigarette consumption. And these two effects lead to misleading analyses of historically collected data.

An additional problem was the difficulty of establishing causal relationships from statistical associations. Similar to the examples in a previous post, just because there’s a correlation between smoking and cancer, it doesn’t necessarily mean that smoking is a risk factor for cancer. Indeed, one of the most prominent statisticians of the time – actually of any time – Sir Ronald Fisher, wrote various scientific articles explaining how the correlations observed between smoking and cancer rates could easily be explained by the presence of lurking variables that induce spurious correlations.

At which point it’s worth noting a couple more ‘coincidences’: Fisher was a heavy smoker himself and also an advisor to the Tobacco Manufacturers Standing Committee. In other words, he wasn’t exactly neutral on the matter. But, he was a highly respected scientist, and therefore his scepticism carried considerable weight.

Eventually though, the sheer weight of evidence – including that from long-term prospective studies – was simply too overwhelming to be ignored, and governments fell into line with the scientific community in accepting that smoking is a high risk factor for various types of cancer.

An important milestone in that process was the work of another British statistician, Austin Bradford Hill. As well as being involved in several of the most prominent case studies linking cancer to smoking, he also developed a set of 9 (later extended to 10) criteria for establishing a causal relationship between processes. Though still only guidelines, they provide a framework that is still used today for determining whether associated processes include any causal relationships. And by these criteria, smoking was clearly shown to be a risk factor for cancer.

Now, fast-forward to today and there’s a similar debate about global warming:

  1. Is the planet genuinely heating up or is it just random variation in temperatures?
  2. If it’s heating up, is it a consequence of human activity, or just part of the natural evolution of the planet?
  3. And then what are the consequences for the various bio- and eco-systems living on it?

There are correlations all over the place – for example between CO2 emissions and average global temperatures as described in an earlier post – but could these possibly just be spurious and not indicative of any causal relationships?  Certainly there are industries with vested interests who would like to shroud the arguments in doubt. Well, this nice article applies each of Bradford Hill’s criteria to various aspects of climate science data and establishes that the increases in global temperatures are undoubtedly caused by human activity leading to CO2 release in the atmosphere, and that many observable changes to biological and geographical systems are a knock-on effect of this relationship.

In summary: in the case of the planet, the smoke that we see (global warming) is definitely a consequence of the fire we started (the increased amounts of CO2 released into the atmosphere).

Killfie

I recently read that more than 250 people died between 2011 and 2017 taking selfies (so-called killfies). A Wikipedia entry gives a list of some of these deaths, as well as injuries, and categorises the fatalities as due to the following causes:

  • Transport
  • Electrocution
  • Fall
  • Firearm
  • Drowned
  • Animal
  • Other

If you have a macabre sense of humour it makes for entertaining reading while also providing you with useful life tips: for example, don’t take selfies with a walrus.

More detail on some of these incidents can also be found here.

Meanwhile, this article includes the following statistically-based advice:

Humanity is actually very susceptible to selfie death. Soon, you will be more likely to die taking a selfie than you are getting attacked by a shark. That’s not me talking: that’s statistical likelihood. Stay off Instagram and stay alive

Yes, worry less about sharks, but a bit more about Instagram. Thanks Statistics.

The original academic article which identified the more than 250 selfie deaths is available here. It actually contains some interesting statistics:

  • Men are more susceptible to death-by-selfie than women, even though women take more selfies;
  • Most deaths occur in the 20-29 age group;
  • Men were more likely to die taking high-risk selfies than women;
  • Most selfie deaths due to firearms occurred in the United States;
  • The highest number of selfie deaths is in India.

None of these conclusions seems especially surprising to me, except the last one. Why India? Have a think yourself why that might be before scrolling down:

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

There are various possible factors. Maybe it’s because the population of India is so high. Maybe people just take more selfies in India. Maybe the environment there is more dangerous. Maybe India has a culture of high risk-taking. Maybe it’s a combination of these things.

Or maybe… if you look at the academic paper I referred to above, the authors are based at Indian academic institutes and describe their methodology as follows:

We performed a comprehensive search for keywords such as “selfie deaths; selfie accidents; selfie mortality; self photography deaths; koolfie deaths; mobile death/accidents” from news reports to gather information regarding selfie deaths.

I have no reason to doubt the integrity of these scientists, but it’s easy to imagine that their knowledge of where to look in the media for reported selfie deaths was more complete for Indian sources than for those of other countries. In which case, they would introduce an unintentional bias in their results by accessing a disproportionate number of reports of deaths in India.


In conclusion: be sceptical about any statistical analysis. If the sampling is biased for any reason, the conclusions almost certainly will be as well.

Love it or hate it

A while ago I wrote a post about the practice of advertistics – the use, and more often misuse, of Statistics by advertising companies to promote their products. And I referenced an article in the Guardian which included a number of examples of advertistics. One of these examples was Marmite.

You probably know the line: Marmite – you either love it or hate it. That’s an advertistic in itself. And almost certainly provably incorrect – I just have to find one person who’s indifferent to Marmite.

But I want to discuss a slightly different issue. This ‘love or hate Marmite’ theme has turned up as an advertistic for a completely different product…

DNAfit is one of a number of do-it-yourself DNA testing kits. Here’s what they say about themselves:

DNAfit helps you become the best possible version of yourself. We promise a smarter, easier and more effective solution to health and fitness, entirely unique to your DNA profile. Whatever your goal, DNAfit will ensure you live a longer, happier and healthier life.

And here’s the eminent statistician, er, Rio Ferdinand, to persuade you with statistical facts as to why you should sign up with DNAfit.

But where’s the Marmite?

Well, as part of a campaign that was purportedly set up to address a decline in Marmite sales, but was coincidentally promoted as an advertistic for the DNAfit testing kit, a scientific project was established to find genetic markers that identify whether a person will be a lover or hater of Marmite. (Let’s ignore, for the moment, the fact that the easiest way to discover whether a person is a ‘lover’ or ‘hater’ of Marmite is simply to ask them.)

Here’s a summary of what they did:

  • They recruited a sample of 261 individuals;
  • For each individual, they took a DNA sample;
  • They also questioned the individuals to determine whether they love or hate Marmite;
  • They then applied standard statistical techniques to identify a small number of genetic markers that separate the Marmite lovers from the haters. Essentially, they looked for a combination of DNA markers which were present in the ‘haters’, but absent in the ‘lovers’ (or vice versa).

Finally, the study was given a sheen of respectability through the publication of a white paper with various genetic scientists as authors.

But, here’s the typical reaction of another scientist on receiving a press release about the study:

Wow, sorry about the language there. So, what’s wrong?

The Marmite gene study is actually pretty poor science. One reason, as explained in this New Scientist article, is that there’s no control for environmental factors. For example, several members of a family might all love Marmite because the parents do and introduced their kids to it at a very early age. The close family connection will also mean that these individuals have similar DNA. So, you’ll find a set of genetic characteristics that each of these family members have, and they all also love Marmite. Conclusion – these are genetic markers for loving Marmite. Wrong: these are genetic markers for this particular family who, because they share meals together, all love Marmite.

I’d guess there are other factors too. A sample of 261 seems rather small to me. There are many possible genetic markers, and many, many more combinations of genetic markers. With so many options it’s almost certain that purely by chance in 261 individuals you can find one set of markers shared only by the ‘lovers’ and another set shared only by the ‘haters’. We’ve seen this stuff before: look at enough things and something unlikely is bound to occur just by chance. It’s just unlikely to happen again outside of the sample of individuals that took part in the study.
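You can see this effect in a small simulation. The sketch below – in R, with every number invented for illustration – generates completely random ‘markers’ and completely random ‘lover’/‘hater’ labels for 261 individuals, and then tests each marker for association with the labels. Even though there’s no real relationship anywhere, the best-looking marker is virtually guaranteed to look convincingly significant:

set.seed(1)
n <- 261     # individuals, as in the Marmite study
p <- 10000   # hypothetical number of candidate genetic markers
markers <- matrix(rbinom(n * p, 1, 0.5), nrow = n)   # pure-noise 'genotypes'
lover <- rbinom(n, 1, 0.5)                           # pure-noise 'lover' labels
# Association p-value of each marker with the labels
pvals <- apply(markers, 2, function(g) chisq.test(table(g, lover))$p.value)
min(pvals)   # typically tiny: some marker always 'separates' lovers from haters

And if you search over combinations of markers, rather than single markers, the problem only gets worse.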

Moreover, there seems to have been no attempt at validating the results on an independent set of individuals.

Unfortunately for DNAfit and Marmite, they took the campaign one stage further and encouraged Marmite customers – and non-customers – to carry out their own DNA test to see if they were Marmite ‘lovers’ or ‘haters’ using the classification found in the genetic study. If only they’d thought to do this as part of the study itself. Because although the test claimed to be 99.98% accurate, rather many people who paid to be tested found they’d been wrongly classified.

One ‘lover’ who was classified as a ‘hater’ wrote:

I was genuinely upset when I got my results back. Mostly because, hello, I am a ‘lover’, but also because I feel like Marmite led me on with a cheap publicity tool and I fell for it. I feel dirty and used.

While a wrongly-classified ‘hater’ said:

I am somewhat offended! I haven’t touched Marmite since I was about eight because even just the thought of it makes me want to curl up into a ball and scrub my tongue.

Ouch! ‘Dirty and used’. ‘Scrub my tongue’. Not great publicity for either Marmite or DNAfit, and both companies seem to have dropped the campaign pretty quickly and deleted as many references to it as they were able.

Ah, the price of doing Statistics badly.


p.s. There was a warning in the ads about a misclassification rate higher than 0.02% but they just dismissed it as fake news…

You looking at me?

Statistics: helping you solve life’s more difficult problems…

You might have read recently – since it was in every news outlet here, here, here, here, here, here, and here for example – that recent research has shown that staring at seagulls inhibits them from stealing your food. This article even shows a couple of videos of how the experiment was conducted. The researcher placed a package of food some metres in front of her in the vicinity of a seagull. In one experiment she watched the bird and timed how long it took before it snatched the food. She then repeated the experiment, with the same seagull, but this time facing away from the seagull. Finally, she repeated this exercise with a number of different seagulls in different locations.

At the heart of the study is a statistical analysis, and there are several points about both the analysis itself and the way it was reported that are interesting from a wider statistical perspective:

  1. The experiment is a good example of a designed paired experiment. Some seagulls are more likely to take food than others, regardless of whether they are being looked at or not. The experiment aims to control for this effect by using pairs of results from each seagull: one in which the seagull was stared at, the other where it was not. By using the knowledge that the data come in pairs, the accuracy of the analysis is improved considerably, making it much more likely that a genuine effect will be identified within the noisy data. (There’s a small simulated illustration of this just after the list.)
  2. To avoid the possibility that, for example, a seagull is more likely to take food quickly the second time, the order in which the pairs of experiments are applied is randomised for each seagull.
  3. Other factors are also controlled for in the analysis: the presence of other birds, the distance of the food, the presence of other people and so on.
  4. The original experiment involved 74 birds, but many were uncooperative and refused the food in one or other of the experiments. In the end the analysis is based on just 19 birds that took food both when being stared at and when not. So even though the results proved significant, it’s worth remembering that the sample on which they were based is very small.
  5. It used to be very difficult to verify the accuracy of a published statistical analysis. These days it’s almost standard for data and code to be published alongside the manuscript itself. This enables readers to both check the results and carry out their own alternative analyses. For this paper, which you can find in full here, the data and code are available here.
  6. If you look at the code, it’s just a few lines of R. It’s notable that such a sophisticated analysis can be carried out with such simple code.
  7. At the risk of being pedantic, although most newspapers went with headlines like ‘Staring at seagulls is best way to stop them stealing your chips‘, that’s not really an accurate summary of the research at all. Clearly, a much better way to stop seagulls eating your food is not to eat in the vicinity of seagulls. (Doh!) But even aside from this nit-picking point, the research didn’t show that staring at seagulls stopped them ‘stealing your chips’. It showed that, on average, the seagulls that bother to steal your chips, do so more quickly when you are looking away. In other words, the headline should be:

If you insist on eating chips in the vicinity of seagulls, you’ll lose them quicker if you’re not looking at them

Guess that’s why I’m a statistician and not a journalist.
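As promised, here’s a small illustration of why the pairing in point 1 matters. The data below are simulated in R – the latencies, effect size and per-gull variation are all invented, and this is not the analysis the authors actually ran (their own code is linked above):

set.seed(42)
n_gulls <- 19
baseline <- rnorm(n_gulls, mean = 120, sd = 40)   # assumed per-gull boldness variation
stared <- baseline + rnorm(n_gulls, 15, 10)       # assumed slower approach when watched
away <- baseline + rnorm(n_gulls, 0, 10)          # approach times when not watched
t.test(stared, away, paired = TRUE)   # pairing removes the per-gull baseline
t.test(stared, away)                  # unpaired test on the same data: far less power

The paired test typically picks up the assumed effect easily, while the unpaired test on exactly the same data often fails to, because the large gull-to-gull differences swamp the signal.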

The issue of designed statistical experiments was something I also discussed in an earlier post. As I mentioned then, it’s an aspect of Statistics that, so far, hasn’t much been exploited in the context of sports modelling, where analyses tend to be based on historically collected data. But in the context of gambling, where different strategies for betting might be compared and contrasted, it’s likely to be a powerful approach. In that case, the issues of controlling for other variables – like the identity of the gambler or the stake size – and randomising to avoid biases will be equally important.

Data controversies

Some time ago I wrote about Mendel’s law of genetic inheritance, and how statistical analysis of Mendel’s data suggested his results were too good to be true. It’s not that his theory is wrong; it’s just that the data he provided as evidence for his theory seem to have been manipulated in such a way as to seem incontrovertible. Unfortunately the data lack the variation that Mendel’s own law would also imply should occur in measurements of that type, leading to the charge that the data had been manufactured or manipulated in some way.

Well, there’s a similar controversy about the picture at the top of this page.

The photograph, taken 100 years ago, was as striking at that time as the recent picture of a black hole, discussed in an earlier post, is today. However, this picture was taken with basic photographic equipment and a telescopic lens, and shows a total solar eclipse, as the moon passes directly between the Earth and the Sun.

A full story of the controversy is given here.

In summary: Einstein’s theory of general relativity describes gravity not as a force between two attracting masses – as is central to Newtonian physics – but as a curvature caused in space-time due to the presence of massive objects. All objects cause such curvature, but only those that are especially massive, such as stars and planets, will have much of an effect.

Einstein’s relativity model was completely revolutionary compared to the prevailing view of physical laws at the time. But although it explained various astronomical observations that were anomalous according to Newtonian laws, it had never been used to predict anomalous behaviour. The picture above, and similar ones taken at around the same time, changed all that.

In essence, blocking out the sun’s rays enabled dimmer and more distant stars to be accurately photographed. Moreover, if Einstein’s theory were correct, the photographic position of these stars should be slightly distorted because of the spacetime curvature effects of the sun. But the effect is very slight, and even Newtonian physics suggests some disturbance due to gravitational effects.

In an attempt to get photographic evidence at the necessary resolution, the British astronomer Arthur Eddington set up two teams of scientists – one on the African island of Príncipe, the other in Sobral, Brazil – to take photographs of the solar eclipse on 29 May, 1919. Astronomical and photographic equipment was much more primitive in those days, so this was no mean feat.

Anyway, to cut a long story short, a combination of poor weather conditions and other setbacks meant that the results were less reliable than hoped for. It seems that the data collected at Príncipe, where Eddington himself was stationed, were inconclusive, falling somewhere between the Newtonian and Einsteinian model predictions. The data at Sobral were taken with two different types of telescope, one set favouring the Newtonian view and the other Einstein’s. Eddington essentially combined the Einstein-favouring data from Sobral with those from Príncipe and concluded that the evidence supported Einstein’s relativistic model of the universe.

Now, in hindsight, with vast amounts of empirical evidence of many types, we know Einstein’s model to be fundamentally correct. But did Eddington selectively choose his data to support Einstein’s model?

There are different points of view, which hinge on Eddington’s motivation for dropping a subset of the Sobral data from his analysis. One point of view is that he wanted Einstein’s theory to be correct, and therefore simply ignored the data that were less favourable. This view is fuelled by political reasoning: the suggestion is that since Eddington was a Quaker, and therefore a pacifist, he wanted to support a German theory as a kind of post-war reconciliation.

The alternative point of view, for which there is some documentary evidence, is that the Sobral data which Eddington ignored had been independently designated as unreliable. Therefore, on proper scientific grounds, Eddington behaved entirely correctly by excluding them from his analysis, and his subsequent conclusions favouring the Einstein model were entirely consistent with the scientific data and information he had available.

This issue will probably never be fully resolved, though in a recent review of several books on the matter, the theoretical physicist Peter Coles (no relation) claims to have reanalysed the data in the Eddington paper using modern statistical methods, and to have found no reason to doubt Eddington’s integrity. I’m happy to accept that conclusion, but no detail is given of the statistical analysis that was carried out.

What’s interesting though, from a statistical point of view, is how the interpretation of the results depends on the reason for the exclusion of a subset of the Sobral data. If your view is that Eddington knew their contents and excluded them on that basis, then his conclusions in favour of Einstein must be regarded as biased. If you accept that Eddington excluded these data a priori because of their unreliability, then his conclusions were fair and accurate.

Data are often treated as a neutral aspect of an analysis. But as this story illustrates, the choice of which data to include or exclude, and the reasons for doing so, may be factors which fundamentally alter the direction an analysis will take, and the conclusions it will reach.

1 in 562

Ever heard of the Fermi paradox? This phenomenon is named after the Italian physicist Enrico Fermi, and concerns the fact that though we’ve found no empirical evidence of extraterrestrial life, standard calculations based on our learned knowledge of the universe suggest that the probability of life elsewhere in our galaxy is very high. The theoretical side of the paradox is usually based on some variation of the Drake equation, which takes various known or estimated constants – like the number of observed stars in our galaxy, the estimated average number of planets per star, the proportion of these that are likely to be able to support life, and so on – and feeds them into an equation which calculates the expected number of alien civilisations in our galaxy.

Though there’s a lot of uncertainty about the numbers that feed into Drake’s equation, best estimates lead to an answer that suggests there should be millions of civilisations out there somewhere. And Fermi’s paradox points to the contrast between this number and the zero civilisations that we’ve actually observed.
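To make the arithmetic concrete, here’s a sketch of a Drake-style calculation in R. Every input below is a rough assumption chosen purely for illustration – these are not Drake’s values, nor the video’s:

n_stars <- 1e11        # stars in our galaxy (assumed order of magnitude)
f_planets <- 0.5       # fraction of stars with planetary systems (assumed)
n_habitable <- 0.5     # average habitable planets per such system (assumed)
f_life <- 0.1          # fraction of habitable planets developing life (assumed)
f_intelligent <- 0.01  # fraction of those developing intelligent life (assumed)
f_detectable <- 0.1    # fraction of those producing detectable civilisations (assumed)
n_stars * f_planets * n_habitable * f_life * f_intelligent * f_detectable
# 2.5 million civilisations under these assumptions

The point of the paradox is that almost any plausible set of inputs gives an answer in the thousands or millions, while the observed count stubbornly remains zero.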

Anyway, rather than try to go through any of this in greater detail, I thought I’d let this video do the explaining. And for fun, they suggest using the same technique to calculate the chances of you finding someone you are compatible with as a love partner.

Now, you probably don’t need me to explain all the limitations in this methodology, either for the evidence of alien life or for potential love partners with whom you are compatible. Though of course, the application to finding love partners is just for fun, right?

Well, yes and no. Here’s Rachel Riley of Countdown fame doing some barely-disguised publicity for eHarmony.

She uses pretty much the same methodology to show that you have…

… a 1 in 562 chance of finding love.

Rachel also gives some advice to help you improve those odds. First up:

… get to know your colleagues

<Smartodds!!! I know!!!>

But it’s maybe not as bad as it sounds; she’s suggesting your colleagues might have suitable friends for you to pair up with, rather than your colleagues being potential love-partners themselves.

Finally, I’ll let you think about whether the methodology and assumptions used in Rachel’s calculations make sense or not. And maybe even try to understand what the 1 in 562 answer actually means, especially as a much higher proportion of people actually do end up in relationships. The opposite of Fermi’s paradox!

By coincidence


In an earlier post I suggested we play a game. You’d pick a sequence of three outcomes of a coin toss, like THT. Then I’d pick a different triplet, say TTT. I’d then toss a coin repeatedly and whoever’s chosen triplet showed up first in the sequence would be the winner.

In the post I gave the following example…

H T H H H T T H T H …..

… and with that outcome and the choices above you’d have won since your selection of THT shows up starting on the 7th coin toss, without my selection of TTT showing up before.

The question I asked was who this game favoured. Assuming we both play as well as possible, does it favour

  1. neither of us, because we both get to choose and the game is symmetric? Or;
  2. you, because you get to choose first and have the opportunity to do so optimally? Or;
  3. me, because I get to see your choice before I have to make mine?

The answer turns out to be 3. I have a big advantage over you, if I play smart. We’ll discuss what that means in a moment.

But in terms of these possible answers, it couldn’t have been 2. Whatever you choose I could have chosen the exact opposite and by symmetry, since H and T are equally likely, our two triplets would have been equally likely to occur first in the sequence. So, if you choose TTT, I choose HHH. If you choose HHT, I choose TTH and so on. In this way I don’t have an advantage over you, but neither do you over me. So we can rule out 2 as the possible answer.

But I can actually play better than that and have an advantage over you, whatever choice you make. I play as follows:

  1. My first choice in the sequence is the opposite of your second choice.
  2. My second and third choices are equal to your first and second choices.

So, if you chose TTT, I would choose HTT. If you chose THT, I would choose TTH. And so on. It’s not immediately obvious why this should give me an advantage, but it does. And it does so for every choice you can make.

The complete set of selections you can make, the selections I will make in response, and the corresponding odds in my favour are given in the following table.

Your Choice   My Choice   My win odds
HHH           THH         7:1
HHT           THH         3:1
HTH           HHT         2:1
HTT           HHT         2:1
THH           TTH         2:1
THT           TTH         2:1
TTH           HTT         3:1
TTT           HTT         7:1

As you can see, your best choice is to go for any of HTH, HTT, THH, THT, but even then the odds are 2:1 in my favour. That’s to say, I’m twice as likely to win as you in those circumstances. My odds increase to 3:1 – I’ll win three times as often as you – if you choose HHT or TTH; and my odds are a massive 7:1 – I’ll win seven times as often as you – if you choose HHH or TTT.

So, even if you play optimally, I’ll win twice as often as you. But why should this be so? The probabilities aren’t difficult to calculate, but most are a little more complicated than I can reasonably include here. Let’s take the easiest example though. Suppose you choose HHH, in which case I choose THH. I then start tossing the coins. It’s possible that HHH will be the first 3 coins in the sequence. That will happen with probability 1/2 × 1/2 × 1/2 = 1/8. But if that doesn’t happen, there’s no way you can win: the first time HHH appears in the sequence, it must have been preceded by a T (otherwise HHH would have occurred earlier), in which case my THH occurred before your HHH. So you win with probability 1/8, I win with probability 7/8, and my odds of winning are 7:1.

Like I say, the other cases – except when you choose TTT, which is identical to HHH, modulo switching H’s and T’s – are a little more complicated, but the principle is essentially the same every time.
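If you don’t trust the algebra, the odds are easy to check by simulation. Here’s a minimal sketch in R – the function name and its defaults are invented for illustration, with the defaults set to the HHH-versus-THH example above:

set.seed(7)
# Toss a coin until one of the two chosen triplets appears; return the winner
penney <- function(mine = c('T', 'H', 'H'), yours = c('H', 'H', 'H')) {
  tosses <- character(0)
  repeat {
    tosses <- c(tosses, sample(c('H', 'T'), 1))
    n <- length(tosses)
    if (n >= 3) {
      last3 <- tosses[(n - 2):n]
      if (all(last3 == mine)) return('me')
      if (all(last3 == yours)) return('you')
    }
  }
}
table(replicate(10000, penney())) / 10000   # roughly 7/8 'me' and 1/8 'you'

Swapping in any other row of the table above gives win frequencies close to the stated odds.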

By coincidence, this game was invented by Walter Penney (Penney -> penny, geddit?), who published it in the Journal of Recreational Mathematics in 1969. It’s interesting from a mathematical/statistical/game-theory point of view because it’s an example of a non-transitive game. For example, looking at the table above, HHT is inferior to THH; THH is inferior to TTH; TTH is inferior to HTT; and HTT is inferior to HHT. Which brings us full circle. So, there’s no overall optimal selection. Each can be beaten by another, which in turn can be beaten by a different choice again. This is why the second player has an advantage: they can always find a selection that will beat the first player’s. It doesn’t matter that their choice can also be beaten, because the first player has already made their selection and it wasn’t that one.

The best known example of a non-transitive game is Rock-Paper-Scissors. Rock beats scissors; scissors beats paper; paper beats rock. But in that case the outcome is completely deterministic – rock will always beat scissors, for example. In the Penney coin tossing game, HTT will usually beat TTT, but occasionally it won’t. So, it’s perhaps better described as a random non-transitive game.

The game also has connections with genetics. Strands of DNA are long chains composed of sequences of individual molecules known as nucleotides. Only four different types of nucleotide are found in DNA strands, and these are usually labelled A, T, G and C. The precise ordering of these nucleotides in the DNA strand effectively define a code that will determine the characteristics of the individual having that DNA.

It’s not too much of a stretch of the imagination to see that a long sequence of nucleotides A, T, G and C is not so very different – albeit with 4 variants, rather than 2 – from a sequence of outcomes just like those from the tosses of a coin. Knowing which combinations of the nucleotides are more likely to occur than others, and other combinatorial aspects of their arrangements, proves to be an important contribution of Statistics to the understanding of genetics and the development of genetic intervention therapies. Admittedly, it’s not a direct application of Penney’s game, but the statistical objectives and techniques required are not so very different.
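Purely as an illustration of the connection, here’s a sketch in R that counts the (overlapping) occurrences of a short motif in a randomly generated nucleotide sequence – with the caveat that real DNA is far from uniformly random:

set.seed(3)
# A random sequence of 100,000 nucleotides
dna <- paste(sample(c('A', 'T', 'G', 'C'), 1e5, replace = TRUE), collapse = '')
# Count overlapping occurrences of a motif using a zero-width lookahead
motif <- 'ATG'
hits <- gregexpr(paste0('(?=', motif, ')'), dna, perl = TRUE)[[1]]
sum(hits > 0)            # observed count
(1e5 - 2) * (1 / 4)^3    # expected count under uniform randomness

Exactly the same counting logic applies to triplets of H’s and T’s in a sequence of coin tosses.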


Thanks to those of you who wrote to me with solutions to this problem, all of which were at least partially correct.

Woodland creatures

The hedgehog and the fox is an essay by the philosopher Isaiah Berlin. Though published in 1953, the title is a reference to a fragment of a poem by the ancient Greek poet Archilochus. The relevant passage translates as:

… a fox knows many things, but a hedgehog one important thing.

Isaiah Berlin used this concept to classify famous thinkers: those whose ideas could be summarised by a single principle are hedgehogs; those whose ideas are more pragmatic, multi-faceted and evolving are foxes.

This dichotomy of approaches to thinking has more recently been applied in the context of prediction, and is the basis of the following short (less than 5-minute) video, kindly suggested to me by Richard.Greene@Smartodds.co.uk.

Watch and enjoy…

So, remarkably, in a study of the accuracy of individuals when making predictions, nothing made a difference – not age, sex or political outlook. Except that ‘foxes’ are better predictors than ‘hedgehogs’: being well-versed in a single consistent philosophy is inferior to an adaptive and evolving approach to knowledge and its application.

The narrator, David Spiegelhalter, also summarises the strengths of a good forecaster as:

  1. Aggregation. They use multiple sources of information, are open to new knowledge and are happy to work in teams.
  2. Metacognition. They have an insight into how they think and the biases they might have, such as seeking evidence that simply confirms pre-set ideas.
  3. Humility. They have a willingness to acknowledge uncertainty, admit errors and change their minds. Rather than saying categorically what is going to happen, they are only prepared to give probabilities of future events.

(Could almost be a bible for a sports modelling company.)

These principles are taken from the book Future Babble by Dan Gardner, which looks like it’s a great read. The tagline for the book is ‘how to stop worrying and love the unpredictable’, which on its own is worth the cost of the book.


Incidentally, I could just as easily have written a blog entry on David Spiegelhalter as part of my series of famous statisticians. Until recently he was the president of the Royal Statistical Society. He was also knighted in 2014 for his services to Statistics, and has numerous awards and honorary degrees.

His contributions to statistics are many, especially in the field of Medical Statistics.  Equally though, as you can tell from the above video, he is a fantastic communicator of statistical ideas. He also has a recent book out: The art of statistics: learning from data. I’d guess that if anyone wants to learn something about Statistics from a single book, this would be the place to go. I’ve just bought it, but haven’t read it yet. Once I do, if it seems appropriate, I’ll post a review to the blog.

Freddy’s story: part 1

This is a great story with a puzzle and an apparent contradiction at the heart of it, that you might like to think about yourself.

A couple of weeks ago Freddy.Teuma@smartodds.co.uk wrote to me to say that he’d been looking at the recent post which discussed a probability puzzle based on coin tossing, and had come across something similar that he thought might be useful for the blog. Actually, the problem Freddy described was based on an algorithm for optimisation using genetic mutation techniques that a friend had contacted him about.

To solve the problem, Freddy did four smart things:

  1. He first simplified the problem to make it easier to tackle, while still maintaining its core elements;
  2. He used intuition to predict what the solution would be;
  3. He supported his intuition with mathematical formalism;
  4. He did some simulations to verify that his intuition and mathematical reasoning were correct.

This is exactly how a statistician would approach both this problem and problems of greater complexity.

However… the pattern of results Freddy observed in the simulations contradicted what his intuition and mathematics had suggested would happen, and so he adjusted his beliefs accordingly. And then he wrote to me.

This is the version of the problem that Freddy had simplified from the original…

Suppose you start with a certain amount of money. For argument’s sake, let’s say it’s £100. You then play several rounds of a game. At each round the rules are as follows:

  1. You toss a fair coin (Heads and Tails each have probability 1/2).
  2. If the coin shows Heads, you lose a quarter of your current amount of money and end up with 3/4 of what you had at the start of the round.
  3. If the coin shows Tails, you win a quarter of your current amount of money and end up with 5/4 of what you had at the start of the round.

For example, suppose your first 3 tosses of the coin are Heads, Tails, Heads. The money you hold goes from £100, to £75 to £93.75 to £70.3125.

Now, suppose you play this game for a large number of rounds. Again, for argument’s sake, let’s say it’s 1000 rounds. How much money do you expect to have, on average, at the end of these 1000 rounds?

Have a think about this game yourself, and see what your own intuition suggests before scrolling down.

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

Freddy’s reasoning was as follows. In each round of the game I will lose or gain 25% of my current amount of money with equal probability. So, if I currently have £100, then at the end of the next round I will have either £75 or £125 with equal probability. And the average is still £100. This reasoning is true at each round of the game. And so, after any number of rounds, including 1000, I’d expect to have exactly the same amount of money as when I started: £100.

But when Freddy simulated the process, he found a different sort of behaviour. In each of his simulations, the money held after 1000 rounds was very close to zero, suggesting that the average is much smaller than £100.

I’ve taken the liberty of doing some simulations myself: the pattern of results in 16 repeats of the game, each time up to 1000 rounds,  is shown in the following figure.

Each panel of the figure corresponds to a repeat of the game, and in each repeat I’ve plotted a red trace showing how much money I hold after each round of the game.  In each case you can see that I start with £100, there’s then a bit of oscillation – more in some of the realisations than in others, due to random variation – but in all cases the amount of money I have hits something very close to zero somewhere before 250 rounds and then stays there right up to 1000 rounds.

So, there is indeed a conflict between Freddy’s intuition and the picture that these simulations provide.

What’s going on?

I’ll leave you to think about it for a while, and write with my own explanation and discussion of the problem in a future post. If you’d like to write to me to explain what you think is happening, I’d be very happy to hear from you.

Obviously, I’m especially grateful to Freddy for having sent me the problem in the first place, and for agreeing to let me write a post about it.


Update: if you’d like to run the simulation exercise yourself, just click the ‘run’ button in the following window. This will simulate the game for 1000 rounds, starting with £100. The graph will show you how much money you hold after each round of the game, while if you toggle to the console window it will tell you how much money you have after the 1000th round (to the nearest £0.01). This may not work in all browsers, but seems to work ok in Chrome. You can repeat the experiment simply by clicking ‘Run’ again. You’re likely to get a different graph each time because of the randomness in the simulations. But what about the final amount? Does that also change? And what does it suggest about Freddy’s reasoning that the average amount should stay equal to £100?

game_sim <- function(n_rounds = 1000, money_start = 100) {
  require(ggplot2)
  money <- c()
  money[1] <- money_start
  # Each round: lose a quarter (Heads) or gain a quarter (Tails) of the
  # current amount, each with probability 1/2
  for (i in 2:n_rounds) {
    money[i] <- money[i - 1] * sample(c(0.75, 1.25), 1)
  }
  m <- data.frame(round = 1:n_rounds, money = money)
  cat('Money in pounds after', n_rounds, 'rounds is', round(money[n_rounds], 2))
  # Red trace of the money held after each round
  ggplot(aes(x = round, y = money), data = m) +
    geom_line(color = 'red') +
    ggtitle('Money')
}
game_sim()