Cube-shaped poo

Do you like pizza? If so, I’ve got good and bad news for you.

The good news is that the 2019 Ig Nobel Prize winner in the category of medicine is Silvano Gallus, who received the award for…

… collecting evidence that pizza might protect against illness and death…

The bad news, for most of you, is that this applies…

…if the pizza is made and eaten in Italy.

Obviously, it’s a bit surprising that pizza can be considered a health food. But if you accept that, it’s also a bit surprising that it has to be Italian pizza. So, what’s going on?

The Ig Nobel prizes are a satirical version of the Nobel prizes. Here’s the Wikipedia description:

The Ig Nobel Prize (/ˌɪɡnoʊˈbɛl/ IG-noh-BEL) is a satiric prize awarded annually since 1991 to celebrate ten unusual or trivial achievements in scientific research, its stated aim being to “honor achievements that first make people laugh, and then make them think.” The name of the award is a pun on the Nobel Prize, which it parodies, and the word ignoble.

As such, the prize is awarded for genuine scientific research, but for areas of research that are largely incidental to human progress and understanding of the universe. For example, this year’s prize in the field of physics went to a group of scientists for…

…studying how, and why, wombats make cube-shaped poo.

It’s in this context that Silvano Gallus won his award. But although the Ig Nobel award says something about the irrelevance of the subject matter, it’s not intended as a criticism of the quality of the underlying research. Gallus’s work with various co-authors (all Italian) was published as an academic paper, ‘Does Pizza Protect Against Cancer?’, in the International Journal of Cancer. This wouldn’t happen if the work didn’t have scientific merit.

Despite this, there are reasons to be cautious about the conclusions of the study. The research is based on a type of statistical experimental design known as a case-control study. This works as follows. Suppose, for argument’s sake, you’re interested in testing the effect of pizzas on the prevention of certain types of disease. You first identify a group of patients having the disease and ask them about their pizza-eating habits. You then also find a group of people who don’t have the disease and ask them about their pizza-eating habits. You then check whether the pizza habits are different in the two groups.

Actually, it’s a little more complicated than that. It might be that age or gender or something else is also different in the two groups, so you also need to correct for these effects as well. But the principle is essentially just to see whether the tendency to eat pizza is greater in the control group – if so, you conclude that pizza is beneficial for the prevention of the specified disease. And on this basis, for a number of different cancer-types, Silvano Gallus and his co-authors found the proportion of people eating pizzas occasionally or regularly to be higher in the control group than in the case group.
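The comparison in that last step is usually summarised by an odds ratio: the odds of exposure among cases divided by the odds of exposure among controls. Here's a minimal sketch with invented counts (not the figures from the Gallus study):

```python
# Illustrative case-control comparison. All counts are invented for the
# example; they are not taken from the Gallus pizza study.
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds ratio of exposure (here: regular pizza eating) in cases vs controls."""
    case_odds = cases_exposed / cases_unexposed
    control_odds = controls_exposed / controls_unexposed
    return case_odds / control_odds

# Suppose 30 of 100 cases and 50 of 100 controls report eating pizza regularly.
or_hat = odds_ratio(30, 70, 50, 50)
print(round(or_hat, 3))  # 0.429: pizza eating is less common among cases
```

An odds ratio below 1, as here, is the pattern that would suggest the exposure is associated with lower disease risk (subject to all the caveats discussed below).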

Case-control studies are widely used in medical and epidemiological studies because they are quick and easy to implement. The more rigorous ‘randomised control study’ would work as follows:

  1. You recruit a number of people for the study, none of whom have the disease of interest;
  2. You randomise them into two groups. One of the groups will be required to eat pizza on a regular basis; the other will not be allowed to eat pizza;
  3. You follow the 2 groups over a number of years and identify whether the rate of disease turns out to be lower in the pizza-eating group than in the non-pizza-eating group;
  4. Again, you may want to correct for other differences in the 2 groups (though the need for this is largely eliminated by the randomisation process).
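The randomisation in step 2 can be sketched in a few lines (participant labels and group sizes are invented; a fixed seed is used so the allocation is reproducible):

```python
import random

# Step 2 of the trial design sketched above: randomly allocate recruits
# into two equal-sized arms.
def randomise(participants, seed=0):
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = participants[:]         # copy so the input list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (pizza group, no-pizza group)

pizza_group, no_pizza_group = randomise(list(range(100)))
print(len(pizza_group), len(no_pizza_group))  # 50 50
```

It's this random allocation that (on average) balances out age, gender and everything else across the two arms, which is why step 4's corrections become largely unnecessary.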

Clearly, for both logistical and time reasons, a randomised control study is completely unrealistic for studying the effects of pizza on disease prevention. However, in terms of reliability of results, case-control studies are generally inferior to randomised control studies because of the potential for bias.

In case control studies the selection of the control group is extremely important, and it might be very easy to fall into the trap of inadvertently selecting people with an unusually high rate of eating pizzas. (If, for example, you surveyed people while standing outside a pizzeria). It’s also easy – by accident or design – for the researcher to get the answer they might want when asking a question. For example: “you eat a lot of pizza, don’t you?” might get a different response from “would you describe yourself as a regular pizza eater?”. Moreover, people simply might not have an accurate interpretation of their long-term eating habits. But most importantly, you are asking people with, for example, cancer of the colon whether they are regular pizza eaters. Quite plausibly this type of disease has quite a big effect on diet, and one can well imagine that pizzas are not advised by doctors. So although the pizza-eating question is probably intended to relate to the period prior to getting the disease, it’s possible that people with the disease are no longer tending to eat pizza, and respond accordingly.

Finally, even if biases are eliminated by careful execution of the study, there’s the possibility that the result is anyway misleading. It may be that although pizzas seem to give disease protection, it’s not the pizza itself that’s providing the protection, but something else that is associated with pizza eating. For example, regular pizza eating might just be an indicator of someone who simply has regular meals, which may be the genuine source of disease protection. There’s also the possibility that while the rates of pizza eating are lower among the individuals with the specified diseases, they are much higher among individuals with other diseases (heart problems, for example). This could have been identified in a randomised control study, but flies completely under the radar in a case-control study.

So, case-control studies are a bit of a minefield, with various potential sources of misleading results, and I would remain cautious about the life-saving effects of eating pizza.

And finally… like all statistical analysis, any conclusions made on the basis of sample results are only relevant to the wider population from which that sample was drawn. And since this study was based on Italians eating Italian pizzas, the authors conclude…

Extension of the apparently favorable effect of pizza on cancer risk in Italy to other types of diets and populations is therefore not warranted.

So, fill your boots at Domino’s Pizzas, but don’t rely on the fact that this will do much in the way of disease prevention.



The China syndrome

In a couple of earlier posts I’ve mentioned how statistical analyses have sometimes been used to demonstrate that results in published analyses are ‘too good to be true’. One of these cases concerned Mendel’s laws of genetic inheritance. Though the laws have subsequently been shown to be unquestionably true, Mendel’s results on pea experiments were insufficiently random to be credible. The evidence strongly suggests that Mendel tweaked his results to fit the laws he believed to be true. He just didn’t understand enough about statistics to realise that the very laws he wanted to establish also implied sizeable random variation around predicted results, and the values he reported were much too close to the predicted values to be plausible.

As discussed in a recent academic article, a similar issue has been discovered in respect of official Chinese figures for organ donation. China has recently come under increasing international pressure to discontinue its practice of using organs of dead prisoners for transplants. One issue was consent – did prisoners consent to the use of their organs before their death? But a more serious issue was with respect to possible corruption and even the possibility that  some prisoners were executed specifically to make their organs available.

Anyway, since 2010 China has made efforts to discontinue this practice, replacing it with a national system of voluntary organ donation. Moreover, they announced that from 2015 onwards only hospital-based voluntary organ donations would be used for transplants.  And as evidence of the success of this program, two widely available datasets published respectively by the China Organ Transplant Response System (COTRS)  and the Red Cross Society of China, show rapid growth in the numbers of voluntary organ donations, which would more than compensate for the cessation of the practice of donations from prisoners.

Some of the yearly data counts from the COTRS database are shown in this figure taken from the report referenced above. The actual data are shown by points (or triangles and crosses); the curves have been artificially added to show the general trend in the observed data. Clearly, for each of the count types, one can observe a rapid growth rate in the number of donations.

But… here’s the thing… look at how closely the smooth curves approximate the data values. The fit is almost perfect for each of the curves. And there’s a similar phenomenon for other data, including the Red Cross data. But when similar relationships are looked at for data from other countries, something different happens: the trend is generally upwards, as in this figure, but the data are much more variable around the trend curve.

In summary, it seems much more likely that the curves have been chosen, and the data chosen subsequently to fit very closely to the curves. But just like Mendel’s pea data, this has been done without a proper awareness that nature is bound to lead to substantial variations around an underlying law. However, unlike Mendel, who presumably just invented numbers to take shortcuts to establish a law that was true, the suspicion remains that neither the data nor the law are valid in the case of the Chinese organ donation numbers.

A small technical point for those of you that might be interested in such things. The quadratic curves in the above plot were fitted in the report by the method of simple least squares, which aims to find the quadratic curve which minimises the overall distance between the points and the curve. As a point of principle, I’d argue this is not very sensible. When the counts are bigger, one would expect to get more variation, so we’d probably want to downweight the value of the variation for large counts, and increase it for the lower counts. In other words, we’d expect the curve to fit better in the early years and worse in the later years, and we should take that into account when fitting the curve. In practice, the variations around the curves are so small, the results obtained by doing things this way are likely to be almost identical. So, it’s just a point of principle more than anything else. But still, in an academic paper which purports to use the best available statistics to discredit the claim made by a national government, it would probably be best to make sure you really are using the most appropriate statistical methods for the analysis.
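As a sketch of that point of principle, here's a comparison of an ordinary and a weighted quadratic fit using `numpy.polyfit`'s `w` argument, with weights proportional to 1/√count (the natural choice if variation grows with the count, as for Poisson-like data). The counts below are invented for illustration, not the COTRS figures:

```python
import numpy as np

# Invented yearly counts following a quadratic trend plus noise that is
# (deliberately) larger in later years.
years = np.arange(2010, 2019)
counts = 50 * (years - 2009) ** 2 + np.array([3, -5, 8, -2, 40, -60, 90, -30, 55])

x = years - 2010
ols = np.polyfit(x, counts, deg=2)                         # simple least squares
wls = np.polyfit(x, counts, deg=2, w=1 / np.sqrt(counts))  # downweight big counts

print(np.round(ols, 1))  # coefficients, highest degree first
print(np.round(wls, 1))
```

As the post says, when the scatter around the curve is tiny the two fits barely differ; the weighted version just makes the fitting criterion match the assumed error structure.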

1 in 562

Ever heard of the Fermi paradox? This phenomenon is named after the Italian physicist Enrico Fermi, and concerns the fact that though we’ve found no empirical evidence of extraterrestrial life, standard calculations based on our learned knowledge of the universe suggest that the probability of life elsewhere in our galaxy is very high. The theoretical side of the paradox is usually based on some variation of the Drake equation, which takes various known or estimated constants – like the number of observed stars in our galaxy, the estimated average number of planets per star, the proportion of these that are likely to be able to support life, and so on – and feeds them into an equation which calculates the expected number of alien civilisations in our galaxy.

Though there’s a lot of uncertainty about the numbers that feed into Drake’s equation, best estimates lead to an answer that suggests there should be millions of civilisations out there somewhere. And Fermi’s paradox points to the contrast between this number and the zero civilisations that we’ve actually observed.
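As a sketch, here's the Drake equation with one set of illustrative inputs. Every value below is an assumption chosen for the example, not an agreed constant; plugging in more optimistic estimates easily pushes the answer into the millions:

```python
# The Drake equation: a product of rates and fractions giving the expected
# number of detectable civilisations in the galaxy. All inputs are
# illustrative assumptions.
def drake(R_star, f_p, n_e, f_l, f_i, f_c, L):
    return R_star * f_p * n_e * f_l * f_i * f_c * L

N = drake(
    R_star=1.5,  # new stars formed per year
    f_p=1.0,     # fraction of stars with planets
    n_e=0.2,     # habitable planets per star with planets
    f_l=0.5,     # fraction of those where life appears
    f_i=0.1,     # fraction of those developing intelligence
    f_c=0.1,     # fraction of those that become detectable
    L=10_000,    # years a civilisation remains detectable
)
print(round(N, 6))  # 15.0 with these particular inputs
```

The structure of the calculation is the whole point: it's a chain of multiplied guesses, so the uncertainty in the answer is enormous, which is exactly what makes the paradox debatable.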

Anyway, rather than try to go through any of this in greater detail, I thought I’d let this video do the explaining. And for fun, they suggest using the same technique to calculate the chances of you finding someone you are compatible with as a love partner.

Now, you probably don’t need me to explain all the limitations in this methodology, either for the evidence of alien life or for potential love partners with whom you are compatible. Though of course, the application to finding love partners is just for fun, right?

Well, yes and no. Here’s Rachel Riley of Countdown fame doing a barely-disguised publicity for eHarmony.

She uses pretty much the same methodology to show that you have…

… a 1 in 562 chance of finding love.

Rachel also gives some advice to help you improve those odds. First up:

… get to know your colleagues

<Smartodds!!! I know!!!>

But it’s maybe not as bad as it sounds; she’s suggesting your colleagues might have suitable friends for you to pair up with, rather than your colleagues being potential love-partners themselves.

Finally, I’ll let you think about whether the methodology and assumptions used in Rachel’s calculations make sense or not. And maybe even try to understand what the 1 in 562 answer actually means, especially as a much higher proportion of people actually do end up in relationships. The opposite of Fermi’s paradox!

Woodland creatures

The hedgehog and the fox is an essay by philosopher Isaiah Berlin. Though published in 1953, the title is a reference to a fragment of a poem by the ancient Greek poet Archilochus. The relevant passage translates as:

… a fox knows many things, but a hedgehog one important thing.

Isaiah Berlin used this concept to classify famous thinkers: those whose ideas could be summarised by a single principle are hedgehogs; those whose ideas are more pragmatic, multi-faceted and evolving are foxes.

This dichotomy of approaches to thinking has more recently been applied in the context of prediction, and is the basis of the following short (less than 5-minute) video, kindly suggested to me by

Watch and enjoy…

So, remarkably, in a study of the accuracy of individuals when making predictions, almost nothing made a difference: age, sex, political outlook… The one exception: ‘foxes’ are better predictors than ‘hedgehogs’. Being well-versed in a single consistent philosophy is inferior to an adaptive and evolving approach to knowledge and its application.

The narrator, David Spiegelhalter, also summarises the strengths of a good forecaster as:

  1. Aggregation. They use multiple sources of information, are open to new knowledge and are happy to work in teams.
  2. Metacognition. They have an insight into how they think and the biases they might have, such as seeking evidence that simply confirms pre-set ideas.
  3. Humility. They have a willingness to acknowledge uncertainty, admit errors and change their minds. Rather than saying categorically what is going to happen, they are only prepared to give probabilities of future events.

(Could almost be a bible for a sports modelling company.)

These principles are taken from the book Future Babble by Dan Gardner, which looks like it’s a great read. The tagline for the book is ‘how to stop worrying and love the unpredictable’, which on its own is worth the cost of the book.

Incidentally, I could just as easily have written a blog entry with David Spiegelhalter as part of my series of famous statisticians. Until recently he was the president of the Royal Statistical Society. He was also knighted in 2014 for his services to Statistics, and has numerous awards and honorary degrees.

His contributions to statistics are many, especially in the field of Medical Statistics.  Equally though, as you can tell from the above video, he is a fantastic communicator of statistical ideas. He also has a recent book out: The art of statistics: learning from data. I’d guess that if anyone wants to learn something about Statistics from a single book, this would be the place to go. I’ve just bought it, but haven’t read it yet. Once I do, if it seems appropriate, I’ll post a review to the blog.

Revel in the amazement

In an earlier post I included the following table:

As I explained, one of the columns contains the genuine land areas of each country, while the other is fake. And I asked you which is which.

The answer is that the first column is genuine and the second is fake. But without a good knowledge of geography, how could you possibly come to that conclusion?

Well, here’s a remarkable thing. Suppose we take just the leading digit of each  of the values. Column 1 would give 6, 2, 2, 1,… for the first few countries, while column 2 would give 7, 9, 3, 3,… It turns out that for many naturally occurring phenomena, you’d expect the leading digit to be 1 on around 30% of occasions. So if the actual proportion is a long way from that value, then it’s likely that the data have been manufactured or manipulated.

Looking at column 1 in the table, 5 out of the 20 countries have a land area with leading digit 1; that’s 25%. In column 2, none do; that’s 0%. Even 25% is a little on the low side, but close enough to be consistent with 30% once you allow for discrepancies due to random variations in small samples. But 0% is pretty implausible. Consequently, column 1 is consistent with the 30% rule, while column 2 is not, and we’d conclude – correctly – that column 2 is faking it.
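That intuition can be made a little more precise with a simple binomial model (my own back-of-envelope addition, not part of the original argument): treat each of the 20 countries as having probability 0.301 of a leading 1, and ask how likely the two observed counts are.

```python
from math import comb

# Binomial sanity check: 20 countries, each with probability 0.301 of a
# leading digit of 1 (Benford's value for digit 1).
def binom_pmf(k, n=20, p=0.301):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_five = binom_pmf(5)  # column 1: 5 of 20 countries
p_zero = binom_pmf(0)  # column 2: 0 of 20 countries
print(f"P(5 of 20): {p_five:.3f}")   # quite likely
print(f"P(0 of 20): {p_zero:.5f}")   # well under 1 in 1000
```

So 5 out of 20 is an entirely ordinary outcome under the 30% rule, while 0 out of 20 is very unlikely indeed, which is the basis for calling column 2 fake.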

But where does this 30% rule come from? You might have reasoned that each of the digits 1 to 9 were equally likely – assuming we drop leading zeros – and so the percentage would be around 11% for a leading digit of 1, just as it would be for any of the other digits. Yet that reasoning turns out to be misplaced, and the true value is around 30%.

This phenomenon is a special case of something called Benford’s law, named after the physicist Frank Benford who first formalised it. (Though it had also been noted much earlier by the astronomer Simon Newcomb). Benford’s law states that for many naturally occurring datasets, the probability that the leading digit of a data item is 1 is equal to 30.1%. Actually, Benford’s law goes further than that, and gives the percentage of times you’d get a 2 or a 3 or any of the digits 1-9 as the leading digit. These percentages are shown in the following table.

Leading Digit 1 2 3 4 5 6 7 8 9
Frequency 30.1% 17.6% 12.5% 9.7% 7.9% 6.7% 5.8% 5.1% 4.6%

For those of you who care about such things, these percentages are log(2/1), log(3/2), log(4/3) and so on up to log(10/9), where log here is logarithm with respect to base 10.
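The percentages in the table can be reproduced directly from that formula:

```python
from math import log10

# Benford's law: probability that the leading digit is d is log10((d+1)/d).
benford = {d: log10((d + 1) / d) for d in range(1, 10)}

for d, p in benford.items():
    print(d, f"{100 * p:.1f}%")  # 30.1%, 17.6%, 12.5%, ... matching the table

# The nine probabilities sum to 1, since the ratios telescope to log10(10).
print(round(sum(benford.values()), 10))  # 1.0
```

The telescoping sum is a nice sanity check that these nine values really do form a probability distribution.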

But does Benford’s law hold up in practice? Well, not always, as I’ll discuss below. But often it does. For example, I took a dataset giving the altitudes of a large set of football stadiums around the world. I discarded a few whose altitude is below sea level, but was still left with over 13,000 records. I then extracted the leading digit of each of the altitudes (in metres)  and plotted a histogram of these values. This is just a plot of the percentages of occasions each value occurred. These are the blue bars in the following diagram. I then superimposed the predicted proportions from Benford’s law. These are the black dots.


The agreement between the observed percentages and those predicted by Benford’s law is remarkable. In particular, the observed percentage of leading digits equal to 1 is almost exactly what Benford’s law would imply. I promise I haven’t cheated with the numbers.

As further examples, there are many series of mathematically generated numbers for which Benford’s law holds exactly.

These include:

  • The Fibonacci series: 1, 1, 2, 3, 5, 8, 13, …. where each number is obtained by summing the 2 previous numbers in the series.
  • The integer powers of two: 1, 2, 4, 8, 16, 32, …..
  • The iterative series obtained by starting with any number and successively multiplying by 3. For example, starting with 7, we get: 7, 21, 63, 189,….

In each of these cases of infinite series of numbers, exactly 30.1% will have leading digit equal to 1; exactly 17.6% will have leading digit equal to 2, and so on.
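It's easy to check one of these claims empirically. Here's a sketch counting the leading digits of the first 3,000 powers of two; the proportions come out very close to the Benford percentages:

```python
from collections import Counter

# Leading digits of 2, 4, 8, 16, ..., 2**3000.
counts = Counter(str(2 ** n)[0] for n in range(1, 3001))

for d in "123456789":
    print(d, f"{100 * counts[d] / 3000:.1f}%")  # digit 1 comes out near 30.1%
```

(The exact agreement claimed in the text is a limiting statement about the infinite series; any finite initial segment, like the 3,000 terms here, matches it only approximately, though very closely.)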

And there are many other published examples of data fitting Benford’s law (here, here, here… and so on.)

Ok, at this point you should pause to revel in the amazement of this stuff. Sometimes mathematics, Statistics and probability come together to explain naturally occurring phenomena in a way that is so surprising and shockingly elegant it takes your breath away.

So, when does Benford’s law work? And why?

It turns out there are various ways of explaining Benford’s law, but none of them – at least as far as I can tell – is entirely satisfactory. All of them require a leap of faith somewhere to match the theory to real-life. This view is similarly expressed in an academic article, which concludes:

… there is currently no unified approach that simultaneously explains (Benford’s law’s) appearance in dynamical systems, number theory, statistics, and real-world data.

Despite this, the various arguments used to explain Benford’s law do give some insight into why it might arise naturally in different contexts:

  1. If there is a law of this type, Benford’s law is the only one that works for all choices of scale. The decimal representation of numbers is entirely arbitrary, presumably deriving from the fact that humans, generally, have 10 fingers. But if we’d been born with 8 fingers, or chosen to represent numbers anyway in binary, or base 17, or something else, you’d expect a universal law to be equally valid, and not dependent on the arbitrary choice of counting system. If this is so, then it turns out that Benford’s law, adapted in the obvious way to the choice of scale, is the only one that could possibly hold. An informal argument as to why this should be so can be found here.
  2. If the logarithm of the variable under study has a distribution that is smooth and roughly symmetric – like the bell-shaped normal curve, for example – and is also reasonably well spread out, it’s easy to show that Benford’s law should hold approximately. Technically, for those of you who are interested, if X is the thing we’re measuring, and if log X has something like a normal distribution with a variance that’s not too small, then Benford’s law is a good approximation for the behaviour of X. A fairly readable development of the argument is given here. (Incidentally, I stole the land area of countries example directly from this reference.)
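The second explanation is easy to illustrate with simulated data (a lognormal sample of my own construction, not any real-world measurements): if log X is normal with a reasonably large spread, the leading digits of X land very close to Benford's percentages.

```python
import random
from collections import Counter

# Simulate X such that log10(X) ~ Normal(mean=3, sd=2), so X spans
# several orders of magnitude, as the argument requires.
rng = random.Random(1)
xs = [10 ** rng.gauss(3, 2) for _ in range(100_000)]

# Scientific notation puts the leading digit first, e.g. "1.234560e+03".
counts = Counter(f"{x:e}"[0] for x in xs)
print(f"leading 1s: {100 * counts['1'] / len(xs):.1f}%")  # close to 30.1%
```

Shrink the standard deviation (say to 0.1, so the values cluster within a single order of magnitude) and the agreement falls apart, which previews the point about orders of magnitude made below.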

But in the first case, there’s no explanation as to why there should be a universal law, and indeed many phenomena – both theoretical and in nature – don’t follow Benford’s law. And in the second case, except for special situations where the normal distribution has some kind of theoretical justification as an approximation, there’s no particular reason why the logarithm of the observations should behave in the required way. And yet, in very many cases – like the land area of countries or the altitude of football stadiums – the law can be shown empirically to be a very good approximation to the truth.

One thing which does emerge from these theoretical explanations is a better understanding of when Benford’s law is likely to apply and when it’s not. In particular, the argument only works when the logarithm of the variable under study is reasonably well spread out. What that means in practice is that the variable itself needs to cover several orders of magnitude: tens, hundreds, thousands etc. This works fine for something like the stadium altitudes, which vary from close to sea-level up to around 4,000 metres, but wouldn’t work for total goals in football matches, which are almost always in the range 0 to 10, for example.

So, there are different ways of theoretically justifying Benford’s law, and empirically it seems to be very accurate for different datasets which cover orders of magnitude. But does it have any practical uses? Well, yes: applications of Benford’s law have been made in many different fields, including…

Finally, there’s also a version of Benford’s law for the second digit, third digit and so on. There’s an explanation of this extension in the Wikipedia link that I gave above. It’s probably not easy to guess exactly what the law might be in these cases, but you might try and guess how the broad pattern of the law changes as you move from the first to the second and to further digits.

Thanks to those of you who wrote to me after I made the original post. I don’t think it was easy to guess what the solution was, and indeed if I was guessing myself, I think I’d have been looking for a uniformity in the distribution of the digits, which turns out to be completely incorrect, at least for the leading digit. Even though I’ve now researched the answer myself, and made some sense of it, I still find it rather shocking that the law works so well for an arbitrary dataset like the stadium altitudes. Like I say: revel in the amazement.

On top of the world

I’ll be honest, usually I try to find a picture that fits in with the statistical message I’m trying to convey. But occasionally I see a picture and then look for a statistical angle to justify its inclusion in the blog. This is one of those occasions. I don’t know what your mental image of the top of Everest is like, but until now mine wasn’t something that resembled the queue for the showers at Glastonbury.

Anyway, you might have read that this congestion to reach the summit of Everest is becoming increasingly dangerous. In the best of circumstances the conditions are difficult, but climbers are now faced with a wait of several hours at very high altitude with often unpredictable weather. And this has contributed to a spate of recent deaths.

But what’s the statistical angle? Well, suppose you wanted to make the climb yourself. What precautions would you take? Obviously you’d get prepared physically and make sure you had the right equipment. But beyond that, it turns out that a statistical analysis of relevant data, as the following video shows, can both improve your chances of reaching the summit and minimise your chances of dying while doing so.

This video was made by Dr Melanie Windridge, and is one of a series she made under the project title “Summiting the Science of Everest”. Her aim was to explore the various scientific aspects associated with a climb of Everest, which she undertook in Spring 2018. And one of these aspects, as set out in the video, is the role of data analysis in planning. The various things to be learned from the data include:

  1. Climbing from the south Nepal side is less risky than from the north Tibet side. This is explained by the steeper summit on the south side making descent quicker in case of emergency.
  2. Men and women are equally successful at completing summits of Everest. And they also have similar death rates.
  3. Age is a big factor: over forties are less likely to make the summit; over sixties have a much higher death rate.
  4. Most deaths occur in the icefall regions of the mountain.
  5. Many deaths occur during descent.
  6. Avalanches are a common cause of death. Though they are largely unpredictable, they are less frequent in Spring. Moreover, walking through the icefall regions early in the morning also reduces avalanche risk.
  7. The distribution of summit times for climbers who survive is centred around 9 a.m., whereas for those who subsequently die during the descent it’s around 2 p.m. In other words, it’s safest to aim to arrive at the summit relatively early in the morning.

Obviously, climbing Everest will never be risk free – the death rate of people making the summit is, by some counts, around 6.5%. But intelligent use of available data can help minimise the risks. Statistics, in this context, really can be a matter of life or death.

Having said that, although Dr Melanie seemed reassured that the rate of deaths of climbers is decreasing, here’s a graphical representation of the data showing that the actual number of deaths – as opposed to the rate of deaths – is generally increasing with occasional spikes.

Looking on the bright side of things though, Everest is a relatively safe mountain to climb: the death rate for climbers on Annapurna, also in the Himalayas, is around 33%!

In light of all this, if you prefer your climbs to the top of the world to be risk free, you might try scaling the Google face (though I recommend turning the sound off first):

While for less than the prices of a couple of beers you can get a full-on VR experience as previewed below:

Finally, if you’re really interested in the statistics of climbing Everest, there’s a complete database of all attempted climbs available here.

Do I feel lucky?

Ok, I’m going to call it…

This is, by some distance:

‘The Best Application of Statistics in Cinematic History’:

It has everything: the importance of good quality data; inference; hypothesis testing; prediction; decision-making; model-checking. And Clint Eastwood firing rounds off a .44 Magnum while eating a sandwich.

But, on this subject, do you feel lucky? (Punk)

Richard Wiseman is Professor in Public Understanding of Psychology at the University of Hertfordshire. His work touches on many areas of human psychology, and one aspect he has studied in detail is the role of luck. A summary of his work in this area is contained in his book The Luck Factor.

This is from the book’s Amazon description:

Why do some people lead happy successful lives whilst others face repeated failure and sadness? Why do some find their perfect partner whilst others stagger from one broken relationship to the next? What enables some people to have successful careers whilst others find themselves trapped in jobs they detest? And can unlucky people do anything to improve their luck – and lives?

Richard’s work in this field is based on many years of research involving a study group of 400 people. In summary, what he finds, perhaps unsurprisingly, is that people aren’t born lucky or unlucky, even if their perception is that they are. Rather, our attitude to life generally determines how the lucky and unlucky events we experience determine the way our lives pan out. In other words, we really do make our own luck.

He goes on to identify four principles we can adopt in order to make the best out of the opportunities (and difficulties) life bestows upon us:

  1. Create and notice chance opportunities;
  2. Listen to your intuition;
  3. Create self-fulfilling prophecies via positive expectations;
  4. Adopt a resilient attitude that transforms bad luck into good.

In summary: if you have a positive outlook on life, you’re likely to make the best of the good luck that you have, while mitigating as far as possible against the bad luck.

But would those same four principles work well for a sports modelling company? They could probably adopt 1, 3 and 4 as they are, perhaps reinterpreted as:

1. Seek out positive value trading opportunities wherever possible.

3. Build on success. Keep a record of what works well, both in trading and in the company generally, and do more of it.

4. Don’t confuse poor results with bad luck. Trust your research.

Principle 2 is a bit more problematic: much better to stress the need to avoid the trap of following instinct, when models and data suggest a different course of action. However, I think the difficulty is more to do with the way this Principle has been written, rather than what’s intended. For example, I found this description in a review of the book:

Lucky people actively boost their intuitive abilities by, for example… learning to dowse.

Learning to dowse!

But this isn’t what Wiseman meant at all. Indeed, he writes:

Superstition doesn’t work because it is based on outdated and incorrect thinking. It comes from a time when people thought that luck was a strange force that could only be controlled by magical rituals and bizarre behaviors.

So, I don’t think he’s suggesting you start wandering around with bits of wood in a search for underground sources of water. Rather, I think he’s suggesting that you be aware of the luck in the events around you, and be prepared to act on them. But in the context of a sports modelling company, it would make sense to completely replace reference to intuition with data and research. So…

2. Invest in data and research and develop your trading strategy accordingly.

And putting everything together:

  1. Seek out positive value trading opportunities wherever possible.
  2. Invest in data and research and develop your trading strategy accordingly.
  3. Build on success. Keep a record of what works well, both in trading and in the company generally, and do more of it.
  4. Don’t confuse poor results with bad luck. Trust your research.

And finally, what’s that you say?  “Go ahead, make my day.” Ok then…


Calling BS

You have to be wary of newspaper articles published on 1 April, but I think this one is genuine. The Guardian on Monday contained a report about scientific research into bullshit. Or more specifically, a scientific/statistical study into the demographics of bullshitting.

Now, to make any sense of this, it’s important first to understand what bullshit is.  Bullshit is different from lying. The standard treatise in this field is ‘On Bullshit‘ by Harry Frankfurt. I’m not kidding. He writes:

It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction

In other words, bullshitting is providing a version of events that gives the impression you know what you are talking about, when in fact you don’t.

Unfortunately, standard dictionaries tend to define bullshitting as something like ‘talking nonsense’, though this is – irony alert – bullshit. This article explains why and includes the following example. Consider the phrase

Hidden meaning transforms unparalleled abstract beauty.

It argues that since the sentence is grammatically correct, but intellectually meaningless, it is an example of bullshit. On the other hand, the same set of words in a different order, for example

Unparalleled transforms meaning beauty hidden abstract.

is simply nonsense. Since it lacks grammatical structure, the author isn’t bullshitting. He’s just talking garbage.

So, bullshit is different from lying in that the bullshitter will generally not know the truth; and it’s different from nonsense in that it has specific intent to deceive or misdirect.

But back to the Guardian article. The statistical study it refers to reveals a number of interesting outcomes:

  • Boys bullshit more than girls;
  • Children from higher socioeconomic backgrounds tend to bullshit more than those from poorer backgrounds;
  • North Americans bullshit the most (among the countries studied);
  • Bullshitters tend to perceive themselves as self-confident and high achievers.

If only I could think of an example of a self-confident, North American male from a wealthy background with a strong tendency to disseminate bullshit in order to illustrate these points.

But what’s all this got to do with Statistics? Well, it cuts both ways. First, the cool logic of Statistics can be used to identify and correct bullshit. Indeed, if you happen to study at the University of Washington, you can enrol for the course ‘Calling Bullshit: Data Reasoning in a Digital World‘, which is dedicated to the subject. The objectives for this course, as listed in its syllabus, are that after the course you should be able to:

  • Remain vigilant for bullshit contaminating your information diet.
  • Recognize said bullshit whenever and wherever you encounter it.
  • Figure out for yourself precisely why a particular bit of bullshit is bullshit.
  • Provide a statistician or fellow scientist with a technical explanation of why a claim is bullshit.
  • Provide your crystals-and-homeopathy aunt or casually racist uncle with an accessible and persuasive explanation of why a claim is bullshit.

I especially like the fact that after following this course you’ll be well-equipped to take on both the renegade hippy and racist wings of your family.

So that’s the good side of things. On the bad side, it’s extremely easy to use Statistics to disseminate bullshit. Partly because not everyone is sufficiently clued-up to really understand statistical concepts and to be critical when confronted with them; and partly because, even if you have good statistical knowledge and are appropriately sceptical, you’re still likely to have to rely on the accuracy of the analysis, without access to the data on which it was based.

For example, this article, which is an interesting read on the subject of Statistics and bullshit, discusses a widely circulated fact, attributed to the Crime Statistics Bureau of San Francisco, that:

81% of white homicide victims were killed by blacks

Except, it turns out, that the Crime Statistics Bureau of San Francisco doesn’t exist, and FBI figures actually suggest that 80% of white murder victims were killed by other white people. So it’s a bullshit statement attributed to a bullshit organisation. But with social media, such mis-truths spread virally, and correcting them with actual facts becomes all but impossible. Indeed, the above statement was included in an image posted to Twitter by Donald Trump during his election campaign: full story here. That tweet alone got almost 7000 retweets. So although reliable statistics easily disprove the claim, the message has already spread and the damage is done.

So, welcome to Statistics: helping, and helping fight, bullshit.




Mr. Wrong


As a footnote to last week’s post ‘How to be wrong‘, I mentioned that Daniel Kahneman had been shown to be wrong by using unreliable research in his book ‘Thinking, Fast and Slow’. I also suggested that he had tried to deflect blame for this oversight, essentially putting all of the blame on the authors of the work which he cited.

I was wrong. Someone pointed me to a post by Kahneman in the comments section of the blog post I referred to, in which he clearly takes responsibility for the unreliable interpretations he included in his book and explains in some detail why they were made. In other words, he was being entirely consistent with the handy guide for being wrong that I included in my original post.


But while we’re here, let me just explain in slightly more detail what the issue was with Kahneman’s analysis…

As I’ve mentioned in other settings, if we get a result based on a very small sample size, then that result has to be considered not very reliable. But if you get similar results from several different studies, all based on small sample sizes, then the combined strength of evidence is increased. There are formal ways of combining results in this way, and it often goes under the name of ‘meta-analysis‘. This is a very important technique, especially as time and money constraints often mean the sample sizes in individual studies are small, and Kahneman used this approach – at least informally – to combine the strength of evidence from several small-sample studies. But there’s a potential problem. Not all studies into a phenomenon get published. Moreover, there’s a tendency for those having ‘interesting results’ to be more likely to be published than others. But a valid combination of information should include results from all studies, not just those with results in a particular direction.
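To make the idea concrete, here’s a minimal sketch in Python. The study numbers are made up, and I’m simply pooling the raw counts rather than using the formal methods a real meta-analysis would apply, but it shows how several individually unconvincing small studies can add up to strong combined evidence:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at
    least k heads in n tosses of a coin with heads-probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Three small, hypothetical studies of 10 tosses each: (heads, tosses).
studies = [(7, 10), (8, 10), (7, 10)]

# Individually, none is very convincing evidence against a fair coin...
for heads, n in studies:
    print(f"{heads}/{n} heads: p = {binom_tail(heads, n):.3f}")

# ...but pooling the counts (22 heads in 30 tosses) gives much
# stronger evidence, which is the essence of combining studies.
total_heads = sum(h for h, _ in studies)
total_tosses = sum(n for _, n in studies)
print(f"pooled {total_heads}/{total_tosses}: p = {binom_tail(total_heads, total_tosses):.4f}")
```

Each study on its own has a p-value of around 0.05–0.17, nothing special; the pooled result comes in below 0.01. But, as the next example shows, this logic only works if you see all the studies.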

Let’s consider a simple made-up example. Suppose I’m concerned that coins are being produced that have a propensity to come up Heads when tossed. I set up studies all around the country in which people are asked to toss a coin 10 times and report whether they got 8 or more Heads. In quite a few of the studies the results turn out to be positive – 8 or more Heads – and I encourage the researchers in those studies to publish their results. Now, 8 or more Heads in any one study is not especially unusual: 10 is a very small sample size. So nobody gets very excited about any one of these results. But then, perhaps because they are researching for a book, someone notices that there are many independent studies all suggesting the same thing. They know that individually the results don’t say much, but that in aggregate the evidence looks overwhelming, and they conclude that coins really are being produced with a tendency to come up Heads. But this conclusion is false, because the far larger number of studies in which 8 or more Heads weren’t obtained never got published.

And that’s exactly what happened to Kahneman. The uninteresting results don’t get published, while the interesting ones do, even if they are not statistically reliable due to small sample sizes. Then someone combines the published results via meta-analysis and gets a totally biased picture.

That’s how easy it is to be wrong.

How to be wrong

When I’m not feeling too fragile to be able to handle it, I sometimes listen to James O’Brien on LBC. As you probably know, he hosts a talk show in which he invites listeners to discuss their views on a wide range of topics that often begin and end with Brexit. His usual approach is simply to ask people who call in to defend or support their views with hard facts – as opposed to opinion or hearsay – and inevitably they can’t. James himself is well-armed with facts and knowledge, and is consequently able to forensically dissect arguments that are dressed up as factual but turn out to be anything but. It’s simultaneously inspiring and incredibly depressing.

He’s also just published a book, which is a great read.


This is the description on Amazon:

Every day, James O’Brien listens to people blaming benefits scroungers, the EU, Muslims, feminists and immigrants. But what makes James’s daily LBC show such essential listening – and has made James a standout social media star – is the careful way he punctures their assumptions and dismantles their arguments live on air, every single morning.

In the bestselling How To Be Right, James provides a hilarious and invigorating guide to talking to people with faulty opinions. With chapters on every lightning-rod issue, James shows how people have been fooled into thinking the way they do, and in each case outlines the key questions to ask to reveal fallacies, inconsistencies and double standards.

If you ever get cornered by ardent Brexiteers, Daily Mail disciples or little England patriots, this book is your conversation survival guide.

And this is the Sun review on the cover:

James O’Brien is the epitome of a smug, sanctimonious, condescending, obsessively politically-correct, champagne-socialist public schoolboy Remoaner.

Obviously, both these opinions should give you the encouragement you need to read the book. Admittedly, it’s only tenuously related to Statistics, but the emphasis on the importance of fact and evidence is a common theme.

But I don’t want to talk about being right. I want to talk about being wrong.

One of my first tasks when I joined Smartodds around 14 years ago was to develop an alternative model to the standard goals model for football. I made a fairly simple suggestion, and we coded it up to run live in parallel to the goals model. We kept it going for a year or so, but rather than being an improvement on the goals model, it tended to give poorer results. This was disappointing, so I looked into things and came up with a ‘proof’ of how, in idealised circumstances, it was impossible for the new model to improve on the goals model. Admittedly, our goals model didn’t quite have the idealised form, so it wasn’t a complete surprise that the numbers were a bit different. But the argument seemed to suggest that we shouldn’t really expect any improvement, and since we weren’t getting very good results anyway, we were happy to bury the new model on the strength of this slightly idealised theoretical argument.

Fast-forward 14 years… Some bright sparks in the R&D team have been experimenting with models that have a similar structure to the one which I’d ‘proved’ couldn’t really work and which we’d previously abandoned. And they’ve been getting quite good results, which seem to be an improvement on the performance of the original goals model. At first I thought it might just be that the new models were so different from the one I’d previously suggested that my arguments about the model not being able to improve on the goals model might not apply. But when I looked at things more closely, I realised that there was a flaw in my original argument. It wasn’t wrong exactly, but it didn’t apply to the versions of the model we were likely to use in practice.

Of course, this is good and bad news. It’s good news that there’s no reason why the new versions of the model shouldn’t improve on the goals model. It’s bad news that if we’d understood that 14 years ago, we might have explored this avenue of research sooner. I should emphasise, it might be that this type of model still ends up not improving on our original goals model; it’s just that whereas I thought there was a theoretical argument which suggested that was unlikely, this argument actually doesn’t hold true.

So what’s the point of this post?

Well, all of us are wrong sometimes. And in the world of Statistics, we’re probably wrong more often than most people, and sometimes for good reasons. It might be:

  • We were unlucky in the data we used. They suggested something, but it turned out to be just due to chance.
  • Something changed. We correctly spotted something in some data, but subsequent to that things changed, and what we’d previously spotted no longer applies.
  • The data themselves were incomplete or unreliable.

Or it might be for not-such-good reasons:

  • We made a mistake in the modelling.
  • We made a mistake in the programming.

Or, just maybe, someone was careless when applying a simple mathematical identity in a situation for which it wasn’t really appropriate. Anyway, mistakes are inevitable, so here’s a handy guide about how to be wrong:

  1. Try very hard not to be wrong.
  2. Realise that, despite trying very hard, you might be wrong in any situation, so be constantly aware as new evidence becomes available that you may need to modify what you believed to be true.
  3. Once you realise you are wrong, let others know what was wrong and why you made the mistake you did. Humility and honesty are way more useful than evasiveness.
  4. Be aware that other people may be wrong too. Always use other people’s work with an element of caution, and if something seems wrong, politely discuss the possibility with them. (But remember also: you may be wrong about them being wrong).

Hmmm, hope that’s right.

I was encouraged to write a post along these lines by a recent chat in which we were discussing the mistake I’d made, as explained above. To help me not feel quite so bad about it, the person I was chatting with mentioned a recent blog post where some of the research described in Daniel Kahneman’s book, ‘Thinking, Fast and Slow’, is also shown to be unreliable. You might remember I discussed this book briefly in a previous post. Anyway, the essence of that post is that the sample sizes used in much of the reported research are too small for the statistical conclusions reached to be valid. As such, some chapters of Kahneman’s book have to be considered unreliable. Actually, Kahneman himself seems to have been aware of the problem some years ago, writing an open letter to the relevant researchers setting out a possible protocol that would avoid the sorts of problems that occurred in the research on which his book chapters were based. However, while Kahneman himself can’t be blamed for the original failures in the research that he reported on, it’s argued in the blog post that his own earlier research might well have led him to foresee these types of problems. Hence, the rather aggressive tone of his letter seems to me like an attempt at ring-fencing himself from any particular blame for the errors in his book. In other words, this episode seems like a slightly different approach to ‘how to be wrong’ compared with my handy guide above.