Walking on water

Here’s a question: how do you get dogs to walk on water?

Turns out there’s a really simple answer – just heat the atmosphere up by burning fossil fuels so much that the Greenland ice sheet melts.

The remarkable picture above was taken by a member of the Centre for Ocean and Ice at the Danish Meteorological Institute. Their pre-summer retrieval of research equipment is normally a sledge ride across a frozen winter wasteland; this year it was a paddle through the meltwater that’s sitting on what’s left of the ice. And the husky dogs that pull the sledge are literally walking on water.

This graph shows the extent – please note: clever play on words – of the problem…

The blue curve shows the median percentage of Greenland ice melt over the last few decades. There’s natural year-to-year variation around that median, and as with any statistical analysis, it’s important to understand what sort of variation is normal before deciding whether any particular observation is unusual. So, in this case, the dark grey area shows the range of values that were observed in 50% of years; the light grey area is what was observed in 90% of years. You’d therefore only expect observations outside the light grey area once every ten years. Moreover, the further an observation falls outside the grey areas, the more anomalous it is.

Now, look at the trace for 2019 shown in red. The value for June isn’t just outside the normal range of variation, it’s way outside. And it’s not only an unusually extreme observation for June; it would be extreme even for the hottest part of the year in July. At its worst (so far), the melt for June 2019 reached over 40%, whereas the average in mid-July is around 18%, with a value of about 35% being exceeded only once in every 10 years.

So, note how much information can be extracted from a single well-designed graph. We can see:

  1. The variation across the calendar of the average ice melt;
  2. The typical variation around the average – again across the calendar – in terms of an interval expected to contain the true value on 50% of occasions: the so-called inter-quartile range;
  3. A more extreme measure of variation, showing the levels that are exceeded only once every 10 years: the so-called inter-decile range;
  4. The trace of an individual year – up to current date – which appears anomalous.

In particular, by showing us the variation in ice melt both within years and across years we were able to conclude that this year’s June value is truly anomalous.
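
For anyone curious about the mechanics, here’s a minimal sketch of how percentile bands like these can be computed. The data here are invented purely for illustration – the real curves would be built from the DMI’s historical melt records.

```python
import numpy as np

# Invented stand-in data: daily melt percentages for 40 historical years,
# arranged as a (years x days) array. The real analysis would use DMI records.
rng = np.random.default_rng(1)
melt = rng.gamma(shape=2.0, scale=3.0, size=(40, 365))

# For each calendar day: the median, the band containing the middle 50%
# of years (dark grey), and the band containing the middle 90% (light grey).
median = np.percentile(melt, 50, axis=0)
q25, q75 = np.percentile(melt, [25, 75], axis=0)   # inter-quartile band
q05, q95 = np.percentile(melt, [5, 95], axis=0)    # the 90% band

# A value outside the light grey band is something you'd expect to see
# only about once every ten years on that calendar day.
day = 170  # mid-June
print(f"median {median[day]:.1f}%, 90% band ({q05[day]:.1f}%, {q95[day]:.1f}%)")
```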

Now let’s look at another graph. These are average spring temperatures, not for Greenland but for Alaska, where there are similar concerns about ice melt caused by increased atmospheric temperatures.


Again, there’s a lot of information:

  1. Each dot is an average spring temperature, one per year;
  2. The dots have been coloured: most are black, but the blue and red ones correspond to the ten coldest and hottest years respectively;
  3. The green curve shows the overall trend;
  4. The value for 2019 has been individually identified.

And the picture is clear. Not only has the overall trend been increasing since around the mid-seventies, but almost all of the hottest years have occurred in that period, while almost none of the coldest have. In other words, the average spring temperature in Alaska has been increasing over the last 50 years or so, and is hotter now than it has been for at least 90 years (and probably much longer).

Now, you don’t need to be a genius in biophysics to understand the cause and effect relating temperature and ice. So the fact that extreme ice melts are occurring in the same period as extreme temperatures is hardly surprising. What’s maybe less well-known is that the impact of these changes has a knock-on effect way beyond the confines of the Arctic.

So, even if dogs walking on the water of the Arctic ocean seems like a remote problem, it’s part of a chain of catastrophic effects that will soon affect our lives too. Statistics has an important role to play in determining and communicating the presence and cause of these effects, and the better we all are at understanding those statistics, the more likely we are to be able to limit the damage beyond what is already inevitable. Fortunately, our governments are well aware of this and are taking immediate action to remedy the problem.

Oh, wait…

… scrap that, better take action ourselves.

Nul Points

No doubt you’re already well-aware of, and eagerly anticipating, this year’s Eurovision song contest final to be held in Tel Aviv between the 14th and 18th May. But just in case you don’t know, the Eurovision song contest is an annual competition to choose the ‘best’ song entered by the various participating European countries. And Australia!

Quite possibly the world would never have heard of Abba if they hadn’t won Eurovision. Nor Conchita Wurst.

The voting rules have changed over the years, but the structure has remained pretty much the same. Judges from each participating country rank their favourite 10 songs – excluding that of their own country, which they cannot vote for – and points are awarded on the basis of preference. In the current scheme, the first choice gets 12 points, the second choice 10 points, the third choice 8 points, then down to the tenth choice which gets a single point.

A country’s total score is the sum awarded by each of the other countries, and the country with the highest score wins the competition. In most years the scoring system has made it possible for a song to receive zero points – nul points – as a total, and there’s a kind of anti-roll-of-honour dedicated to countries that have accomplished this feat. Special congratulations to Austria and Norway who, despite their deep contemporary musical roots, have each scored nul points on four occasions.
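
In code, the scoring scheme is simple enough. Here’s a minimal sketch, with the voters and rankings invented purely for illustration:

```python
# Points for a judge's 1st, 2nd, ..., 10th favourite song.
POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]

def tally(rankings):
    """rankings maps each voting country to its ordered top-ten list
    of other countries; returns each country's total score."""
    totals = {}
    for voter, top_ten in rankings.items():
        for points, country in zip(POINTS, top_ten):
            assert country != voter  # no voting for your own song
            totals[country] = totals.get(country, 0) + points
    return totals

# Toy example with three voters and truncated rankings:
print(tally({
    "UK": ["Sweden", "Norway"],
    "Norway": ["Sweden", "UK"],
    "Sweden": ["Norway", "UK"],
}))
# -> {'Sweden': 24, 'Norway': 22, 'UK': 20}
```

A total of zero – nul points – simply means no other country placed your song in its top ten.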

Anyway, here’s the thing. The UK gave the world The Beatles, The Rolling Stones, Pink Floyd, Led Zeppelin, David Bowie, Joy Division and Radiohead. And Adele. But it hasn’t done very well in recent years in the Eurovision Song Contest. It’s true that by 1997 the UK had won the competition a respectable 5 times – admittedly with a bit of gratuitous sexism involving the removal of women’s clothing to distract judges from the paucity of the music. But since then, nothing. Indeed, since 2000 the UK has finished in last place on 3 occasions, and has only twice been in the top 10.

Now, there are two possible explanations for this.

  1. Our songs have been terrible. (Well, even more terrible than the others).
  2. There’s a stitch-up in the voting process, with countries penalising the UK for reasons that have nothing to do with the quality of the songs.

But how can we objectively distinguish between these two possibilities? The poor results for the UK will be the same in either case, so we can’t use the UK’s data alone to unravel things.

Well, one way is to hypothesise a system by which votes are cast that is independent of song quality, and to see if the data support that hypothesis. One such hypothesis is a kind of ‘bloc’ voting system, where countries tend to award higher votes for countries of a similar geographical or political background to their own.

This article carries out an informal statistical analysis of exactly this type. Though the explanations in the article are sketchy, a summary of the results is given in the following figure. Rather than pre-defining the blocs, the authors use the data on voting patterns themselves to identify 3 blocs of countries whose voting patterns are similar. They are colour-coded in the figure, which shows (in some vague, undefined sense) the tendency for countries on the left to favour countries on the right in voting. Broadly speaking there’s a northern Europe group in blue, which includes the UK, an ex-Yugoslavian bloc in green and a rest-of-Europe bloc in red. But whereas the fair-minded north Europeans tend to spread their votes evenly across all countries, the other two blocs tend to give their highest votes to other countries within the same bloc.
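
The article doesn’t spell out its method, but one standard way of letting the data define the blocs is cluster analysis. Here’s a rough sketch of the general idea, using made-up numbers and an off-the-shelf hierarchical clustering routine – the actual study may well have done something different:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Invented stand-in: avg_points[i, j] = average points country i has
# awarded country j across past contests. The real matrix would come
# from the historical results database linked at the end of this post.
countries = ["UK", "Sweden", "Norway", "Serbia", "Croatia", "Greece"]
rng = np.random.default_rng(0)
avg_points = rng.uniform(0, 12, size=(len(countries), len(countries)))
np.fill_diagonal(avg_points, 0)  # countries can't vote for themselves

# Describe each country by how it votes (its row) and how it is voted
# for (its column), then group countries with similar patterns.
features = np.hstack([avg_points, avg_points.T])
blocs = fcluster(linkage(features, method="ward"), t=3, criterion="maxclust")
print(dict(zip(countries, blocs)))  # e.g. {'UK': 2, 'Sweden': 2, ...}
```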

But does this mean the votes are based on non-musical criteria? Well, not necessarily. It’s quite likely that cultural differences – including musical ones – are also smaller within geographically homogeneous blocs than across them. In other words, Romania and Moldova might vote for each other at a much higher than average rate, but this could just as easily be because they have similar musical roots and tastes as because they are friends scratching each other’s backs.

Another study of geo-political bloc voting is contained in this Telegraph article, which makes similar findings but concludes:

Comforting as it might be to blame bloc voting for the UK’s endless poor record, it’s not the only reason we don’t do well.

In other words, in a more detailed analysis which models performance after allowing for bloc-voting effects, the UK is still doing badly.

This whole issue has also been studied in much greater detail in the academic literature using complex statistical models, and the conclusions are similar, though the authors report language and cultural similarities as being more important than geographical factors.

The techniques used in these various different studies are actually extremely important in other areas of application. In genetic studies, for example, they are used to identify groups of markers for certain disease types. And even in sports modelling they can be relevant for identifying teams or players that have similar styles of play.

But if Eurovision floats your boat, you can carry out your own analysis of the data based on the complete database of results available here.


Update: Thanks to Susie.Bruck@smartodds.co.uk for pointing me to this. So not only did the UK finish last this year, they also had their points score reduced retrospectively. If ever you needed evidence of an anti-UK conspiracy… 😉

A bad weekend

Had a bad weekend? Maybe your team faded against relegated-months-ago Huddersfield Town, consigning your flickering hopes of a Champions League qualification spot to the wastebin. Or maybe you support Arsenal.

Anyway, Smartodds loves Statistics is here to help you put things in perspective: ‘We are in trouble’. But not trouble in the sense of having to play Europa League qualifiers on a Thursday night. Trouble in the sense that…

Human society is under urgent threat from loss of Earth’s natural life

Yes, deep shit trouble.

This is according to a Global Assessment report by the United Nations, based on work by hundreds of scientists who compiled as many as 15,000 academic studies. Here are some of the headline statistics:

  • Nature is being destroyed at a rate tens to hundreds of times greater than the average over the last 10 million years;
  • The biomass of wild mammals has fallen by 82%;
  • Natural ecosystems have lost around half of their area;
  • A million species are at risk of extinction;
  • Pollinator loss has put up to £440 billion of crop output at risk;

The report goes on to say:

The knock-on impacts on humankind, including freshwater shortages and climate instability, are already “ominous” and will worsen without drastic remedial action.

But if only we could work out what the cause of all this is. Oh, hang on, the report says it’s…

… all largely as a result of human actions.

For example, actions like these:

  • Land degradation has reduced the productivity of 23% of global land;
  • Some 83% of wetlands have been drained since 1700;
  • In the years 2000-2013 the area of intact forest fell by 7% – an area the size of France and the UK combined;
  • More than 80% of wastewater, as well as 300-400m tons of industrial waste, is pumped back into natural water reserves without treatment;
  • Plastic waste has increased roughly tenfold since 1980, affecting 86% of marine turtles, 44% of seabirds and 43% of marine mammals;
  • Fertiliser run-off has created more than 400 ocean ‘dead zones’ – in total, an area the size of the UK.

You probably don’t need to be a bioscientist, and certainly not a statistician, to realise none of this is particularly good news. However, the report goes on to list various strategies that agencies, governments and countries need to adopt in order to mitigate the damage that has already been done and minimise the further damage that will unavoidably be done under current regimes. But none of it’s easy, and the evidence so far doesn’t suggest much collective human will to accept the responsibilities involved.

Josef Settele of the Helmholtz Centre for Environmental Research in Germany said:

People shouldn’t panic, but they should begin drastic change. Business as usual with small adjustments won’t be enough.

So, yes, cry all you like about Liverpool’s crumbling hopes for a miracle against Barcelona tonight, but keep it in perspective and maybe even contribute to the wider task of saving humanity from itself.

<End of rant. Enjoy tonight’s game.>


Correction: *Barcelona’s* crumbling hopes

Happy International Day of Happiness

Did you know March 20th is the International Day of Happiness? Did you even know there was an International Day of Happiness?

Anyway, just in case you’re interested, the UN, which founded the day and organises associated annual events, produces an annual report which is essentially a statistical analysis that determines the extent of happiness in different countries of the world. It turns out that the happiest country right now is Finland, while the least happy is South Sudan. The UK is 15th. I’ll get back to you in a year’s time to let you know if we end up moving closer to Finland or South Sudan in the happiness stakes post-Brexit.

Here’s to all the money at the end of the world

I made the point in last week’s Valentine’s Day post that although the emphasis of this blog is on the methodology of using Statistics to understand the world through the analysis of data, it’s often the case that statistics in themselves tell their own story. In this way we learnt that a good proportion of the population of the UK buy their pets presents for Valentine’s Day.

As if that wasn’t bad enough, I now have to report to you the statistical evidence for the fact that nature itself is dying. Or as the Guardian puts it:

Plummeting insect numbers ‘threaten collapse of nature’

The statistical and scientific evidence now points to the fact that, at current rates of decline, all insects could be extinct by the end of the century. Admittedly, it’s probably not great science or statistics to extrapolate the current annual loss of 2.5% in that way, but nevertheless it gives you a picture of the way things are going. This projected elimination of insects would be, by some definitions, the sixth mass extinction event on earth. (Earlier versions wiped out dinosaurs and so on).
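
To see why that kind of extrapolation is dubious, compare the two obvious ways of projecting a 2.5% annual loss forward. A quick back-of-the-envelope sketch:

```python
# Losing 2.5% of the *original* insect biomass each year (linear decline)
# reaches zero after 40 years; losing 2.5% of *whatever remains* each
# year (exponential decay) never quite reaches zero.
annual_loss = 0.025
years = 80  # roughly to the end of the century

linear = max(0.0, 1 - annual_loss * years)
compound = (1 - annual_loss) ** years

print(f"linear:   {linear:.0%} remaining after {years} years")
print(f"compound: {compound:.0%} remaining after {years} years")  # ~13%
```

Either way the direction of travel is grim, but the ‘extinct by 2100’ headline depends heavily on which projection you choose – which is exactly the point about it not being great statistics.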

And before you go all Donald Trump, and say ‘bring it on: mosquito-free holidays’, you need to remember that life on earth is a complex ecological system in which the big things (including humans) depend indirectly on the little things (including insects), via intricate bio-mechanisms, for mutual survival. So if all the insects go, all the humans go too. And this is by the end of the century, remember.

Here’s First Dog on the Moon’s take on it:

So, yeah, let’s do our best to make money for our clients. But let’s also not forget that money only has value if we have a world to spend it in, and use Statistics and all other means at our disposal to fight for the survival of our planet and all the species that live on it.

“Random”

You probably remember the NFL quarterback Colin Kaepernick, who started the protest against racism in the US by kneeling during the national anthem. In an earlier post I discussed how his statistics suggested he was being shunned by NFL teams due to his political stance. And in a joint triumph for decency and marketing, he subsequently became the face of Nike.

Since I now follow Kaepernick on Twitter, I recently received a tweet sent by Eric Reid of the Carolina Panthers. Reid was the first player to kneel alongside Kaepernick when playing for the San Francisco 49ers. But when his contract expired in March 2018, Reid also struggled to find a new club, despite form that suggested he’d be an easy selection. Eventually, he joined the Carolina Panthers after the start of the 2018-19 season, and opened a dispute with the NFL, claiming that, like Kaepernick, he had been shunned by most teams as a consequence of his political actions.

This was his tweet:

The ‘7’ refers to the fact that Reid had been tested seven times since joining the Panthers in the standard NFL drug testing programme, and the “random” is intended ironically. That’s to say, Reid is implying that he’s being tested more often than is plausible if tests are being carried out randomly: in other words, he’s being victimised for the stand he’s taking against the NFL.

Reid is quoted as saying:

I’ve been here 11 weeks, I’ve been drug-tested seven times. That has to be statistically impossible. I’m not a mathematician, but there’s no way that’s random.

Well, let’s get one thing out of the way first of all: the only things that are statistically impossible are the things that are actually impossible. And since it’s possible that a randomised allocation of tests could lead to seven or more tests in 11 weeks, it’s certainly not impossible, statistically or otherwise. 

However… Statistics is almost never about the possible versus the impossible; yes versus no; black versus white (if you’ll excuse the double entendre). Statistics is really about degrees of belief. Does the evidence suggest one version is more likely than another? And to what extent is that conclusion reliable?

Another small technicality… it seems that the first of Reid’s drug tests was actually a mandatory test that all players have to take when signing on for a new team. So actually, the question is whether the subsequent 6 tests in 11 weeks are unusually many if the tests are genuinely allocated randomly within the team roster.

On the face of it, this is a simple and standard statistical calculation. There are 72 players on a team roster and 10 players each week are selected for testing. So, under the assumption of random selection, the probability that any one player is tested any week is 10/72. Standard results then imply that the probability of a player being selected on exactly 6 out of 11 occasions – using the binomial distribution for those of you familiar with this stuff – is around 0.16%, while the probability of being tested 6 times or more is 0.17%. On this basis, there’s only a 17 in 10,000 chance that Reid would have been tested at least as often as he has been under a genuinely random procedure, and this would normally be considered small enough to provide evidence that the procedure is not random, and that Reid has been tested unduly often.  
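
For those who want to check the arithmetic, the whole calculation is a couple of lines in Python:

```python
from scipy.stats import binom

n_weeks = 11   # weeks since Reid joined the Panthers
p = 10 / 72    # chance of being picked in any given week, if random

print(binom.pmf(6, n_weeks, p))  # P(exactly 6 tests), ~0.0016
print(binom.sf(5, n_weeks, p))   # P(6 or more tests), the ~17-in-10,000 figure
```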


However, we need to be a bit careful. Some time ago, in an offsite talk (mentioned here) I discussed the fact that 4 members of the quant team shared the same birthday, and showed that this was apparently an infinitesimally unlikely occurrence. But by considering the fact that it would have seemed surprising for any 4 individuals in the company to share the same birthday, and that there are many such potential combinations of 4 people, the event turned out not to be so very surprising after all.

And there’s a similar issue here… Reid is just one of 72 players on the roster. It happened to be Reid who was tested unusually often, but we’d have been equally surprised if any individual player had been tested at least 6 times in 11 weeks. Is it surprising, though, that at least one of the 72 players gets tested this often? This is tricky to answer exactly, but easy to estimate by simulation. Working this way I found the probability to be around 6.25%. Still unlikely, but not beyond the bounds of plausibility. A rule-of-thumb that’s often applied – and often inappropriately applied – is that if something has less than a 5% probability of occurring by chance, it’s safe to assume that something systematic, and not just randomness, led to the results; bigger than 5% and we conclude that the evidence isn’t strong enough to rule out the effect just being a random occurrence. So in this case, we couldn’t rule out the possibility that the test allocations are random.
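
For anyone who’d like to try it, here’s a minimal version of that simulation, under the same 72-player, 10-tests-a-week assumptions as the binomial calculation above. (The precise estimate is quite sensitive to those assumptions about the testing pool.)

```python
import numpy as np

rng = np.random.default_rng(42)
n_players, n_weeks, n_tested, n_sims = 72, 11, 10, 100_000

hits = 0
for _ in range(n_sims):
    counts = np.zeros(n_players, dtype=int)
    for _ in range(n_weeks):
        # each week, 10 of the 72 players are chosen at random
        counts[rng.choice(n_players, size=n_tested, replace=False)] += 1
    if counts.max() >= 6:  # did *any* player get tested 6 or more times?
        hits += 1

print(hits / n_sims)  # estimated probability for the roster as a whole
```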

So we have two different answers depending on how the data is interpreted. If we treat the data as specific to Eric Reid, then yes, there is strong evidence to suggest he’s been tested more often than is reasonable if testing is random. But if we consider him as just an arbitrary player in the roster, the evidence isn’t overwhelming that anyone in the roster as a whole has been overly tested.

Which should we go with? Well, each provides a different and valid interpretation of the available data. I would argue – though others might see it differently – that it’s entirely reasonable in this particular case to consider the data just with regard to Eric Reid, since there is a prima facie hypothesis specifically about him in respect of his grievance case against the NFL. In other words, we have a specific reason to be focusing on Reid, one that isn’t driven by a dredge through the data.

On this basis, I’d argue that it is perfectly reasonable to question the extent to which the allocation of drugs tests in the NFL is genuinely “random”, and to conclude that there is reasonable evidence that Eric Reid is being unfairly targeted for testing, presumably for political reasons. The number of tests he has faced isn’t ‘statistically impossible’, but sufficiently improbable to give strong weight to this hypothesis. 


Worst use of Statistics of the year

You might remember in a couple of earlier posts (here and here) I discussed the Royal Statistical Society’s ‘Statistic of the Year’ competition. I don’t have updates on the results of that competition for 2018 yet, but in the meantime I thought I’d do my own version, but with a twist: the worst use of Statistics in 2018.

To be honest, I only just had the idea to do this, so I haven’t been building up a catalogue of options throughout the year. Rather, I just came across an automatic winner in my Twitter feed this week.

So, before announcing the winner, let’s take a look at the following graph:

This graph is produced by the Office for National Statistics, which is the UK government’s own statistical agency, and shows the change in average weekly wages in the UK, after allowance for inflation effects, for the period 2008-2018. 

There are several salient points that one might draw from this graph:

  1. Following the financial crash in 2008, wages declined steadily over a 6-year period to 2014, when they bottomed out at around 10% lower than pre-crash levels.
  2. The election of a Conservative/Lib Dem coalition government in 2010 didn’t have any immediate impact on the decline of wage levels. Arguably the policy of intense austerity may simply have exacerbated the problem.
  3. Things started to pick up during 2014, most likely due to the effects of Quantitative Easing and other efforts to stimulate the economy by the Bank of England in the period after the crash.
  4. Something sudden happened in 2016 which seems to have choked-off the recovery in wage levels. (If only there was a simple explanation for what that might be.)
  5. Wages are currently at the same level as they were 7 years ago in 2011, and significantly lower than they were immediately following the financial crash in 2008.

So that’s my take on things. Possibly there are different interpretations that are equally valid and plausible. I struggle, however, to accept the following interpretation, to which I am awarding the 2018 worst use of Statistics award:

 ONS data showing real wages rising at fastest rate in 10 years… is good news for working Britain

Now, believe me, I’ve looked very hard at the graph to try to find a way in which this statement provides a reasonable interpretation of it, but I simply can’t. You might argue that wages grew at the fastest rate in a decade during 2015, but only because wages had performed so miserably in the preceding years. Any reasonable interpretation of the graph suggests wages have flatlined since 2016, and it’s simply misleading to suggest that they are currently rising at the fastest rate in 10 years.

So, my 2018 award for the worst use of Statistics goes to…

… Dominic Raab, who until his recent resignation was the Secretary of State responsible for the United Kingdom’s withdrawal from the European Union (i.e. Brexit) and is a leading contender to replace Theresa May as the next leader of the Conservative Party.

Well done Dominic. Whether due to mendacity or ignorance, you are a truly worthy winner.

It’s not based on facts

We think that this is the most extreme version and it’s not based on facts. It’s not data-driven. We’d like to see something that is more data-driven.

Wow! Who is this staunch defender of statistical methodology? This guardian of scientific method? This warrior for the value of empirical information in identifying and confirming the truth?

Ah, but wait a minute, here’s the rest of the quote…

It’s based on modelling, which is extremely hard to do when you’re talking about the climate. Again, our focus is on making sure we have the safest, cleanest air and water.

Any ideas now?

Since it requires an expert in doublespeak to connect those two quotes together, you might be thinking Donald Trump, but we’ll get to him in a minute. No, this was White House spokesperson Sarah Sanders in response to the US government’s own assessment of climate change impact. Here’s just one of the headlines in that report (under the Infrastructure heading):

Our Nation’s aging and deteriorating infrastructure is further stressed by increases in heavy precipitation events, coastal flooding, heat, wildfires, and other extreme events, as well as changes to average precipitation and temperature. Without adaptation, climate change will continue to degrade infrastructure performance over the rest of the century, with the potential for cascading impacts that threaten our economy, national security, essential services, and health and well-being.

I’m sure I don’t need to convince you of the overwhelming statistical and scientific evidence of climate change. But for argument’s sake, let me place here again a graph that I included in a previous post.

This is about as data-driven as you can get. Data have been carefully sourced and appropriately combined from locations all across the globe. Confidence intervals have been added – these are the vertical black bars – which account for the fact that we’re estimating a global average on the basis of a limited number of data. But you’ll notice that the confidence bars are smaller for more recent years, since more data of greater reliability are available. So it’s not just data; it’s careful analysis of data that takes into account the fact that we’re estimating something. And it plainly shows that, even after allowance for errors due to data limitations, and for year-to-year random variation, there has been an upward trend for at least the last 100 years, which is even more pronounced in the last 40 years.
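
As a toy illustration of why those bars shrink – and nothing more than that, since the real methodology is far more involved – the uncertainty in an estimated average falls away as the number of measurements grows:

```python
import numpy as np

rng = np.random.default_rng(7)
true_anomaly = 0.6  # a hypothetical global anomaly, in degrees C

# With more station readings, the ~95% confidence interval for the
# estimated mean narrows roughly in proportion to 1/sqrt(n).
for n in (20, 200, 2000):
    readings = true_anomaly + rng.normal(0.0, 0.5, size=n)
    half_width = 1.96 * readings.std(ddof=1) / np.sqrt(n)
    print(f"n={n:4d}: {readings.mean():.3f} +/- {half_width:.3f}")
```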

Now, by the way, here’s a summary of the mean annual total of CO2 that’s been released into the atmosphere over roughly the same time period.

Notice any similarities between these two graphs?

Now, as you might remember from my post on Simpson’s Paradox, correlations are not necessarily evidence of causation. It could be, just on the strength of these two graphs, that both CO2 emissions and global mean temperature are being affected by some other process, which is causing them both to change in a similar way. But, here’s the thing: there is a proven scientific mechanism by which an increase in CO2 can cause an increase in atmospheric temperature. It’s basically the greenhouse effect: CO2 molecules cause heat to be retained in the atmosphere, rather than escaping back into space, as it would if those molecules weren’t there. So:

  1. The graphs show a clear correlation between CO2 levels and mean temperature levels;
  2. CO2 levels in the atmosphere are rising and bound to rise further under current energy policies worldwide;
  3. There is a scientific mechanism by which increased CO2 emissions lead to an increase in mean global temperature.

Put those three things together and you have an incontrovertible case that climate change is happening, that it’s at least partly driven by human activity and that the key to limiting the damaging effects of such change is to introduce energy policies that drastically reduce CO2 emissions.
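
Incidentally, point 1 on its own would prove very little – two series that each trend upwards for unrelated reasons will correlate strongly, which is exactly why the physical mechanism in point 3 matters. A quick demonstration with made-up data:

```python
import numpy as np

# Two invented series that both drift upwards, for unrelated reasons.
rng = np.random.default_rng(0)
years = np.arange(1900, 2020)
series_a = 0.01 * (years - 1900) + rng.normal(0, 0.1, len(years))
series_b = 0.05 * (years - 1900) + rng.normal(0, 0.5, len(years))

# The correlation is high (around 0.9) despite there being no causal
# link between the two series in either direction.
print(np.corrcoef(series_a, series_b)[0, 1])
```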

All pretty straightforward, right?

Well, this is the response to his own government’s report by the President of the United States:

In summary:

I don’t believe it

And the evidence for that disbelief:

One of the problems that a lot of people like myself — we have very high levels of intelligence, but we’re not necessarily such believers.

If only the President of the United States was just a little less intelligent. And if only his White House spokesperson wasn’t such an out-and-out liar.


I just made up this one

I saw this the other day…

And the same day I saw this…

One of these items is a cartoon character inventing a statistic just to support an argument that he can’t justify by logic or other means.

The other one is Dilbert.