The benefit of foresight

Ok, I’m going to be honest… I’m not really happy with this post. I keep deleting it and rewriting it, but I can’t get it into a form that eloquently says what I want it to say. (Insert your own <like all of your other posts> joke here.)

I’m trying to say the following things:

  1. Trading in sports – or any field – is about predicting what will happen in the future;
  2. Data are a summary of the past. If the future behaves like the past, then the data are likely to be useful; if it doesn’t, they’re likely to be less useful;
  3. There is often information about the way things are likely to change in the future that’s external to, and not included in, data;
  4. This means that predictions for sports trading based on statistical procedures will always be improved by the inclusion of additional knowledge and information that is provided by experts.

That’s what the rest of this post is trying to say. Unfortunately, having to tell you this in advance, rather than letting you draw these conclusions yourself, is an admission that it’s a poor post.


It’s often said that ‘with the benefit of hindsight, things could have been done better’. But since hindsight isn’t available when trading on sports, the best we can do is make optimal use of foresight.

This season has been a record-breaker for the NFL. Among other tumbling records, at 1371, the number of touchdowns in the regular season is the largest in the league’s 99-year history.

Of course, random variation means records will be broken from time to time just by chance. But if this sudden increase in points was actually predictable, then bets placed on the NFL would have been improved by taking it into account.
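To put a rough number on ‘just by chance’, here’s a minimal sketch in Python. It assumes that season totals are independent draws from one unchanging distribution, which is my simplification and certainly not true of the real NFL, where schedules and rules have changed over the years. Under that assumption, the latest of n seasons holds the record with probability 1/n, and the simulation below checks this for n = 99. The function name and simulation settings are mine, chosen purely for illustration.

```python
import random


def prob_latest_is_record(n_seasons, n_sims=20000, seed=1):
    """Estimate the chance that the most recent season is the highest so far.

    Assumes season totals are independent and identically distributed,
    which is a simplifying assumption, not a description of the real NFL.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # One simulated history: a random total for each season.
        totals = [rng.random() for _ in range(n_seasons)]
        # Count the history if the final season is the largest.
        if totals[-1] == max(totals):
            hits += 1
    return hits / n_sims


estimate = prob_latest_is_record(99)  # simulated probability
exact = 1 / 99                        # theoretical value under the same assumption
```

So, under pure chance, a record in any given season would be roughly a 1-in-100 event. That’s the yardstick against which any ‘it was predictable’ explanation needs to be weighed.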

Naturally, as statisticians, our primary source of evidence is contained in data, and we aim to exploit basic patterns and trends in data to help make predictions for the future. But data are by definition a snapshot of the past, and the models we develop will only work well if the future behaves like the past. Admittedly, if changes have already occurred, these will be encapsulated in data, and can be extrapolated into predictive models for the future. But data do not, in themselves, describe mechanisms of change. And it will always be essential to use additional sources of information and knowledge, not contained in data, to temper, inform and modify predictions from data-based statistical models.
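As a toy illustration of what ‘tempering’ a data-based prediction might look like, here’s a short Python sketch. Everything in it is an assumption of mine: the data-only model is just an average of past seasons, the expert input is a single proportional uplift, and the season totals are made-up placeholders rather than real NFL figures. A real model would be far richer, but the structure is the same: start from what the data say, then adjust for what the data can’t know.

```python
from statistics import mean


def data_only_forecast(past_totals):
    """Forecast next season's total as the average of past seasons."""
    return mean(past_totals)


def expert_adjusted_forecast(past_totals, expert_uplift):
    """Scale the data-only forecast by an expert's expected proportional change.

    expert_uplift is a hypothetical input, e.g. 0.05 for 'about 5% more
    scoring than the historical data alone would suggest'.
    """
    return data_only_forecast(past_totals) * (1 + expert_uplift)


# Made-up placeholder totals, NOT real NFL figures.
past_totals = [1000, 1020, 980]

baseline = data_only_forecast(past_totals)
adjusted = expert_adjusted_forecast(past_totals, expert_uplift=0.05)
```

With an uplift of zero the two forecasts coincide, so the expert view only moves the prediction when there is something to say that the data don’t already contain.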

With all that in mind, I found this article an interesting read. It provides a chronology of events connected to the NFL, all of which have contributed one way or another to the current attack-based tendency of play. The foresight to use this knowledge at the start of the season, to modify predictions to account for a likely increase in points due to a greater emphasis on attack, would almost certainly have led to better predictions than those provided by using data-based models only.




Statistics on match day

In an earlier post I discussed how the use of detailed in-play statistics was becoming much more important for sports modelling, and we looked at a video made at OPTA where they discuss how the data from a single event in a match is converted into a database entry. In that video there was reference to another video showing OPTA ‘behind-the-scenes’ on a typical match day. You can now see that video below.

Again, this video is a little old now, and chances are that OPTA now use fully genuine copies of Windows (see video at 2:07), but again I thought it might be of interest to see the process by which some of our data are collected. In future posts we might discuss the nature of some of the data that they are collecting.

Anatomy of a goal

One way of trying to improve sports models is to adapt them to include extra information. In football, for example, rather than just using goals from past fixtures, you might try to include more detailed information about how those fixtures played out.

It’s a little old now – 2013 – but I recently came across the video below. As you probably know, OPTA is the leading provider of in-match sports data, giving detailed quantitative measures of every event and every player in a match, not just in football, but for many other sports as well.

In this video, Sam from OPTA is discussing the data derived from a single event in a football match: Iniesta’s winner in the 2010 World Cup final. I think it’s interesting because we tend to treat the data as our raw ingredients, but there is a process by which the action in a game is converted into data, and this video gives insights into that actual process.
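To make the idea of ‘one event becomes one database entry’ a bit more concrete, here’s a minimal Python sketch. To be clear, this is not OPTA’s actual schema: the class, the field names and the 0 to 100 pitch coordinates are all hypothetical, and the x and y values are placeholders rather than the real position of the shot. The only details taken from the match itself are the player and the commonly reported minute of the goal.

```python
from dataclasses import dataclass, asdict


@dataclass
class MatchEvent:
    """One recorded action in a match (hypothetical fields, not OPTA's schema)."""
    match: str
    minute: int
    player: str
    event_type: str
    x: float  # position along the pitch, 0-100 (assumed convention)
    y: float  # position across the pitch, 0-100 (assumed convention)


def to_row(event):
    """Flatten an event into a dict, ready to be stored as one database row."""
    return asdict(event)


goal = MatchEvent(
    match="2010 World Cup final",
    minute=116,
    player="Iniesta",
    event_type="goal",
    x=90.0,  # placeholder value, not the real coordinate
    y=40.0,  # placeholder value, not the real coordinate
)
row = to_row(goal)
```

Once every pass, tackle and shot is stored as a row like this, a model can ask for, say, all shots by a team in a season, rather than relying on the final scores alone.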

In future posts we might look at how some of the data collected this way are used in models.

Incidentally, this video was produced by Numberphile, a group of nerds, er, maths enthusiasts who make fun (well, you know, “fun”) YouTube videos on all aspects of maths and numbers, including, occasionally, statistics. Chances are I’ll be digging through their archives to see if there’s anything else I can steal, er, borrow for the blog.

Question: if you watch the video carefully, you will see at some point (2:12, precisely) that event type number 31 is “Picked an orange”. What is that about? Is “picked an orange” a colloquialism for something? Forgive my ignorance, but I have simply no idea, and would be really happy if someone could explain.

Update: Here are 2 plausible explanations from

  1. The keeper catches a cross
  2. Yellow card given, but could’ve been red

If anyone knows the answer or has alternative suggestions I’ll include them here, thanks.

Actually, could it be this? When a match is played in snowy conditions, an orange ball is used to make it more visible. Maybe “picking an orange” refers to the decision to switch to such a ball by the referee.