First pick

Zion Williamson

If you follow basketball you’ll probably know that the NBA draft was held this weekend, resulting in wonderkid Zion Williamson being selected by the New Orleans Pelicans. The draft system is a procedure by which newly available players are distributed among the various NBA teams.

Unlike the arrangements in most professional team sports in Europe, the draft system is a partial attempt to balance out teams in terms of the quality of their players. Specifically, teams that do worse one season are given preference when choosing players for the next season. It’s a slightly archaic and complicated procedure – which is shorthand for saying I couldn’t understand all the details from Wikipedia – but the principles are simple enough.

There are 3 stages to the procedure:

  1. A draft lottery schedule, in which teams are given a probability of having first pick, second pick and so on, based on their league position in the previous season. Only teams below a certain level in the league are permitted to have the first pick,  and the probabilities allocated to each team are inversely related to their league position. In particular, the lowest placed teams have the highest probability of getting first pick.
  2. The draft lottery itself, held towards the end of May, where the order of pick selections is assigned randomly to the teams according to the probabilities assigned in the schedule.
  3. The draft selection, held in June, where teams make their picks in the order that they’ve been allocated in the lottery procedure.

In the 2019 draft lottery, the first pick probabilities were assigned as follows:

[Table: 2019 NBA draft lottery first-pick probabilities by team]

So, the lowest-placed teams, New York, Cleveland and Phoenix, were all given a 14% chance, down to Charlotte, Miami and Sacramento who were given a 1% chance. The stars and other indicators in the table are an additional complication arising from the fact that teams can trade their place in the draw from one season to another.

In the event, following the lottery based on these probabilities, the first three picks were given to New Orleans, Memphis and New York respectively. The final stage in the process was then carried out this weekend, resulting in the anticipated selection of Zion Williamson by the New Orleans Pelicans.

There are several interesting aspects to this whole process from a statistical point of view.

The first concerns the physical aspects of the draft lottery. Here’s an extract from the NBA’s own description of the procedure:

Fourteen ping-pong balls numbered 1 through 14 will be placed in a lottery machine. There are 1,001 possible combinations when four balls are drawn out of 14, without regard to their order of selection. Before the lottery, 1,000 of those 1,001 combinations will be assigned to the 14 participating lottery teams. The lottery machine is manufactured by the Smart Play Company, a leading manufacturer of state lottery machines throughout the United States. Smart Play also weighs, measures and certifies the ping-pong balls before the drawing.

The drawing process occurs in the following manner: All 14 balls are placed in the lottery machine and they are mixed for 20 seconds, and then the first ball is removed. The remaining balls are mixed in the lottery machine for another 10 seconds, and then the second ball is drawn. There is a 10-second mix, and then the third ball is drawn. There is a 10-second mix, and then the fourth ball is drawn. The team that has been assigned that combination will receive the No. 1 pick. The same process is repeated with the same ping-pong balls and lottery machine for the second through fourth picks.

If the same team comes up more than once, the result is discarded and another four-ball combination is selected. Also, if the one unassigned combination is drawn, the result is discarded and the balls are drawn again. The length of time the balls are mixed is monitored by a timekeeper who faces away from the machine and signals the machine operator after the appropriate amount of time has elapsed.

You probably don’t need me to explain how complicated this all is, compared to the two lines of code it would take to carry out the same procedure electronically. Arguably, perhaps, seeing the lottery carried out with the physical presence of ping-pong balls might stop people thinking the results had been fixed. Except it doesn’t. So, it’s all just for show. Why do things efficiently and electronically when you can add razzmatazz and generate high TV ratings? Watching a statistician generate the same results in a couple of minutes on a laptop maybe just wouldn’t have the same appeal.
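For what it’s worth, here is a minimal sketch of how the first-pick draw might be done electronically. It isn’t the NBA’s procedure in code, just an illustration under stated assumptions: the combination counts use only the first-pick probabilities quoted in this post (140 of the 1,000 combinations for a 14% chance, and so on), the seven teams not named here are lumped into a single hypothetical “Other” bucket, and the order in which combinations are handed out to teams is arbitrary.

```python
import itertools
import random

# All 4-ball combinations from balls 1..14, ignoring order: 1,001 in total.
combos = list(itertools.combinations(range(1, 15), 4))

# Combinations per team out of 1,000, taken from the probabilities quoted in
# the post. "Other" is a hypothetical bucket for the seven teams not listed.
counts = {"New York": 140, "Cleveland": 140, "Phoenix": 140, "Chicago": 125,
          "Charlotte": 10, "Miami": 10, "Sacramento": 10, "Other": 425}

# Assumption: combinations are handed out in enumeration order. zip() stops
# after 1,000 owners, so the 1,001st combination is left unassigned.
owners = [team for team, n in counts.items() for _ in range(n)]
assigned = dict(zip(combos, owners))


def draw_first_pick(rng=random):
    """Draw four balls; redraw if the one unassigned combination comes up."""
    while True:
        balls = tuple(sorted(rng.sample(range(1, 15), 4)))
        if balls in assigned:
            return assigned[balls]


print(draw_first_pick())
```

If you don’t care about mimicking the balls at all, a single weighted draw such as random.choices(list(counts), weights=list(counts.values())) gives the same probabilities. The sketch covers the first pick only; the redraw rule for picks two to four would need the individual teams rather than an “Other” bucket.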

Anyway, my real reason for including this topic in the blog is the following. In several previous posts I’ve mentioned the use of simulation as a statistical technique. Applications are varied, but in most cases simulation is used to generate many realisations from a probability model in order to get a picture of what real data are likely to look like if their random characteristics are somehow linked to that probability model. 

For example, in this post I simulated how many packs of Panini stickers would be needed to fill an album. Calculating the probabilities of the number of packs needed to complete an album is difficult, but the simulation of the process of completing an album is easy.
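As a reminder of how little code that kind of simulation needs, here is a rough sketch. The album size and pack size are hypothetical, not necessarily those used in the earlier post, and I’m assuming every sticker is equally likely, duplicates can occur within a pack, and nobody swaps stickers.

```python
import random


def packs_to_complete(album_size, pack_size, rng):
    """Buy packs of random stickers until every sticker has been seen."""
    collected = set()
    packs = 0
    while len(collected) < album_size:
        collected.update(rng.randrange(album_size) for _ in range(pack_size))
        packs += 1
    return packs


rng = random.Random(2019)
# Hypothetical sizes, chosen only for illustration.
results = [packs_to_complete(album_size=200, pack_size=5, rng=rng)
           for _ in range(1000)]
print(sum(results) / len(results))  # average number of packs needed
```

The list of results gives a picture of the whole distribution of packs needed, not just its average, which is exactly the information that is awkward to calculate directly.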

And in a couple of recent posts (here and here) we used simulation techniques to verify what seemed like an easy intuitive result. As it turned out, the simulated results were different from what the theory suggested, and a slightly deeper study of the problem showed that some care was needed in the way the data were simulated. But nonetheless, the principle of using simulations to investigate the expected outcomes of a random experiment was sound. In each case simulations were used to generate data from a process whose probabilities would have been practically impossible to calculate by other means.

Which brings me to this article, sent to me by Oliver.Cobb@smartodds.co.uk. On the day of the draft lottery, the masterminds at USA Today decided to run 100 simulations of the draft lottery to see which team would get the first pick. It’s mind-numbingly pointless. As Ollie brilliantly put it:

You have to admire the way they’ve based an article on taking a known chance of something happening and using just 100 simulations to generate a less reliable figure than the one they started with.

In case you’re interested, and can’t be bothered with the article, Chicago got selected for first pick most often – 19 times – in the 100 USA Today simulations, and were therefore ‘predicted’ to win the lottery. But if they’d run their simulations many more times, it’s virtually certain that Chicago wouldn’t have come out on top; instead they would have been allocated first pick on close to 12.5% of occasions, corresponding to their probability in the table above. With enough simulations, the simulated game would almost always be won by one of New York, Cleveland or Phoenix, whose proportions would be separated only by small amounts due to random variation.
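To see the point in action, here is a small sketch comparing 100 simulated lotteries with a million. It uses only the first-pick probabilities quoted in this post; the remaining 42.5% is put into a hypothetical “Other” bucket standing in for the seven teams not named here, and the seed is arbitrary.

```python
import random
from collections import Counter

# First-pick probabilities (in %) quoted in the post; "Other" is a hypothetical
# bucket holding the remaining 42.5% for the seven unlisted teams.
probs = {"New York": 14, "Cleveland": 14, "Phoenix": 14, "Chicago": 12.5,
         "Charlotte": 1, "Miami": 1, "Sacramento": 1, "Other": 42.5}


def simulate_shares(n_sims, seed=None):
    """Return each team's share of first picks over n_sims simulated lotteries."""
    rng = random.Random(seed)
    winners = rng.choices(list(probs), weights=list(probs.values()), k=n_sims)
    tally = Counter(winners)
    return {team: tally[team] / n_sims for team in probs}


small = simulate_shares(100, seed=1)        # noisy, like the USA Today exercise
large = simulate_shares(1_000_000, seed=1)  # settles near the known values
print(f"Chicago: {small['Chicago']:.3f} (100 sims), "
      f"{large['Chicago']:.4f} (1,000,000 sims)")
```

With 100 simulations the standard error on Chicago’s share is roughly 3.3 percentage points, so a figure of 19% is only about two standard errors above the true 12.5%: unusual, but nothing more than noise.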

The only positive thing you can say about the USA Today article is that at least they had the good sense not to do the simulation with 14 actual ping-pong balls. As they say themselves:

So to celebrate one of the most cruel and unusual days in sports, we ran tankathon.com’s NBA draft lottery simulator 100 times to predict how tonight will play out. There’s no science behind this. We literally hit “sim lottery” 100 times and wrote down the results.

I especially like the “there’s no science behind this” bit.  Meantime, if you want to create your own approximation to a known set of probabilities, you too can hit the “sim lottery” button 100 times here.


Update: Benoit.Jottreau@Smartodds.co.uk pointed me at this article, which is relevant for two reasons. First, in terms of content. In previous versions of the lottery system, there was a stronger incentive in terms of probability assignments for teams to do badly in the league. This led to teams ‘tanking’: deliberately throwing games towards the end of a season when they knew they were unlikely to reach the playoffs, thereby improving their chances of getting a better player in the draft for the following season. The 2019 version of the lottery aims to reduce this effect, by giving teams less of an incentive to be particularly poor. For example, the lowest three teams in the league now share the highest probability of first pick in the draft, whereas previously the lowest team had a higher probability than all others. But the article Benoit sent me suggests that the changes are unlikely to have much of an impact. It concludes:

…it seems that teams that want to tank still have strong incentives to tank, even if the restructured NBA draft lottery makes it less likely for them to receive the best picks.

The other reason why this article is relevant is that it makes much more intelligent use of simulation as a technique than the USA Today article referred to above.

Midrange is dead

Kirk Goldsberry is the author of a new book on data analytics for the NBA. I haven’t read the book, but some of the graphical illustrations he’s used for its publicity are great examples of the way data visualization techniques can give insights about the evolution of a sport in terms of the way it is played.

 

Press the start button in the graphic of the above tweet. I’m not sure exactly how the graphic and the data are mapped, but essentially the coloured hexagons show the regions of the basketball court from which shots are most frequently taken. The animation shows how this pattern has changed over the seasons.

As you probably know, most baskets in basketball – excluding free throws – are awarded 2 points. But a shot that’s scored from outside a distance of 7.24m from the basket – the almost semi-circular outer zone shown in the figure – scores 3 points. So, there are two ways to improve the number of points you are likely to score when shooting: first, you can get closer to the basket, so that the shot is easier; or second, you can shoot from outside the three-point line, so increasing the number of points obtained when you do score. That means there’s a zone in-between, where the shot is still relatively difficult because of the distance from the basket, but for which you only get 2 points when you do score. And what the animation above clearly shows is an increasing tendency over the seasons for players to avoid shooting from this zone. This is perhaps partly because of a greater understanding of the trade-off between shot difficulty and points awarded, and perhaps also because improved training techniques have led to a greater competency in 3-point shots.
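The arithmetic behind that trade-off is just the probability of scoring multiplied by the points awarded. Here is a tiny sketch; the success probabilities are hypothetical, chosen only to illustrate the idea, and are not taken from Goldsberry’s data.

```python
def expected_points(make_probability, shot_value):
    """Average points per attempt: chance of scoring times points awarded."""
    return make_probability * shot_value


# Hypothetical make probabilities, chosen only to illustrate the trade-off.
shots = {
    "close to the basket (2 pts)": expected_points(0.60, 2),
    "midrange (2 pts)": expected_points(0.40, 2),
    "beyond the 3-point line (3 pts)": expected_points(0.36, 3),
}
for zone, value in shots.items():
    print(f"{zone}: {value:.2f} points per shot")
```

More generally, a 3-point attempt is worth more on average than a 2-point attempt whenever its success probability is more than two-thirds of the 2-point attempt’s, which is why a slightly harder shot from further out can still be the better choice.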

Evidence to support this reasoning is the following data heatmap diagram which shows the average number of points scored from shots taken at different locations on the court. The closer to red, the higher the average score per shot.

Again the picture makes things very clear: average points scored are highest when shooting from very close to the basket, or from outside the 3-point line. Elsewhere the average is low. It’s circumstantial evidence, but since this map of points scored has patterns that are so similar to the current map of where players are shooting from, there’s a strong suggestion that players have evolved their play style in order to shoot at the basket from positions which they know are likely to generate the most points.

In summary, creative use of both static and animated graphical data representations provides great insights about the way basketball play has evolved, and why that evolution is likely to have occurred, given the 3-point shooting rule.


Thanks to Benoit.Jottreau@smartodds.co.uk for posting something along these lines on RocketChat.

March Madness

 

It’s sometimes said that a little knowledge is a dangerous thing. Arguably, too much knowledge is equally bad. Indeed, Einstein is quoted as saying:

A little knowledge is a dangerous thing. So is a lot.

I don’t suppose Einstein had gambling in mind, but still…

March Madness pools are a popular form of betting in the United States. They are based on the playoff tournament for NCAA college basketball, held every March, and comprise a so-called bracket bet. Prior to the start of the tournament, a player predicts the winners of each game from the round-of-sixteen right through to the final. This is possible since teams are seeded, as in tennis, so match pairings for future rounds are determined automatically once the winners from previous rounds are known. In practice, it’s equivalent to picking winners from the round-of-sixteen onwards in the World Cup.

There are different scoring systems for judging success in bracket picks, often with more weight given to correct outcomes in the later rounds, but in essence the more correct outcomes a gambler predicts, the better their score. And the player with the best score within a pool of players wins the prize. 
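As an illustration of one such scoring system, here is a minimal sketch. The round weights and the team labels A to P are made up for the example; real pools use a variety of weightings.

```python
# Hypothetical round weights: later rounds count for more.
ROUND_WEIGHTS = {"round of 16": 1, "quarter-final": 2, "semi-final": 4, "final": 8}


def bracket_score(picks, results, weights=ROUND_WEIGHTS):
    """Add the round weight for every predicted winner who actually won that round."""
    return sum(weights[rnd] * len(set(picks[rnd]) & set(results[rnd]))
               for rnd in weights)


# Made-up example: predicted and actual winners of each round.
picks = {"round of 16": list("ACEGIKMO"), "quarter-final": list("AEIM"),
         "semi-final": list("AI"), "final": ["A"]}
results = {"round of 16": list("ACFGIKNO"), "quarter-final": list("AGKO"),
           "semi-final": list("AK"), "final": ["K"]}
print(bracket_score(picks, results))
```

Because the weights double each round, a single wrong pick early on costs little by itself, but it becomes expensive if that team had also been picked to go deep into the tournament.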

Naturally, you’d expect players with some knowledge of the differing strength of the teams involved in the March Madness playoffs to do better than those with no knowledge at all. But is it the case that the more knowledge a player has, the more successful they’re likely to be? In other words:

To what extent is success in the March Madness pools determined by a player’s basketball knowledge?

This question was explored in a recent academic study discussed here. In summary, participants were given a 25-question basketball quiz, the results of which were used to determine their level of basketball knowledge. Next, they were asked to make their bracket picks for March Madness. A comparison was then made between the accuracy of their bracket picks and their level of basketball knowledge.

The results are summarised in the following graph, which shows the average relationship between pick accuracy and basketball knowledge:

As you’d expect, the players with low knowledge do relatively badly.  Then, as a player’s basketball knowledge increases, so does their pick accuracy. But only up to a point. After a certain point, as a player’s knowledge increases, their pick accuracy was found to decrease. Indeed, the players with the most basketball knowledge were found to perform slightly worse than those with the least knowledge!

Why should this be?

The most likely explanation is as follows…

Consider an average team that has recently had a few great results. It’s possible that these great results are due to skill, but it’s also plausible that the team has just been a bit lucky. The player with expert knowledge is likely to know about these recent results, and make their picks accordingly. The player with medium knowledge will simply know that this is an average team, and also bet accordingly. The player with very little knowledge, meanwhile, is likely to treat the team randomly.

Random betting due to lack of knowledge is obviously not a great strategy. However, making picks that are driven primarily by recent results can be even worse, and the evidence suggests that’s exactly what most highly knowledgeable players do. It turns out to be better to have just a medium knowledge of the game, so that you have a rough idea of the relative rankings of the different teams, without being overly influenced by recent results.

Now, obviously, someone with expert knowledge of the game, but who also knows how to exploit that knowledge for making predictions, is likely to do best of all. And that, of course, is the way sports betting companies operate, combining expert sports knowledge with statistical support to exploit and implement that knowledge. But the study here shows that, in the absence of that explicit statistical support, the player with a medium level of knowledge is likely to do better than players with too little or too much knowledge. 


In some ways this post complements the earlier post ‘The benefit of foresight’. The theme of that post was that successful gambling cannot rely solely on Statistics, but also needs the input of expert sports knowledge. This one says that expert knowledge, in isolation, is also insufficient, and needs to be used in tandem with statistical expertise for a successful trading strategy. 

In the specific context of betting on the NCAA March Madness bracket, the argument is developed fully in this book. The argument, though, is valid much more widely across all sports and betting regimes, and emphasises the importance to a sports betting company of both statistical and sport expertise.

 

 


Update (21/3): The NCAA tournament actually starts today. In case you’re interested, here’s Barack Obama’s bracket pick. Maybe see if you can do better than the ex-President of the United States…