Magic

Here’s a statistical card trick. As I try to explain in the video, admittedly not very clearly, the rules of the trick are as follows:

  1. Matteo picks a card at random from the pack. This card is unknown to me.
  2. I shuffle the cards and turn them over one at a time.
  3. As I turn the cards over, Matteo counts them in his head until he reaches that number in the sequence. As you’ll see, his card was a 5, so he counts the cards until he reaches the 5th one.
  4. He then repeats that process, starting with the value of the 5th card, which happened to be a 10. So, he counts – again silently – a further 10 cards. He remembers the value of that card, and counts again that many cards.
  5. And so on until we run out of cards.
  6. (Picture cards count as 10.)
  7. Matteo has to remember the last card in his sequence before all of the cards run out.
  8. And I – the magician – have to predict what that card was.

Now take a look at the video….

How did I do it? And what’s it got to do with Statistics? I’ll explain in a future post, but as usual if you’d like to write to me with your ideas I’ll be very happy to hear from you.
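In the meantime, if you'd like to play around with the counting procedure itself, here's a minimal sketch of Matteo's side of the trick in Python. The deck and the starting value are just placeholders – swap in whatever cards and secret number you like. (No spoilers: it only implements the rules above.)

```python
import random

def last_card_in_count(deck, start):
    """Follow the counting rules: go to the card at position `start`, then
    repeatedly jump ahead by the value of the card you land on, stopping
    when the deck runs out. Return the value of the last card landed on."""
    position = start - 1          # positions are 0-indexed internally
    last = None
    while position < len(deck):
        last = deck[position]
        position += last          # jump ahead by the card's value
    return last

# A toy deck: ace = 1, ..., ten = 10, with picture cards already counted as 10
deck = [min(v, 10) for v in range(1, 14)] * 4
random.shuffle(deck)

# If Matteo's secret card were a 5, his count would end on:
print(last_card_in_count(deck, start=5))
```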

Massively increase your bonus

In one of the earliest posts to the blog last year I set a puzzle where I suggested Smartodds were offering employees the chance of increasing their bonus, and you had to decide whether it was in their interests to accept the offer or not.

(They weren’t, and they still aren’t, but let’s play along.)

Same thing this year, but the rules are different. Eligible employees are invited to gamble their bonus at odds of 10-1 based on the outcome of a game. It works like this…

For argument’s sake, let’s suppose there are 100 employees that are entitled to a bonus. They are told they each have the opportunity to increase their bonus by a factor of 10 by playing the following game:

  • Each of the employees is randomly assigned a number between 1 and 100.
  • Inside a room there are 100 boxes, also labelled 1 to 100.
  • 100 cards, numbered individually from 1 to 100, have been randomly placed inside the boxes, so each numbered box contains a card with a unique random number from 1 to 100. For example, box number 1 might contain the card with number 62; box number 2 might contain the card with number 25; and so on.
  • Each employee must enter the room, one at a time, and can choose any 50 of the boxes to open. If they find the card with their own number in one of those boxes, they win. Otherwise they lose.
  • Though the employees may discuss the game and decide how they will play before they enter the room, they must not convey any information to the other employees after taking their turn.
  • The employees cannot rearrange any of the boxes or the cards – so everyone finds the room in the same state when they enter.
  • The employees will have their bonus multiplied by 10 if all 100 of them are winners. If there is a single loser, they all end up with zero bonus.

Should the employees accept this game, or should they refuse it and keep their original bonuses? And if they agree to play, should they adopt any particular strategy for playing the game?

Give it some thought and then scroll down for some discussion.

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

A good place to start is to calculate the probability that any one employee is a winner. This happens if one of the 50 boxes they open, out of the 100 available, contains the card with their number. Each box is equally likely to contain their number, so you can easily write down the probability that they win. Scroll down again for the answer to this part:

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

There are 100 boxes, and the employee selects 50 of them. Each box is equally likely to contain their number, so the probability they find their number in one of the boxes is 50/100 or 1/2.

So that’s the probability that any one employee wins. We now need to calculate the probability that they all win – bearing in mind the rules of the game – and then decide whether the bet is worth taking.

In summary:

  • There are 100 employees;
  • The probability that any one employee wins their game is 1/2;
  • If they all win, their bonuses will all be multiplied by 10;
  • If any one of them loses, they all get zero bonus.

Should the employees choose to play or to keep their original bonus? And if they play, is there any particular strategy they should adopt?
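If you'd like to experiment before settling on an answer, here's a minimal simulation sketch in Python. The random_choice strategy – each employee simply opening 50 boxes at random – is just a placeholder of my own invention; the point is that you can swap in any strategy you come up with and estimate how often all 100 employees win.

```python
import random

def random_choice(own_number, open_box, n_boxes=100, n_opens=50):
    """Placeholder strategy: open 50 boxes chosen at random.
    `open_box(b)` returns the number of the card inside box b.
    Return True if the employee finds their own number."""
    for b in random.sample(range(1, n_boxes + 1), n_opens):
        if open_box(b) == own_number:
            return True
    return False

def estimate_all_win(strategy, n=100, trials=2000):
    """Estimate the probability that *all* n employees find their own number."""
    wins = 0
    for _ in range(trials):
        cards = list(range(1, n + 1))
        random.shuffle(cards)                 # cards[b - 1] is inside box b
        open_box = lambda b: cards[b - 1]
        if all(strategy(emp, open_box) for emp in range(1, n + 1)):
            wins += 1
    return wins / trials

print(estimate_all_win(random_choice))
```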

If you’d like to send me your answers I’d be really happy to hear from you. If you prefer just to send me a yes/no answer, perhaps just based on your own intuition, I’d be equally happy to get your response, and you can use this form to send the answer in that case.


This is a variant on a puzzle pointed out to me by Fabian.Thut@smartodds.co.uk. I think it’s a little more tricky than previous puzzles I’ve posted, but it illustrates a specific important statistical issue that I’ll discuss when giving the solution.

Cause and effect

[Cartoon: statistics + maybe]

If you don’t see why this cartoon is funny, hopefully you will by the end of this post.

The following graph shows the volume of crude oil imports from Norway to the US and the number of drivers killed in collisions with trains, each per year:

There is clearly a very strong similarity between the two graphs. The standard way of measuring the strength of statistical association between two series is the correlation coefficient. If the two series were completely unrelated the correlation coefficient would be zero; if they were perfectly in sync it would be 1. For the two series in the graph the correlation coefficient is 0.95, which is pretty close to 1. So, you’d conclude that crude oil imports and deaths due to train collisions are strongly associated with one another – as one goes up, so does the other, and vice versa.
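If you'd like to see the calculation itself, here's a minimal sketch in Python. The two series are made-up numbers, purely to show how the correlation coefficient is computed – they're not the actual oil-import or train-collision figures.

```python
import numpy as np

# Two made-up yearly series that drift downwards together
oil_imports  = np.array([80, 75, 70, 72, 60, 55, 50, 45, 40, 38])
train_deaths = np.array([95, 90, 85, 88, 75, 70, 66, 60, 55, 50])

r = np.corrcoef(oil_imports, train_deaths)[0, 1]
print(round(r, 2))   # close to 1: the series rise and fall together
```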

But this is crazy. How can oil imports and train deaths possibly be related?

This is just one of a number of examples of spurious correlations kindly sent to me by Olga.Turetskaya@smartodds.co.uk. Other examples there include:

  1. The number of deaths by drowning and the number of films Nicolas Cage has appeared in;
  2. Cheese consumption and the number of deaths by entanglement in bedsheets;
  3. Divorce rates and consumption of margarine.

In each case, like in the example above, the correlation coefficient is very close to 1. But equally in each case, it’s absurd to think that there could be any genuine connection between the processes, regardless of what statistics might say. So, what’s going on? Is Statistics wrong?

No, Statistics is never wrong, but the way it’s used and interpreted often is.

There are two possible explanations for spurious correlations like the one observed above:

  1. The processes might be genuinely correlated, but not due to any causal relationship. Instead, both might be affected in the same way by some other variable lurking in the background. Wikipedia gives the example of ice cream sales being correlated with deaths by drowning. Neither causes the other, but they tend to be high or low at the same time – and therefore correlated – because each increases in periods of very hot weather.
  2. The processes might not be correlated at all, but just by chance due to random variation in the data, they look correlated. This is unlikely to happen with just a single pair of processes, but if we scan through enough possible pairs, some are bound to.

Most likely, for series like the ones shown in the graph above, there’s a bit of both of these effects in play. Crude oil imports and deaths by train collisions have probably both diminished over time for completely unrelated reasons. This is the first of those effects, where time is the lurking variable having a similar effect on both oil imports and train collisions. But on top of that, the random-looking oscillations in the curves, which occur at around the same times for each series, are probably just chance coincidences. Most series that are uncorrelated won’t share such random-looking variations, but every so often they will, just by chance. And the processes shown in the graph above might be the one pair, out of the thousands that have been examined, which have this unusual similarity just by chance.
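That second effect is easy to demonstrate for yourself. Here's a minimal sketch in Python that generates a large number of completely unrelated series – simple random walks, an arbitrary choice – and scans every pair for the strongest correlation. Given enough pairs to search through, a correlation close to 1 turns up by chance alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 completely unrelated series: random walks over 20 'years'
series = rng.normal(size=(1000, 20)).cumsum(axis=1)

corr = np.corrcoef(series)           # all pairwise correlations at once
np.fill_diagonal(corr, 0)            # ignore each series paired with itself
print(round(np.abs(corr).max(), 2))  # typically very close to 1, by chance alone
```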

So, for both these reasons, correlation between variables doesn’t establish a causal relationship. And that’s why the cartoon above is funny. But if we can’t use correlation to establish whether a relationship is causal or not, what can we use?

We’ll discuss this in a future post.


Meantime, just in case you haven’t had your fill of spurious correlations, you can either get a whole book full of them at Amazon or use this page to explore many other possible examples.

Intuition

Intuition is a great asset, but:

  1. Can sometimes lead you astray;
  2. Always needs to be balanced by doubt, to avoid becoming complacency;
  3. Is never a substitute for genuine understanding and knowledge.

A while back I posed the question

You’re at a party and meet someone. After chatting for a bit, you work out that the girl you’re talking to has 2 kids and one of them is a boy. What are the chances she’s got 2 boys rather than a boy and a girl?

And I discussed the solution to this problem in a subsequent post. As I discussed, although it’s tempting to give an answer of 1/2 – arguing that the other child is equally likely to be a boy or a girl – this reasoning is wrong, and the correct answer is 1/3.

I then followed up this discussion by extending the original problem as follows. Everything is the same, except you are told that the woman has 2 children, one of whom is a boy that was born on a Tuesday. With this information, what is the probability that the other child is also a boy?

Again, it looks like there ought to be an easy answer to this problem. And again, it turns out that this easy answer is wrong. But this time the breakdown of the simple intuitive argument is often surprising even to people who work regularly with probabilities.

The intuitive – but wrong – answer is that the probability is still 1/3. The argument goes that there is no relationship between gender and day of birth. Technically,  gender and day of birth are statistically independent. So, the information provided by the day of birth is completely irrelevant and can be ignored, meaning the probability that the other child is a boy remains at 1/3.

Except it doesn’t. The true answer is 13/27, which is just slightly less than 1/2.

The calculation leading to this answer is not difficult, but slightly more complicated than is reasonable for me to include in this post. If you’re interested, you can find a full explanation here.

But, as explained in that article, it’s interesting to state the result in slightly more generality. We added the condition that the boy was born on a Tuesday, an event which has probability 1/7 for boys and girls alike. Suppose we replace this event with something else like ‘The boy is left-handed’; or ‘The boy has green eyes’; or ‘The boy was born on Christmas Day’. And suppose the probability of a child – boy or girl – fulfilling this extra condition is p.

For the original Tuesday’s child problem, p = 1/7. For the Christmas birthday, p = 1/365. For green eyes, it’s the proportion of people in the population with green eyes. And so on.

Anyway, if we replace the condition of being born on a Tuesday with some other condition whose probability is p for either boys or girls, it turns out that the probability that both the children are boys, given the information provided, is

Q = (2 - p)/(4 - p)

If we substitute p=1/7 in this expression we get Q = 13/27, which explains the answer given above. But it’s not the value itself that’s important; it’s the fact that it’s different from 1/3. So, although gender and day of birth are independent, giving the extra information about day of birth shifts the original probability from 1/3 to 13/27.

And there’s something even more interesting. If the day-of-birth condition is replaced with a condition that’s less likely, so that p is smaller, then the value of Q gets closer and closer to 1/2. For example, with the Christmas birthday condition, p = 1/365, which leads to Q = 0.4996573. In other words, if we include a very unusual condition for the child known to be a boy, the probability that the other child is also a boy gets very close to 1/2, which is the answer you get to the original problem by using the wrong intuition.
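If you don't fancy the algebra, the 13/27 answer is easy to check by brute force: list all 196 equally likely gender-and-day combinations for the two children and count. Here's a minimal sketch in Python that does exactly that, and also checks the general formula at p = 1/7.

```python
from itertools import product
from fractions import Fraction

days = range(7)                                 # label Tuesday as day 0, say
children = list(product('BG', days))            # 14 equally likely (gender, day) pairs
families = list(product(children, repeat=2))    # 196 equally likely 2-child families

# Keep only families with at least one boy born on a Tuesday
qualifying = [f for f in families if ('B', 0) in f]
both_boys  = [f for f in qualifying if f[0][0] == 'B' and f[1][0] == 'B']
print(Fraction(len(both_boys), len(qualifying)))   # 13/27

# The general formula with p = 1/7 gives the same answer
p = Fraction(1, 7)
print((2 - p) / (4 - p))                           # 13/27
```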

Not very intuitive, but it’s the truth.


Finally, here’s Dilbert’s take on intuition:

Proof reading

In an earlier post I described what’s generally known as the Mutilated Chessboard Puzzle. It goes like this: a chessboard has 2 diagonally opposite corners removed. The challenge is to cover the remaining 62 squares with 31 dominoes, each of which can cover 2 adjacent horizontal or vertical squares. Or, to show that such a coverage is impossible.

Several of you wrote to me about this, in many cases providing the correct solution. Thanks and congratulations to all of you.

The correct solution is that it is impossible to cover the remaining squares of the chessboard this way. But what’s interesting about this puzzle – to me at least – is what it illustrates about mathematical proof.

There would essentially be 2 ways to prove the impossibility of  a domino coverage. One way would be to enumerate every possible configuration of the 31 dominoes, and to show that none of these configurations covers the 62 remaining squares on the chessboard. But this takes a lot of time – there are many different ways of laying the dominoes on the chessboard.

The alternative approach is to ‘step back’ and try to reason logically why such a configuration is impossible. This approach won’t always work, but it’s often short and elegant when it does. And with the mutilated chessboard puzzle, it works beautifully…

When you place a domino on a chessboard it will cover 1 black and 1 red square (using the colours in the diagram above). So, 31 dominoes will cover 31 black and 31 red squares. But if you remove diagonally opposite corners from a chessboard, they will be of the same colour, so you’re left with either 32 black squares and 30 red, or vice versa. But you’re never left with 31 squares of each colour, which is the only pattern possible with 31 dominoes. So it’s impossible and the result is proved. Simply and beautifully.
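For anyone who'd like to see the counting argument written out explicitly, here's a minimal sketch in Python. It doesn't search for tilings at all – it just applies the colour-counting logic above, so it can only ever rule a tiling out, never confirm that one exists.

```python
def colour(square):
    """Colour of a square (row, col), with rows and columns numbered 0 to 7."""
    row, col = square
    return 'black' if (row + col) % 2 == 0 else 'red'

def tiling_possible_by_counting(removed):
    """The counting argument: 31 dominoes always cover 31 black and 31 red
    squares, so a tiling is impossible whenever the two removed squares
    have the same colour. (If they differ, this test alone proves nothing.)"""
    a, b = removed
    return colour(a) != colour(b)

# The mutilated board: diagonally opposite corners share a colour
print(tiling_possible_by_counting([(0, 0), (7, 7)]))   # False: no tiling exists
```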

As I mentioned in the previous post the scientific writer Cathy O’Neil cites having been shown this puzzle by her father at a young age as the trigger for her lifelong passion for mathematics. And maybe, even if you don’t have a passion for mathematics yourself, you can at least see why the elegance of this proof might trigger someone’s love for mathematics in the way it did for Cathy.

Having said all that, computer technology now makes proof by enumeration possible in situations where the number of configurations to check might be very large. But structured mathematical thinking is still often necessary to determine the parameters of the search. A good example of this is the well-known four colour theorem. This states that if you take any region that’s been divided into sub-regions – like a map divided into countries – then you only need four colours to shade the map in such a way that no adjacent regions have the same colour.

Here’s an example from the Wiki post:

You can see that, despite the complexity of the sub-regions, only 4 colours were needed to achieve a colouring in which no two adjacent regions have the same colour.
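Finding a 4-colouring for any one particular map is straightforward – a simple backtracking search will usually do it quickly. Here's a minimal sketch in Python, using a small made-up map described by which regions border which.

```python
def four_colour(adjacency, colours=('red', 'green', 'blue', 'yellow')):
    """Backtracking search for a colouring in which no two neighbouring
    regions share a colour. Returns a dict of region -> colour, or None."""
    regions = list(adjacency)
    assignment = {}

    def extend(i):
        if i == len(regions):
            return True
        region = regions[i]
        for c in colours:
            # try colour c if no already-coloured neighbour has it
            if all(assignment.get(nb) != c for nb in adjacency[region]):
                assignment[region] = c
                if extend(i + 1):
                    return True
                del assignment[region]
        return False

    return assignment if extend(0) else None

# A made-up map: regions A, B, C, D all border each other; E borders only D
adjacency = {
    'A': ['B', 'C', 'D'],
    'B': ['A', 'C', 'D'],
    'C': ['A', 'B', 'D'],
    'D': ['A', 'B', 'C', 'E'],
    'E': ['D'],
}
print(four_colour(adjacency))
```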

But how would you prove that any map of this type would require at most 4 colours? Ideally, as with the mutilated chessboard puzzle, you’d like a ‘stand back’ proof, based on pure logic. But so far no one has been able to find one. Equally, enumeration of all possible maps is clearly impossible – any region can be divided into subregions in infinitely many ways.

Yet a proof has been found which is a kind of hybrid of the ‘stand back’ and ‘enumeration’ approaches. First, a deep understanding of mathematical graphs was used to reduce the infinitely many possible regions to a finite number – actually, around 2000 – of maps to consider. That’s to say, it was shown that it’s not necessary to consider all possible regional mappings – if a 4-colour shading of a certain set of  2000ish different maps could be found, this would be enough to prove that such a shading existed for all possible maps. Then a computer algorithm was developed to search for a 4-colour shading for each of the identified 2000 or so maps.  Putting all of this together completed the proof that a 4-colour shading existed for any map, not just the ones included in the search.

Now, none of this is strictly Statistics, though Cathy O’Neil’s book that I referred to in the previous post is in the field of data science, which is at least a close neighbour of Statistics. But in any case, Statistics is built on a solid mathematical framework, and things that we’ve seen in previous posts like the Central Limit Theorem – the phenomenon by which the frequency distributions of many naturally occurring phenomena end up looking bell-shaped – are often based on the proof of a formal mathematical expression, which in some cases is as simple and elegant as that of the mutilated chessboard puzzle.
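As a quick illustration of the Central Limit Theorem in action, here's a minimal sketch in Python: individual dice rolls are uniformly distributed, but averages of repeated batches of rolls pile up into the familiar bell shape.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Averages of 50 fair-die rolls, repeated 10,000 times
averages = rng.integers(1, 7, size=(10_000, 50)).mean(axis=1)

plt.hist(averages, bins=40)
plt.title('Averages of 50 dice rolls: bell-shaped')
plt.show()
```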


I’ll stop this thread here so as to avoid a puzzle overload, but I did want to mention that there is an extension of the Mutilated Chessboard Puzzle. Rather than removing 2 diagonally opposite corners, suppose I remove any 2 arbitrary squares, possibly adjacent, possibly not. In that case, can the remaining squares be covered by 31 dominoes?

If the 2 squares removed are of the same colour, the solution given above works equally well, so we know the problem can’t be solved in that case. But what if I remove one black and one red square? In that case, can the remaining squares be covered by the 31 dominoes:

  1. Always;
  2. Sometimes; or
  3. Never?

I already sent this problem to some of you who’d sent me a solution to the original problem. And I should give a special mention to Fabian.Thut@smartodds.co.uk, who provided a solution which is completely different to the standard textbook solution. Which illustrates another great thing about mathematics: there is often more than one solution to the same problem. If you’d like to try this extension to the original problem, or discuss it with me, please drop me a line.


Statistics of the decade

Now that the nights are drawing in, our minds naturally turn to regular end-of-year events and activities: Halloween; Bonfire night; Christmas; New Year’s eve; and the Royal Statistical Society ‘Statistics of the Year’ competition.

You may remember from a previous post that there are 2 categories for Statistic of the Year: ‘UK’ and ‘International’. You may also remember that last year’s winners were 27.8% and 90.5% respectively. (Don’t ask, just look back at the previous post).

So, it’s that time again, and you are free to nominate your own statistics for the 2019 edition. Full details on the criteria for nominations are given at the RSS link above, but suggested categories include:

  • A statistic that debunks a popular myth;
  • A statistic relevant to a key news story or social trend;
  • A statistic relevant to a phenomenon/craze this year.

But if that’s not exciting enough, this year also sees the end of the decade, so you are also invited to nominate for ‘Statistic of the Decade’, again in UK and International categories. As the RSS say:

The Royal Statistical Society is not only looking for statistics that captured the zeitgeist of 2019, but as the decade draws to a close, we are also seeking statistics that can help define the 2010s.

So, what do you think? What statistics captured 2019’s zeitgeist for you? And which statistics helped define your 2010s?

Please feel free to nominate to the RSS yourselves, but if you send me your nomination directly, I’ll post a collection of the replies I receive.


Thanks to Luigi.Colombo@Smartodds.co.uk for pointing out to me that the nominations for this year were now open.

Love it or hate it

A while ago I wrote a post about the practice of advertistics – the use, and more often misuse, of Statistics by advertising companies to promote their products. And I referenced an article in the Guardian which included a number of examples of advertistics. One of these examples was Marmite.

You probably know the line: Marmite – you either love it or hate it. That’s an advertistic in itself. And almost certainly provably incorrect – I just have to find one person who’s indifferent to Marmite.

But I want to discuss a slightly different issue. This ‘love or hate Marmite’ theme has turned up as an advertistic for a completely different product…

DNAfit is one of a number of do-it-yourself DNA testing kits. Here’s what they say about themselves:

DNAfit helps you become the best possible version of yourself. We promise a smarter, easier and more effective solution to health and fitness, entirely unique to your DNA profile. Whatever your goal, DNAfit will ensure you live a longer, happier and healthier life.

And here’s the eminent statistician, er, Rio Ferdinand, to persuade you with statistical facts as to why you should sign up with DNAfit.

But where’s the Marmite?

Well, as part of a campaign that was purportedly set up to address a decline in Marmite sales, but was coincidentally promoted as an advertistic for the DNAfit testing kit, a scientific project was set up to find genetic markers that identify whether a person will be a lover or hater of Marmite. (Let’s ignore, for the moment, the fact that the easiest way to discover whether a person is a ‘lover’ or ‘hater’ of Marmite is simply to ask them.)

Here’s a summary of what they did:

  • They recruited a sample of 261 individuals;
  • For each individual, they took a DNA sample;
  • They also questioned the individuals to determine whether they love or hate Marmite;
  • They then applied standard statistical techniques to identify a small number of genetic markers that separate the Marmite lovers from the haters. Essentially, they looked for a combination of DNA markers which were present in the ‘haters’, but absent in the ‘lovers’ (or vice versa).

Finally, the study was given a sheen of respectability through the publication of a white paper with various genetic scientists as authors.

But, here’s the typical reaction of another scientist on receiving a press release about the study:

Wow, sorry about the language there. So, what’s wrong?

The Marmite gene study is actually pretty poor science. One reason, as explained in this New Scientist article, is that there’s no control for environmental factors. For example, several members of a family might all love Marmite because the parents do and introduced their kids to it at a very early age. The close family connection will also mean that these individuals have similar DNA. So, you’ll find a set of genetic characteristics that each of these family members have, and they all also love Marmite. Conclusion – these are genetic markers for loving Marmite. Wrong: these are genetic markers for this particular family who, because they share meals together, all love Marmite.

I’d guess there are other factors too. A sample of 261 seems rather small to me. There are many possible genetic markers, and many, many more combinations of genetic markers. With so many options it’s almost certain that purely by chance in 261 individuals you can find one set of markers shared only by the ‘lovers’ and another set shared only by the ‘haters’. We’ve seen this stuff before: look at enough things and something unlikely is bound to occur just by chance. It’s just unlikely to happen again outside of the sample of individuals that took part in the study.

Moreover, there seems to have been no attempt at validating the results on an independent set of individuals.
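This ‘look at enough things and something will fit’ effect is easy to reproduce. Here's a minimal sketch in Python – nothing to do with real genetics, just made-up markers and made-up lover/hater labels – showing that with far more markers than people you can always find a combination that separates the two groups perfectly in the original sample, yet predicts no better than a coin toss on a fresh sample.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_markers = 261, 2000            # many more 'markers' than people

markers = rng.integers(0, 2, size=(n_people, n_markers)).astype(float)
loves_marmite = rng.integers(0, 2, size=n_people)   # purely random labels

# A least-squares 'classifier' built from the random markers
coef, *_ = np.linalg.lstsq(markers, loves_marmite, rcond=None)
in_sample = ((markers @ coef > 0.5) == loves_marmite).mean()
print(in_sample)                           # essentially 1.0: a perfect in-sample split

# The same classifier applied to a fresh, independent sample of people
new_markers = rng.integers(0, 2, size=(n_people, n_markers)).astype(float)
new_labels = rng.integers(0, 2, size=n_people)
print(((new_markers @ coef > 0.5) == new_labels).mean())   # back to about 0.5
```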

Unfortunately for DNAfit and Marmite, they took the campaign one stage further and encouraged Marmite customers – and non-customers – to carry out their own DNA test to see if they were Marmite ‘lovers’ or ‘haters’ using the classification found in the genetic study. If only they’d thought to do this as part of the study itself. Because although the test claimed to be 99.98% accurate, rather many people who paid to be tested found they’d been wrongly classified.

One ‘lover’ who was classified as a ‘hater’ wrote:

I was genuinely upset when I got my results back. Mostly because, hello, I am a ‘lover’, but also because I feel like Marmite led me on with a cheap publicity tool and I fell for it. I feel dirty and used.

While a wrongly-classified ‘hater’ said:

I am somewhat offended! I haven’t touched Marmite since I was about eight because even just the thought of it makes me want to curl up into a ball and scrub my tongue.

Ouch! ‘Dirty and used’. ‘Scrub my tongue’. Not great publicity for either Marmite or DNAfit, and both companies seem to have dropped the campaign pretty quickly and deleted as many references to it as they were able.

Ah, the price of doing Statistics badly.


p.s. There was a warning in the ads about a misclassification rate higher than 0.02% but they just dismissed it as fake news…


Family problems

In an earlier post, I set the following problem:

You’re at a party and meet someone. After chatting for a bit, you work out that the girl you’re talking to has 2 kids and one of them is a boy. What are the chances she’s got 2 boys rather than a boy and a girl?

Following some feedback, I later updated that post to clarify that the question isn’t intended to be about possible slight differences in the birth rates of boys and girls. That’s an interesting biological and demographic issue, but it wasn’t intended as the point of the question. For the purposes of the question I simply meant to assume that every child, regardless of the mother’s previous history of births, is equally likely to be a boy or a girl.

In that case, it’s very tempting to answer the question above as 1/2. Indeed, this was the point of the post. One child’s a boy. The other is just as likely to be a boy or a girl, so the answer must be 1/2.

Except it’s not.

The answer is 1/3, and here’s the reasoning…

Without any additional information, if a woman has a 2-child family the possibilities (with the oldest child listed first) are:

Boy-Boy, Boy-Girl, Girl-Boy, Girl-Girl

and because all children are equally likely to be male or female, these combinations are all equally likely. But we can rule out the Girl-Girl combination from the information in the problem, so the remaining possibilities are

Boy-Boy, Boy-Girl, Girl-Boy

with each being equally likely. And it’s only in the first of these cases that the other child is also a boy. So the other child is a boy in just 1 of the 3 equally likely outcomes, and the probability is 1/3.
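If you'd like to see that confirmed without any algebra, here's a minimal sketch in Python that simply enumerates the equally likely possibilities and counts.

```python
from itertools import product
from fractions import Fraction

families = list(product('BG', repeat=2))             # (older, younger): BB, BG, GB, GG
at_least_one_boy = [f for f in families if 'B' in f] # rule out Girl-Girl
both_boys = [f for f in at_least_one_boy if f == ('B', 'B')]

print(Fraction(len(both_boys), len(at_least_one_boy)))   # 1/3
```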

This simple problem illustrates the difficulty in calculating what is called a conditional probability – the probability of something conditional on some given information, in this case that one of the children is a boy. Whereas in general the chance of a child being a boy is 1/2, once you include the extra information that we’re looking at a 2-child family in which at least one of the children is a boy, the probability changes. At first it seems counter-intuitive, but once you’ve seen a few problems of this type, your intuition becomes sharper.

With that in mind, let me pose the same problem as above, but suppose you find out that one of the woman’s kids is a boy that was born on a Tuesday. Now what’s the probability that the other child is also a boy?

As usual, I’ll write a future post with the solution and discussion. But if you’d like to send me your own ideas, either by mail or via this survey form, I’d be really happy to hear from you.


Thanks to those of you who replied to the original question. Apart from some initial confusion about whether I was suggesting boys might be more or less common than girls in general, there was roughly a 50-50 split in answers between 1/2 and 1/3. As explained above, it’s easy to be misled into thinking the answer is 1/2, so there is no embarrassment in arriving at that answer. And like I say, that was the whole point of the post. Nonetheless, well done to those of you who got the correct answer of 1/3.

It’ll be interesting to see what replies I get to the revised problem above, so please do send me your answer or thoughts on the problem if you have time.

Advertistics


[Image: Carlsberg advert]

In a recent  Guardian article, Arwa Mahdawi defines something that she jokingly calls ‘advertistics’. These are statistics based on surveys that are designed to generate results that are useful for advertising a particular product. This might be achieved in different ways, including:

  • The question might be asked in a pointed way which steers respondents in a particular direction;
  • The sample of individuals surveyed might be creatively chosen so that they are more likely to answer in a particular way;
  • An incentive might be offered to respondents who give particular answers;
  • Surveys might be ignored and repeated until the desired outcome is achieved;
  • The survey and the statistics might just be made up.

But whichever method is used, the results are presented as if they are genuinely representative of the wider population. These are advertistics.

One example referred to in Arwa’s Guardian article is a survey of Americans which concluded that 45% of Americans wear the same underpants for at least 2 consecutive days and that American men are 2.5 times as likely as women to have worn their underwear unchanged for more than a week. But here’s the catch: the survey was carried out by an underwear manufacturer, and the details of their survey design are unavailable. So, it’s impossible to know whether the individuals they sampled were genuinely representative of the wider American population, and therefore whether the 45% advertistic has any basis in reality. Nonetheless, it’s convenient for the underwear company to present it as if it does in order to strengthen their campaign for people to replace their underwear more frequently. By buying more of their products, of course.
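To see how much the choice of sample alone can move a headline figure, here's a toy sketch in Python. All the numbers are invented: a population in which about 15% of people 'do the thing', but where a conveniently chosen sample makes the figure look like 60%.

```python
import numpy as np

rng = np.random.default_rng(3)

# An invented population: 10% of one group and 60% of a much smaller group
group_a = rng.random(90_000) < 0.10
group_b = rng.random(10_000) < 0.60
population = np.concatenate([group_a, group_b])
print(population.mean())                   # the honest population figure: about 0.15

# A 'survey' that, by design, only ever reaches the second group
convenient_sample = rng.choice(group_b, size=1000)
print(convenient_sample.mean())            # the advertistic: about 0.6
```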

Another example: I’m old enough to remember ads produced by the cat-food manufacturer Whiskas that claimed:

8 out of 10 cats prefer Whiskas.

Except,

  1. Nobody asked the cats; and
  2. Many owners didn’t reply.

So they were forced to change the tag line to:

8 out of 10 owners who expressed a preference said their cat prefers it.

Definitely not as snappy, though scientifically more correct. Yet without further details on exactly how the survey was conducted, doubts remain about the validity of the 8 out of 10 advertistic even with the added caveats.

Finally, remember that things can change over time, and statistics – and advertistics – will change accordingly. Arguably the most famous advertistic of all time is the ‘fact’ that Carlsberg is…

Probably the best beer in the world

Except, shockingly, it no longer is. The latest Carlsberg campaign includes the admission that Carlsberg is

Probably not the best beer in the world.

Which to believe? Well, the new campaign comes with evidence supplied by Carlsberg drinkers including the claims that

Carlsberg tastes like stale breadsticks

and that drinking Carlsberg is like…

… drinking the bathwater your nan died in

So, on the strength of evidence, we’re going to have to accept that Carlsberg’s not the best.

Probably.


Fringe benefits

The Edinburgh Fringe Festival is the largest arts festival in the world. The 2019 version has just finished, but Wikipedia lists some of the statistics for the 2018 edition:

  1. the festival lasted 25 days;
  2. it included more than 55,000 performances;
  3. that comprised 3548 different shows.

The shows themselves are of many different types, including theatre, dance, circus and music. But the largest section of the festival is comedy, and performers compete for the Edinburgh Comedy Awards – formerly known as the Perrier Award – which is given to the best comedy show on the fringe.

I mention all this because the TV channel Dave also publishes what it regards as the best 10 jokes of the festival. And number 4 this year was a statistical joke.

Enjoy:

A cowboy asked me if I could help him round up 18 cows. I said, “Yes, of course. That’s 20 cows.”


Confession: the joke is really based on arithmetic rather than Statistics.