At The Intersection

You’ll remember Venn diagrams from school. They’re essentially a mathematical tool for laying out the information in partially overlapping sets. And in statistics they are often used in the same way for showing the possible outcomes in events which might overlap.

For example, here’s a Venn diagram showing the relationship between whales and fish:

Whales and fish have some properties that are unique, but they also have some features in common. These are all shown in the appropriate parts of the diagram, with the common elements falling in the part of the sets that overlap – the so-called intersection.

With this in mind, I recently came across the following Venn poem titled ‘At the Intersection’ written by Brian Bilston:

You can probably work it out. There are three poems in total:  separate ones for ‘him’ and ‘her’ and their intersection. Life seen from two different perspectives, the result of which is contained in the intersection.



Problem solved

A while back I set a puzzle asking you to try to remove three coins from a red square region as shown in the following diagram.

The only rule of the game is that when a coin is removed it is replaced with 2 coins – 1 immediately to the right of and one immediately below the coin that is removed. If there is no space for adding these replacement coins, the coin cannot be removed.

The puzzle actually appeared in a recent edition of Alex Bellos’ Guardian mathematics puzzles, though it was created by the Argentinian mathematician Carlos Sarraute. This is his solution which is astonishing for its breathtaking ingenuity.

The solution starts by giving a value to every square in the grid as follows:

Remember, the grid goes on forever both to the right and downwards. The top left hand box has value 1. Going right from there, every subsequent square has value equal to 1/2 of the previous one. So: 1, 1/2, 1/4, 1/8 and so on. The first column is identical to the first row. To complete the second row, we start with the first value, 1/2, and again just keep multiplying by 1/2. The second column is the same as the second row. And we fill the entire grid this same way. Technically, every row and column is a series of geometric numbers: consecutive multiples of a common number, which in this case is 1/2.

Let’s define the value of a coin to be the value of the square its on. Then the total value of the coins at the start of the game is

1 + \frac{1}{2} + \frac{1}{2}= 2


  • When we remove a coin we replace it with two coins, one immediately to the left and one immediately to the right. But if you look at the value any square on the grid, it is equal to the sum of the values of the squares immediately below and to the right. So when we remove a coin we replace it with two coins whose total value is the same. It follows that the total value of the coins stays unchanged however many moves we make. It will always be 2 however many moves we make.
  • This is the only tricky mathematical part. Look at the first row of numbers. It consists of 1, 1/2, 1/4, 1/8… and goes on forever. But even though this is an infinite sequence it has a finite sum of 2. Obviously, we can never really add infinitely many numbers in practice, but by adding more and more terms in the series we will get closer and closer to the value of 2. Try it on a calculator. In summary:

1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} +\ldots  = 2.

  • Working down the rows, the second row is the same as the first with the first term removed. So its sum must be 1. The third is the same as the second with the first term of 1/2 removed, so its sum is 1/2. By the same reasoning, the sum of the fourth row will be 1/4, the fifth row 1/8 and so on.
  • So, the row sums are respectively 2, 1, 1/2, 1/4, …. This is the same as the values of the first row with the additional first term of 2. It follows that the sum of the row sums, and therefore the sum of all numbers in the grid is 2+2=4. Again, we can’t add all the numbers in the practice, but we will get closer and closer to the value of 4 by adding more and more squares.
  • The total value of the squares inside the red square is 1 + 1/2 + 1/2 + 1/4 = 9/4. The total value outside this region must therefore be 2-9/4= 7/4.
  • Putting all this together, the initial value of the coins was 2. After any number of moves, the total value of all coins will always remain 2. But the total value of all squares outside the red square is only 7/4. It must therefore be impossible to remove the three coins from the red square because to do so would require the coins outside of this area to have a value of 2, which is greater than the total value available in the entire region.

I find this argument quite brilliant. My instincts were to try to solve the puzzle using arguments from geometry. I failed. It would never have occurred to me to try to develop a solution based on the properties of numbers.

As I wrote in the original post, this puzzle doesn’t really have any direct relevant to Statistics except in so much as it shows the power and beauty of mathematical proof, which is an essential part of statistical theory. Having said that, the idea of infinite limits is important in Statistics, and I’ll discuss this in a further post. Let me just mention though that summing infinite series as in the solution above is a delicate issue for at least two reasons:

  1. Although the sum 1 + 1/2 + 1/4 + 1/8 + …. has a finite sum of 2, this series 1 + 1/2 + 1/3 + 1/4 + 1/5 + …. has no finite sum. The sum grows very slowly, but as I take more and more numbers in the series, the sum grows without any limit. That’s to say, if you give me any number – say 1 million – I can always find enough terms in the series for the sum to be greater than that number.
  2. To get the total value of the grid, we first added the rows and then added these row sums across the columns. We could alternatively have first added the columns, and then added these columns sums across the rows and we’d have got the same answer. For this example both these alternatives are valid. But in general this interchange of row and column sums to get the total sum is not valid. Consider, for example, this infinite grid:

 The first row sums to 2, after which all other rows sum to zero. So, the sum of the row sums is 2. But looking at the columns, even column sums to zero. So, if we sum the columns and then sum these sums we get 0. This couldn’t possibly happen with finite grids, but infinite grids require a lot more care.

In a follow-up post we’ll consider limits of sums in the context of Statistics.

Finally, I’m grateful to for some follow-up discussion on the original post. In particular, we ended up discussing the following variation on the original puzzles. The rules are exactly the same as before, but the starting configuration of the coins is now as per the following diagram:

In this case, can the puzzle be solved? Does the argument presented for the original problem help in any way?

If you have any thoughts about this, please do write to me. In any case, I’ll write another post with the solution to this version shortly.

I’m a 20 cent coin, get me out of here


European 20 Cent Coins (front and back) isolated on white background

Usually when I’ve posted puzzles to the blog the’ve had a basis in Probability or Statistics. One exception to this was the mutilated chessboard puzzle, whose inclusion I justified  by pointing out that mathematical logic is an important strand in the theory that underpins Statistics. Definitely not the only strand, but important nonetheless.

In this same spirit, here’s another puzzle you might like to look at. I’ll give references to the author and so on when I write a follow-up post with the solution. But, if you think I should only be doing posts that are strictly Probability or Statistics related, please just ignore this post. It’s only related to Statistics in the same way that the mutilated chessboard puzzle was. Having said that, I will use follow-up discussion to this puzzle as a lead-in to some important statistical ideas.

Anyway, here’s the puzzle. Look at this grid of squares…

Actually, you have to imagine the grid extending indefinitely downwards and to the right. In the top left-hand corner of the grid you can see a 2-by-2 section of the grid that’s been marked with red lines, and 3 coins have been placed in that section. Your task is to remove the coins from that section by following these rules:

  1. Coins are removed one at a time.
  2. When you remove a coin you must replace it with two coins, one immediately below and one immediately to the right of the one that’s been removed.
  3. If a coin does not have a free space both immediately below and to the right, it cannot be removed until such space becomes available.

You have to find the sequence of moves that results in the section inside the red square being emptied of coins, or explain why it’s impossible to find such a sequence.

To make things easier, you can try the puzzle using this gif. Again, I’ll give credits and references for this work when I write the follow-up post.

To play just click on the green flag to start and then click successively on the coins you’d like to remove. You will only be allowed to remove coins according to the rules above, and when you do legally remove a coin, two new coins are automatically added, again according to the stated rules. If you just want to start over, press the green flag again.

(I don’t know what the red button is for, but DON’T PRESS IT).

So, can you find the sequence of moves that releases all of the coins from the red square? Or if it can’t be done can you explain why?

Please write to me if you want to discuss the puzzle or share your ideas. I’ll write a post with a solution shortly.



Proof reading

In an earlier post I described what’s generally known as the Mutilated Chessboard Puzzle. It goes like this: a chessboard has 2 diagonally opposite corners removed. The challenge is to cover the remaining 62 squares with 31 dominoes, each of which can cover 2 adjacent horizontal or vertical squares. Or, to show that such a coverage is impossible.

Several of you wrote to me about this, in many cases providing the correct solution. Thanks and congratulations to all of you.

The correct solution is that it is impossible to cover the remaining squares of the chessboard this way. But what’s interesting about this puzzle – to me at least – is what it illustrates about mathematical proof.

There would essentially be 2 ways to prove the impossibility of  a domino coverage. One way would be to enumerate every possible configuration of the 31 dominoes, and to show that none of these configurations covers the 62 remaining squares on the chessboard. But this takes a lot of time – there are many different ways of laying the dominoes on the chessboard.

The alternative approach is to ‘step back’ and try to reason logically why such a configuration is impossible. This approach won’t always work, but it’s often short and elegant when it does. And with the mutilated chessboard puzzle, it works beautifully…

When you place a domino on a chessboard it will cover 1 black and 1 red square (using the colours in the diagram above). So, 31 dominoes will cover 31 black and 31 red squares. But if you remove diagonally opposite corners from a chessboard, they will be of the same colour, so you’re left with either 32 black squares and 30 red, or vice versa. But you’re never left with 31 squares of each colour, which is the only pattern possible with 31 dominoes. So it’s impossible and the result is proved. Simply and beautifully.

As I mentioned in the previous post the scientific writer Cathy O’Neil cites having been shown this puzzle by her father at a young age as the trigger for her lifelong passion for mathematics. And maybe, even if you don’t have a passion for mathematics yourself, you can at least see why the elegance of this proof might trigger someone’s love for mathematics in the way it did for Cathy.

Having said all that, computer technology now makes proof by enumeration possible in situations where the number of configurations to check might be very large. But structured mathematical thinking is still often necessary to determine the parameters of the search. A good example of this is the well-known four colour theorem. This states that if you take any region that’s been divided into sub-regions – like a map divided into countries – then you only need four colours to shade the map in such a way that no adjacent regions have the same colour.

Here’a an example from the Wiki post:

You can see that, despite the complexity of the sub-regions, only 4 colours were needed to achieve a colouring in which no two adjacent regions have the same colour.

But how would you prove that any map of this type would require at most 4 colours? Ideally, as with the mutilated chessboard puzzle, you’d like a ‘stand back’ proof, based on pure logic. But so far no one has been able to find one. Equally, enumeration of all possible maps is clearly impossible – any region can be divided into subregions in infinitely many ways.

Yet a proof has been found which is a kind of hybrid of the ‘stand back’ and ‘enumeration’ approaches. First, a deep understanding of mathematical graphs was used to reduce the infinitely many possible regions to a finite number – actually, around 2000 – of maps to consider. That’s to say, it was shown that it’s not necessary to consider all possible regional mappings – if a 4-colour shading of a certain set of  2000ish different maps could be found, this would be enough to prove that such a shading existed for all possible maps. Then a computer algorithm was developed to search for a 4-colour shading for each of the identified 2000 or so maps.  Putting all of this together completed the proof that a 4-colour shading existed for any map, not just the ones included in the search.

Now, none of this is strictly Statistics, though Cathy O’Neil’s book that I referred to in the previous post is in the field of data science, which is at least a close neighbour of Statistics. But in any case, Statistics is built on a solid mathematical framework, and things that we’ve seen in previous posts like the Central Limit Theorem – the phenomenon by which the frequency distributions of many naturally occurring phenomena end up looking bell-shaped – are often based on the proof of a formal mathematical expression, which in some cases is as simple and elegant as that of the mutilated chessboard puzzle.

I’ll stop this thread here so as to avoid a puzzle overload, but I did want to mention that there is an extension of the Mutilated Chessboard Puzzle. Rather than removing 2 diagonally opposite corners, suppose I remove any 2 arbitrary squares, possibly adjacent, possibly not. In that case, can the remaining squares be covered by 31 dominoes?

If the 2 squares removed are of the same colour, the solution given above works equally well, so we know the problem can’t be solved in that case. But what if I remove one black and one red square? In that case, can the remaining squares be covered by the 31 dominoes:

  1. Always;
  2. Sometimes; or
  3. Never?

I already sent this problem to some of you who’d sent me a solution to the original problem. And I should give a special mention to  who provided a solution which is completely different to the standard textbook solution. Which illustrates another great thing about mathematics: there is often more than solution to the same problem. If you’d like to try this extension to the original problem, or discuss it with me, please drop me a line.



Happy Pi day

Oops, I almost missed the chance to wish you a happy Pi day. So, almost belatedly:

Happy Pi day!!!

You probably know that Pi – or more accurately, 𝜋 – is one of the most important numbers in mathematics, occurring in many surprising places.

Most people first come across 𝜋 at school, where you learn that it’s the ratio between the circumference and the diameter of any circle. But 𝜋 also crops up in almost every other area of mathematics as well, including Statistics.  In a future post I’ll give an example of this.

Meantime, why is today Pi day? Well, today is March 14th, or 3/14 if you’re American. And the approximate value of Pi is 3.14. more accurately, here’s the value of 𝜋 to 100 digits:

3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59230 78164 06286 20899 86280 34825 34211 7067

Not enough for you? You can get the first 100,000 digits here.

But that’s just a part of the story. You probably also know that Pi is what’s known as an irrational number, which means that its decimal representation is infinite and non-repeating. And today it was announced that Pi has just been computed to an accuracy of 31.4 trillion decimal digits, beating the previous most accurate computation by nearly 9 trillion digits.

That’s impressive computing power, obviously, but how about simply remembering the digits of 𝜋? Chances are you remembered from school the first three digits of 𝜋: 3, 1, 4. But the current world record for remembering the value of 𝜋 is 70,030 digits, which is held by Suresh Kumar Sharma of India? And almost as impressively, here’s an 11-year old kid who managed 2091 digits.

Like I say, I’ll write about 𝜋’s importance in Statistics in another post.

Meantime, here’s Homer:


Lucky, lucky 2019


Welcome back to Smartodds loves Statistics.

Let’s start the new year with a fun statistic:

2019 is a lucky, lucky year.

Why is that? Well, let’s start with prime numbers. You’ll know that a prime number is a whole number that can’t be written as a multiple of other whole numbers. For example 6 is not a prime number since 6 = 3 \times 2, but 7 is a prime since it can only be factorised as 7 = 7 \times 1.

One way of generating the prime numbers is as follows.

Start with the complete list of whole numbers excluding 1:

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30….

The first number remaining is 2, so remove all multiples of 2 that are bigger than 2:

2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29…..

The second number remaining is 3, so remove all multiples of 3 that are bigger than 3:

2, 3, 5, 7, 11, 13, 17, 19, 23, 25, 29…..

The third number remaining is 5, so remove all multiples of 5 that are bigger than 5:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29…

And keep going this way. The numbers that remain are the prime numbers.

It’s easy to check that

2, 3, 5, 7, 11, 13, 17, 19, 23, 29

comprise all of the prime numbers that are smaller than 30. To get the bigger prime numbers you just have to apply more steps using the same procedure.

Lucky numbers are generated in much the same way. This time we start with the sequence of all positive whole numbers:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,…..

The second number is 2, so we remove every second number from the sequence, leaving

1, 3, 5, 7, 9, 11, 13, 15, 17, 19, …..

The third number remaining is 5, so we remove every fifth number of the sequence

1, 3, 5, 7, 13, 15, 17, 19, …..

The fourth number remaining is 7, so we remove every 7th number.  

1, 3, 5, 7, 13, 15, 19, …..

And so on….

The numbers that remain in this procedure are said to be lucky numbers. And proceeding in this way, it’s easy to check that 2019 is a lucky number. But 2019 isn’t just ‘lucky’, it’s ‘lucky, lucky’. Every whole number can be written uniquely as a multiple of prime numbers. In the case of 2019 the unique prime factorisation is:

2019 = 3 \times 673

And… both 3 and 673 are also lucky numbers. So 2019 is doubly lucky in the sense that it is both lucky itself and all of its prime factors are lucky. Moreover, 2019 is the only year this century that has this property, so enjoy it while it lasts.

This post is really more about mathematics than statistics, but how about this? If I take a large number, 2020 say, and pick a number at random from 1 to 2020, what’s the probability that it will be a lucky number? One way to do this would be to identify all of the lucky numbers up to 2020. If there are m such numbers, then the probability a randomly selected number will be lucky is m/2020.  But it turns out there’s a good approximation that can be calculated very easily, and it works for any large number, not just 2020.

A classical result from number theory is that the probability that a randomly selected number in the sequence 1,2,…., N is a prime number, for any large value of N, is approximately


where  log is the logarithmic function. With N=2020, this is equal to 0.13, so there’s roughly a 13% chance that a number from 1 to 2020 is a prime number. But almost incredibly, this same approximation works also for lucky numbers, so there’s also roughly a 13% chance that a number from 1 to 2020 will be a lucky number. Obviously lucky, lucky numbers are much rarer, and I don’t know of any formula that can be used to calculate the probability of such numbers. The fact that there is just one lucky year this century, though, suggests the probability is pretty low.