Monday, April 30, 2018

Which is harder to beat, an old record or a new one? The old one hasn't been beaten for a long time, so it is obviously difficult to beat. But if it had been beaten recently, then that new record would have been even harder to beat, wouldn't it?

Let's make a probabilistic model, but first recall this concept called independence. Suppose we flip a coin twice. The probability of heads is $1/2$ each time, so the probability of heads-heads is $1/2 \cdot 1/2 = 1/4$. This is because the two coin flips are independent. If we get information about one of them, it doesn't affect what we believe about the other.

Suppose instead we throw an ordinary die once. The probability of a high number, 4, 5 or 6, is $1/2$. The probability of an even number, 2, 4, or 6, is also $1/2$. But the probability of a number that's high and even isn't $1/4$. It's $1/3$. The two events aren't independent. Among the high numbers there are two even and only one odd, so if you get the information that the number was high, it becomes more likely that it was also even. The two events are positively correlated, meaning they tend to occur together. If we know that one of them happened, it will increase the probability of the other one.
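The die numbers are small enough to check by listing all six faces. Here is a minimal sketch in Python; the helper `prob` and the names `high` and `even` are just illustrative choices, and exact fractions are used so that nothing gets rounded.

```python
from fractions import Fraction

FACES = range(1, 7)  # the six faces of an ordinary die


def prob(event):
    """Exact probability that a fair die shows a face satisfying `event`."""
    return Fraction(sum(1 for face in FACES if event(face)), len(FACES))


def high(face):
    return face >= 4


def even(face):
    return face % 2 == 0


p_high = prob(high)
p_even = prob(even)
p_both = prob(lambda face: high(face) and even(face))

# Independence would require p_both == p_high * p_even.
p_product = p_high * p_even

# Probability of an even number once we know the number was high.
p_even_given_high = p_both / p_high

print(p_high, p_even, p_both, p_product, p_even_given_high)
```

The joint probability comes out larger than the product, and knowing that the number was high raises the probability of even from $1/2$ to $2/3$.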

Now back to the records. Consider the following card game: Shuffle an ordinary deck and put three cards in a row face down on the table. Then turn them up one by one, from left to right. Let's say that a record is a card that's highest so far. The first card is always a record, the second is a record if it beats the first, and the third is a record if it's highest of the three. Suppose also that we decide on some ordering of the suits to break ties, so that of two cards, one is always higher than the other.

What is the correlation between the events that the second card is a record, and that the third card is a record? Do those events tend to occur together or not?

To figure that out, let's pretend we know what the first card is. We can imagine it's a seven, for instance. Then if the second card is not a record, the third card only has to beat that seven in order to be a record. But if the second card is a record, the third card will have to beat that card, which is higher, in order to be a record. So if we get the information that the second card is a record, the probability that the third card is a record will decrease.

And it doesn't matter for this argument whether the first card was a seven or some other card. If the first card was a three, then again there is negative correlation: If the second card is not a record, we're pretty sure the third card will be, whereas if the second card is a record, the third one may or may not be. Same thing if the first card was a queen: If the second card beats the queen, the probability that the third one is a new record will decrease.

So we conclude that the events are negatively correlated: Whenever we learn that one of them happens, the probability of the other one decreases.

Now let's look at the same thing from a different perspective. Suppose we turn up the second card before the first. And let's imagine that it is, well, whatever. A ten say. Now it's the other way around: If the second card is a record, that means that the first card is something smaller than a ten, so that the third card only has to beat the ten in order to be a record. But if the second card is not a record, the first card must be something higher, and the probability that the third is a record decreases.

So now it seems that the events are positively correlated: When we learn that one of them happens, the probability of the other increases. And again it doesn't matter if that second card was a ten or something else.
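The conditional claims used in these two arguments can be checked by brute force. The sketch below rests on a modelling assumption that is not spelled out above: the 52 cards are represented as the distinct ranks 0 to 51, which is what the tie-breaking order on suits gives us. The function `conditional` and the particular values 25 and 35 are hypothetical, picked only for illustration. It checks only what happens once a particular card is known, nothing more.

```python
from fractions import Fraction
from itertools import permutations

DECK = range(52)  # assumption: cards are distinct ranks 0..51 after tie-breaking


def conditional(position, value):
    """Given that the card at `position` (0 = first, 1 = second) equals `value`,
    return the pair
        (P(third is a record | second is a record),
         P(third is a record | second is not a record)).
    `value` must leave both cases possible, so avoid the extreme ranks.
    """
    # second_is_record -> [number of deals, deals where the third is a record]
    counts = {True: [0, 0], False: [0, 0]}
    for deal in permutations(DECK, 3):
        if deal[position] != value:
            continue
        second_is_record = deal[1] > deal[0]
        third_is_record = deal[2] > max(deal[0], deal[1])
        counts[second_is_record][0] += 1
        counts[second_is_record][1] += third_is_record
    return tuple(
        Fraction(hits, deals) for deals, hits in (counts[True], counts[False])
    )


# First argument: the first card is known (here rank 25).
given_first = conditional(0, 25)

# Second argument: the second card is known (here rank 35).
given_second = conditional(1, 35)

print(given_first, given_second)
```

With the first card fixed, a record in second place makes a record in third place less likely; with the second card fixed, it makes it more likely. So each argument is right about what happens once its chosen card is known.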

It's not that I don't know which of these two arguments is correct. Obviously neither of them is. What bothers me a bit is that I may have thought, in some different context, that a similar argument was correct just because it led to what I knew was the right conclusion.

Finally let's reveal the answer about that correlation between records. The truth is that the second and third cards are records independently of each other! To see why, imagine we know what the third card is. Whether or not the second card is a record now only depends on the order of the first two cards, and that's clearly independent of whether or not they are both smaller than that third card!

Wait...
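For anyone who no longer trusts any argument of this kind, the question can also be settled without one, by counting. The sketch below again assumes the cards can be modelled as distinct ranks 0 to 51 (the tie-breaking order on suits), and simply enumerates all $52 \cdot 51 \cdot 50$ ordered ways of dealing three cards. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

DECK = range(52)  # assumption: cards are distinct ranks 0..51 after tie-breaking

deals = second_records = third_records = both_records = 0
for first, second, third in permutations(DECK, 3):
    second_is_record = second > first
    third_is_record = third > max(first, second)
    deals += 1
    second_records += second_is_record
    third_records += third_is_record
    both_records += second_is_record and third_is_record

p_second = Fraction(second_records, deals)
p_third = Fraction(third_records, deals)
p_both = Fraction(both_records, deals)

# Independence means the joint probability equals the product.
print(p_second, p_third, p_both, p_second * p_third)
```

The joint probability equals the product exactly, so the two record events are independent, whatever one thinks of the arguments that got us there.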