Fractions or the laws of nature?

I am tapping the sign: conditional probabilities are not interventions.

Nov 01, 2023

In class yesterday, we discussed Simpson’s and Berkson’s paradoxes. I love both of these because though they are sold as paradoxes of probability, they are only paradoxes about fractions. One never need invoke measure theory, frequentist speculation, or Bayesian religion to see how counting proportions in two different ways lets you tell two different stories.

At the heart of the confusion is the notion of conditional probability. Conditional probability feels like an action. We even talk about it this way. Statisticians casually discuss how they “condition on variable x” as if this imperatively fixes x in stone so that we can analyze other variables. But in the most naive version of probability, where probability is just fractions, conditional probability is just a measure of relative fractions.

Let’s say I’m looking through my son’s giant collection of Pokemon cards. I can ask, “What is the probability of a Charmander in this box if I know it only contains fire-type Pokemon?” What I typically mean by this probability question is, “What fraction of the fire-type Pokemon cards are Charmanders?” I can relate these two notions by the definition of conditional probability:

\(\mathrm{Pr}[\text{charmander}~|~\text{fire}] = \frac{\text{number of charmanders}}{\text{number of fire-type cards}}\)

It is the relative proportion of Charmanders in the subgroup of fire-type cards. There’s nothing active here. I didn’t force the cards to all be fire-type. I just partitioned out the subset of fire-type cards and counted the number of Charmanders in that pile. We casually use “probability” when we mean “fraction,” and this gets us confused. Conditional probability is not an intervention.
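To make the "partition and count" reading concrete, here is a minimal Python sketch. The box contents, the card counts, and the helper `conditional_fraction` are all made up for illustration; the only point is that the "conditioning" step is a filter followed by a count.

```python
from fractions import Fraction

# Hypothetical box of cards as (name, type) pairs; the counts are invented.
cards = (
    [("Charmander", "fire")] * 3
    + [("Vulpix", "fire")] * 2
    + [("Growlithe", "fire")] * 1
    + [("Squirtle", "water")] * 4
    + [("Bulbasaur", "grass")] * 2
)


def is_fire(card):
    return card[1] == "fire"


def is_charmander(card):
    return card[0] == "Charmander"


def any_card(card):
    return True


def conditional_fraction(cards, event, given):
    """Fraction of the cards satisfying `given` that also satisfy `event`."""
    subset = [card for card in cards if given(card)]  # partition out the pile
    hits = [card for card in subset if event(card)]   # count inside that pile
    return Fraction(len(hits), len(subset))


pr_charmander_given_fire = conditional_fraction(cards, is_charmander, is_fire)
pr_charmander = conditional_fraction(cards, is_charmander, any_card)

print(pr_charmander_given_fire)  # 3 of the 6 fire-type cards -> 1/2
print(pr_charmander)             # 3 of all 12 cards -> 1/4
```

Nothing in the code changes any card. The list `cards` is the same before and after; only the pile we count over is different.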

But the laws of relative fractions are confusing. Most of the issues we run into are because this is not how we add fractions:

\(\frac{a}{b} + \frac{c}{d} \neq \frac{a+c}{b+d}\)

Attesting to its confusing nature, once we decide that fractions are probabilities, we call elementary school facts “paradoxes.”

Simpson’s Paradox is precisely the fact that fractions don’t add the way they should. We can consider the following table where we compare two math students who have selectively solved certain types of problems on their homework.

By fraction alone, Cameron is more often correct on algebra problems and on geometry problems taken separately. However, when we pool the problems together, Alex has a higher correct answer rate. Is this really such a paradox? I think that language might be a bit overblown here. But it highlights how rates can be deceptive and misleading when we don’t know denominators.
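Here is a minimal Python sketch of this pattern. The counts are hypothetical, chosen only so that the reversal appears; they are not the numbers from the homework table. The trick is that each student attempts many problems of one type and few of the other, so pooling adds very different denominators.

```python
from fractions import Fraction

# Hypothetical (correct, attempted) counts, NOT the figures from the post.
results = {
    "Alex":    {"algebra": (80, 100), "geometry": (2, 10)},
    "Cameron": {"algebra": (9, 10),   "geometry": (30, 100)},
}


def rate(correct, attempted):
    return Fraction(correct, attempted)


def pooled_rate(by_topic):
    """Add numerators and denominators separately, then divide."""
    correct = sum(c for c, _ in by_topic.values())
    attempted = sum(n for _, n in by_topic.values())
    return Fraction(correct, attempted)


for topic in ("algebra", "geometry"):
    alex = rate(*results["Alex"][topic])
    cameron = rate(*results["Cameron"][topic])
    print(topic, "Alex:", alex, "Cameron:", cameron,
          "Cameron ahead:", cameron > alex)

alex_pooled = pooled_rate(results["Alex"])
cameron_pooled = pooled_rate(results["Cameron"])
print("pooled", "Alex:", alex_pooled, "Cameron:", cameron_pooled,
      "Alex ahead:", alex_pooled > cameron_pooled)
```

With these made-up counts Cameron leads 9/10 to 8/10 on algebra and 3/10 to 2/10 on geometry, yet Alex leads 82/110 to 39/110 once the problems are pooled.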

Berkson’s Paradox (also known as collider bias) is the one that confuses far more people. Myself included! Every time I look at this story, I get confused. I wonder how many false inferences we’ve reinforced because we didn’t account for collider bias. Berkson’s paradox is in many ways more universal than Simpson’s Paradox. Simpson’s Paradox requires me to think carefully about how to get a fractional addition to be imbalanced, but Berkson’s paradox happens every time we select a subset. Let’s just look at this partitioned square.

Here, we have two strips A (red) and B (blue). Their intersection is colored in purple. Berkson’s paradox arises because the relative proportions of A and B inside the multicolored region are larger than the proportions of A and B inside the entire square. Let L denote the colored L-shaped region that is the union of the rectangles A and B. I’ve marked the lengths of these regions so that the square has sides of length 1, A is p by 1, and B is q by 1. On its own, A has area p. But “conditioned on L,” the area of A is larger:

\(\mathrm{Pr}[A | L] = \frac{\text{area}(A)}{{\text{area}(L)}} = \frac{p}{p+q-pq}\)
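The denominator is just inclusion-exclusion on areas: adding the two strips counts the purple p by q overlap twice, so we subtract it once. As a quick sanity check, here is the same formula with p = q = 1/2, values picked only for illustration:

\(\text{area}(L) = \text{area}(A) + \text{area}(B) - \text{area}(A~\text{intersect}~B) = p + q - pq\)

\(\mathrm{Pr}[A | L] = \frac{1/2}{1/2 + 1/2 - 1/4} = \frac{2}{3} > \frac{1}{2}\)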

That’s sort of weird. Hold on, we can get even weirder stuff to happen. Let’s compute more conditional probabilities. First, let’s not “condition on L”:

\(\mathrm{Pr}[A | B] = \frac{\text{area}(A~\text{intersect}~B)}{{\text{area}(B)}} = p\)

\(\mathrm{Pr}[A | \text{not}~B] = \frac{\text{area}(A~\text{but not}~B)}{{\text{area}(\text{not}~B)}} = p\)

By this calculation, the conditional probability of A does not change as we vary the condition B. But once we condition on L, we see a different picture:

\(\mathrm{Pr}[A | B, L] = \frac{\text{area}(A~\text{intersect}~B)}{{\text{area}(B)}} = p\)

\(\mathrm{Pr}[A | \text{not}~B,L] = \frac{\text{area}(A~\text{but not}~B)}{{\text{area}(L~\text{but not}~B)}} = 1\)

What happened in that last expression? When we restrict to L, the areas of A and B do not change, as they are subsets of L. But the only points that are in L and not in B are those in A that are not in B. That is, “conditioned on” not B and L, the probability of being in A is 1. This means that if we did a naive association study, we’d find that without selecting L,

\(\mathrm{Pr}[A|B]-\mathrm{Pr}[A|\text{not}~B] = 0\)

but when we select a study group based on L,

\(\mathrm{Pr}[A|B,L]-\mathrm{Pr}[A|\text{not}~B,L] = p-1\)
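These two differences can be checked by nothing more than counting points. The sketch below is a rough simulation under my own assumptions: A is the strip x < p, B is the strip y < q, points are drawn uniformly from the unit square, and the values p = 0.3 and q = 0.4 are arbitrary. The function name `simulate` is likewise just a label for this example.

```python
import random


def simulate(p, q, n=200_000, seed=0):
    """Drop n uniform points in the unit square and count fractions.

    A is the strip x < p, B is the strip y < q, and L is their union.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]

    def in_a(pt):
        return pt[0] < p

    def in_b(pt):
        return pt[1] < q

    def in_l(pt):
        return in_a(pt) or in_b(pt)

    def frac_a(condition):
        # "Conditioning" is filtering: keep matching points, count A inside.
        subset = [pt for pt in points if condition(pt)]
        return sum(in_a(pt) for pt in subset) / len(subset)

    return {
        "A|B": frac_a(in_b),
        "A|not B": frac_a(lambda pt: not in_b(pt)),
        "A|B,L": frac_a(lambda pt: in_b(pt) and in_l(pt)),
        "A|not B,L": frac_a(lambda pt: not in_b(pt) and in_l(pt)),
    }


p, q = 0.3, 0.4  # arbitrary illustrative strip widths
est = simulate(p, q)
gap_all = est["A|B"] - est["A|not B"]
gap_selected = est["A|B,L"] - est["A|not B,L"]

print(f"whole square:  {gap_all:+.3f} (formula: 0)")
print(f"selected on L: {gap_selected:+.3f} (formula: {p - 1:+.3f})")
```

The points never move between the two lines of output. The only thing that changes is which points we agree to look at, and that alone turns no association into a strongly negative one.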

If we jump from thinking of these as fractions to thinking of these as some natural laws of chance dealt by nature, we can come to the bizarre conclusion that B causes the rate of A to be lower. As I mentioned yesterday, this led some doctors to believe that smoking prevents severe COVID because they only looked at sick patients in the hospital. But Berkson’s Paradox shows that it is hard to distinguish conventional wisdom from statistical illusion. Is it true that athletic ability is negatively correlated with mathematical ability? Or is this a downstream effect of reinforced collider bias? Conflating relative proportion with natural law is not just confusing. It can lead us to make terrible decisions.