Before we can make a prediction interval, we need to convince ourselves that prediction is possible. What data can we collect about the past to predict the future? We clearly need a model before we can do anything.

Conventional wisdom suggests that counts of the past can be converted into probabilities of the future.

16 out of 100 die rolls landed on a 6. Therefore, my chance of rolling a 6 with this die is 16%.

Steph Curry has made 257 of his 281 free throws this season. Therefore, the probability he’ll make a free throw in tomorrow’s game is 92%.

70 out of the 127 2-point attempts in football last year were successful. Therefore, the probability of converting the next 2-point conversion is 55%.

It rained in the morning three out of the last five days. Therefore, the probability it will rain this morning is 60%.

In clinical trials, 900 of the 1300 people who took Ozempic lost over 10% of their body weight in 5 years. Therefore, I have a 70% chance of losing over 10% of my body weight by taking Ozempic.

I don’t know if any of those examples make sense. But we do this sort of count-to-chance conversion all the time. We collect a bunch of events, count the number of times the events occur in the past, and then turn this frequency into a probability in the future. What do we need to be true for this conversion to be correct?

Intuitively, we’d like to assume that, all else being equal, each of these isolated events is identical in some capacity. We conceive each event to be a realization of the same process with some additional variability that we hope will average out if we collect enough data. Moreover, we think of these events as isolated from each other so that the order in which the events occur doesn’t matter at all. All that matters is frequencies. Finally, we have to assume that each event is “random” and hence mentally equivalent to some game in a casino.
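This averaging-out intuition is easy to see in a quick simulation. A minimal sketch, assuming a fair six-sided die simulated in Python (the helper name and roll counts are mine, chosen for illustration):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def frequency_of_six(n_rolls: int) -> float:
    """Roll a fair die n_rolls times and return the fraction of sixes."""
    rolls = [random.randint(1, 6) for _ in range(n_rolls)]
    return rolls.count(6) / n_rolls

# With a handful of rolls the frequency is noisy; with many rolls it
# settles near the "true" probability of 1/6.
for n in (10, 1_000, 100_000):
    print(n, frequency_of_six(n))
```

The point of the sketch is only that the count-to-chance conversion hinges on this settling: if the individual rolls weren't realizations of the same process, there would be no single number for the frequency to settle toward.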

We use a few standard models in statistics to capture these intuitions. The first is that the events are *independent and identically distributed*. Statisticians use this model so much that they abbreviate it without periods as iid (though it’s still pronounced “eye-eye-dee” not “eed” or “eyed”). Identically distributed means that the chances of different outcomes are the same for all of the individual observations. If we’re just going to naively turn frequencies into probabilities, there’s no way around this assumption.

Independence, on the other hand, might be too strong. Independent means that no two events have any probabilistic influence on each other. If we observe one event, this provides us with no information about what happens in another. The die rolls I describe above are arguably independent events. The weather, on the other hand, is likely not.

We can get around independence with a weaker assumption. Being free from dependence on order was one of the desiderata I listed. Why not just assume that one directly and see where it gets us? We say the sequence of events is *exchangeable* if the probability of the entire sequence is the same even if we shuffle the order of the events.

Exchangeability is not only saying that you can shuffle the past but that you can also shuffle *the future* with the past and not be surprised. If I have seven dice colored like the rainbow, the odds of rolling 1, 2, 3, 4, 3, 2, 1 are the same whether I roll in the order red, orange, yellow, green, blue, indigo, violet, or in the order indigo, orange, violet, red, yellow, green, blue. And if I told you I was going to roll four today and three tomorrow, the odds of the sequence would be the same as if I rolled all seven today.

Exchangeability feels like a much less restrictive assumption than iid. Any iid sequence is exchangeable. And I can certainly give you examples of distributions that are exchangeable but not iid. Here’s a simple example: take those seven colored dice and put them in one of our magic probabilistic urns. Now pull out a random die and put it on the table, but don’t put it back in the urn. Keep drawing until you have all of the dice on the table. The sequence of colors you see is exchangeable: Any sequence is equally likely. But it is not iid. If the first die you draw is red, you know the probability of ever seeing red again is zero.
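A short simulation makes the urn example concrete. This is a sketch under my own assumptions (the color list, trial count, and function name are mine): drawing without replacement gives every color the same chance in every position, yet the draws plainly influence each other.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed for reproducibility

COLORS = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

def draw_all_without_replacement() -> list:
    """Shuffle the urn and draw every die out, never returning one."""
    urn = COLORS.copy()
    random.shuffle(urn)
    return urn

# Identically distributed: each color lands in the first (or fourth, or
# any) position about 1/7 of the time.
n_trials = 70_000
first_position = Counter()
for _ in range(n_trials):
    first_position[draw_all_without_replacement()[0]] += 1
print(first_position)  # every count hovers near n_trials / 7 = 10,000

# Not independent: once red is drawn first, red can never appear again.
seq = draw_all_without_replacement()
if seq[0] == "red":
    assert "red" not in seq[1:]
```

The marginal uniformity in each position is exactly the "identically distributed" half of iid; it's the independence half that the urn breaks.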

If a sequence is exchangeable, then each event in the sequence is identically distributed. Since order doesn’t matter, whatever you reason about the chances of outcomes for the second event in the sequence has to be the same as your reasoning about the fifth. Exchangeability seems to give us everything I asked for. Order doesn’t matter. All events in isolation are the same. All events are random. For all intents and purposes, we can swap the future with the past. Exchangeability seems like a minimal assumption for our count-to-chance conversion to work.

Such technicalities over minimal statistical assumptions are almost always academic. Are patients in clinical trials independent or exchangeable? Are three-point shots in basketball independent or exchangeable? What about three-point attempts by Caitlin Clark in a single game? Are these independent? Are they exchangeable? Are they *random?*

Meh, who knows? Tomorrow, I will put aside my disdain for natural randomness and grant that some processes might be iid or exchangeable. Holding my nose, I’m going to try to understand how to build prediction intervals assuming such models are true.
