After being distracted by football, I’m going back to my original plan to engage with some modestly technical questions about confidence intervals. This might be too in the weeds, but let me know in the comments what parts don’t make sense.
In the next three or four blogs, I want to take a technical turn to once again engage with one of my least favorite concepts: statistical intervals. Confidence intervals, credible intervals, prediction intervals. Everyone who engages with statistical data sort of knows what these things mean, but more often than not, they are presented and discussed in a state of confusion. For the past few years, I’ve been thinking about statistics with intentional probability as algorithms (e.g., here and here). And over the past week or so, I’ve been wondering if thinking about intervals as randomized algorithms would make them more or less confusing. Let’s find out!
Let’s start with the notion of a Monte Carlo Algorithm from computer science. A Monte Carlo Algorithm is given some means of collecting a sample from a probability distribution D. Its goal is to output some attribute of the distribution. Call the attribute A. Given the sample S from D, the Monte Carlo Algorithm does deterministic computation. It just runs some Excel macro or R-script without any randomness. Call the output of this computation C. We can summarize a Monte Carlo Algorithm with the following meta-procedure:
1. Sample S from D
2. Do some deterministic processing on S
3. Return C(S)
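To make this concrete, here is a minimal Python sketch of the meta-procedure. The particular distribution (a standard normal) and attribute (its mean) are just my illustrative choices:

```python
import numpy as np

def monte_carlo(sample_from_D, compute_C, n):
    """The Monte Carlo meta-procedure: sample, process deterministically, return."""
    S = sample_from_D(n)     # 1. Sample S from D (the only random step)
    return compute_C(S)      # 2-3. Deterministic processing, then return C(S)

# Toy instance: D is a standard normal, the attribute A is its mean (0),
# and the deterministic computation C is the sample mean.
rng = np.random.default_rng(0)
estimate = monte_carlo(lambda n: rng.normal(size=n), np.mean, n=1_000)
```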
Every Monte Carlo Algorithm comes with the guarantee that if you run the procedure, C(S)=A with probability p. That is, the algorithm is correct with probability p. The probability here is with respect to the sampling distribution. This is a “frequentist” notion: If p=95%, and we ran the algorithm 1000 times, the answer would be correct approximately 950 times.
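We can watch this guarantee play out in the toy mean-estimation example above. Call a run "correct" if the sample mean lands within a tolerance of the true mean; the tolerance of 0.062 below is an arbitrary choice, tuned so that p is roughly 95% when n = 1000:

```python
import numpy as np

rng = np.random.default_rng(1)
A, n, tol, runs = 0.0, 1_000, 0.062, 1_000   # true mean, sample size, tolerance, repetitions

correct = 0
for _ in range(runs):
    S = rng.normal(size=n)            # sample S from D
    C = S.mean()                      # deterministic processing
    correct += abs(C - A) <= tol      # "correct" means within the tolerance

print(f"{correct} of {runs} runs were correct")   # roughly 950 when p is about 95%
```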
My favorite example of a Monte Carlo Algorithm is Stochastic Gradient Descent (SGD), a popular algorithm for training machine learning models. Let me work through a very simple example of SGD to highlight why it’s a Monte Carlo Algorithm. Say we want to find the point that minimizes the sum of three functions, f(x) = f1(x) + f2(x) + f3(x).
SGD would sample a sequence of some length N, with each element drawn from the numbers (1, 2, 3). Say
S = {2, 3, 1, 2, 3, 3, 2, 2, 3, …}
That’s step 1. For step 2, we need some deterministic processing. Starting with x_0 = 0, for each i from 1 to N, we would get the ith element of S, call it j, and set x_i = x_{i-1} − α ∇f_j(x_{i-1}) for some small step size α.
This process is entirely deterministic. Now, we would return the attribute A: the claim that x_N approximately minimizes f = f1 + f2 + f3, i.e., that the gradient of f at x_N is close to zero.
Under some assumptions about the functions f1,f2,f3, we can mathematically ensure A is true with probability at least 95%.
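Here's a toy implementation of that recipe. The three quadratics, the shrinking step size, and the iteration count are all my own illustrative choices, not anything canonical:

```python
import numpy as np

# Three toy component functions: f1(x) = (x - 1)^2, f2(x) = (x + 2)^2, f3(x) = x^2,
# so f = f1 + f2 + f3 is minimized at x = -1/3.
grads = [lambda x: 2.0 * (x - 1.0),
         lambda x: 2.0 * (x + 2.0),
         lambda x: 2.0 * x]

def sgd(rng, N=10_000):
    S = rng.integers(1, 4, size=N)        # step 1: sample a sequence of indices from {1, 2, 3}
    x = 0.0                               # x_0 = 0
    for i, j in enumerate(S, start=1):    # step 2: deterministic processing of S
        x = x - (1.0 / (2.0 * i)) * grads[j - 1](x)   # gradient step on f_j with a shrinking step size
    return x                              # the candidate minimizer x_N

xN = sgd(np.random.default_rng(0))        # should land near -1/3
```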
I won’t grind through the details of why SGD is correct today. I just want to use this example to belabor a point about probabilistic guarantees. How do we know we found a correct answer? That is, how do we know A is true? We need a validation step to confirm that we didn’t get unlucky and end up in the 5% of cases where the algorithm failed.
For SGD, if we care, we have a certificate: we can just compute the gradient of f at x_N and verify that it is small. That is, we check that ‖∇f(x_N)‖ ≤ ε for some small tolerance ε.
If it’s not small, we can run the algorithm again. Since each run independently fails with probability at most 5%, the probability of not finding a good answer after 20 runs is 0.05^20, far less than one in a million.
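Continuing the toy sketch above (and reusing its grads and sgd), the certify-and-rerun logic might look like this; the tolerance of 0.25 and the cap of 20 attempts are arbitrary choices:

```python
def full_gradient(x):
    """Gradient of f = f1 + f2 + f3, cheap to compute exactly in this toy problem."""
    return sum(g(x) for g in grads)

rng = np.random.default_rng(1)
xN = None
for attempt in range(20):                      # each run independently fails with small probability
    candidate = sgd(rng)
    if abs(full_gradient(candidate)) <= 0.25:  # the certificate: is the full gradient small?
        xN = candidate                         # accept this run's answer
        break                                  # otherwise, rerun the algorithm
```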
There are many other examples of Monte Carlo Algorithms. There are algorithms for testing whether or not a number is prime. Using random prime numbers, two parties can test whether two large numbers are equal without sending the numbers themselves. Many compression and sketching algorithms are Monte Carlo Algorithms. There are Monte Carlo Algorithms for graph cuts. There are Monte Carlo Algorithms for linear algebra.
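To give a flavor of the equality-testing example, here's a sketch of random-prime fingerprinting. The prime range and number of rounds are my own choices; real protocols pick these more carefully:

```python
import random

def is_prime(n):
    """Naive trial-division check; fine for the modest primes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def random_prime(low=10**6, high=2 * 10**6):
    """Rejection-sample a random prime from [low, high)."""
    while True:
        p = random.randrange(low, high)
        if is_prime(p):
            return p

def probably_equal(x, y, rounds=20):
    """Compare x mod p and y mod p for several random primes p.
    If x == y this always returns True; if x != y, a random prime divides
    x - y only rarely, so a false "equal" is very unlikely."""
    for _ in range(rounds):
        p = random_prime()
        if x % p != y % p:
            return False        # a mismatch mod p certifies x != y
    return True

# Two parties can compare huge numbers by exchanging only the small fingerprints x mod p.
a = 3**5000
b = 3**5000 + 2**123
print(probably_equal(a, a), probably_equal(a, b))   # True, then (with high probability) False
```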
The promise of all Monte Carlo Algorithms is that randomness saves you time (on average). By looking at samples, we can infer properties of the population we care about. And though our algorithm has a chance of failing, most Monte Carlo Algorithms come with some sort of routine to verify their correctness.
What does this have to do with statistical intervals? Well, all statistical intervals are the output of Monte Carlo algorithms. They all have the same guarantees as the algorithms I describe above. But they almost never come with a useful way to verify their correctness. I’m wondering if we can do better. With that in mind, tomorrow I’ll dive into confidence intervals, why they are Monte Carlo algorithms, and why they are ultimately a disappointing way to summarize information.
Great topic, Ben, looking forward to reading more!
But math typesetting in Substack makes me sad, as in MS Word sad.
Perhaps this just illustrates my inexperience, but how do you know the value of p? 95% seems like the default rather than something verified. Maybe to put it a different way, what happens when the distribution is nonnormal? Do confidence intervals for, say, a skewed distribution differ from those for a long-tailed distribution or a normal one? Asking as a radiologist.