Staging Interventions
Actions are fundamentally different from predictions, but it's hard to capture this distinction in math.
This is a live blog of Lecture 18 of the 2025 edition of my graduate machine learning class “Patterns, Predictions, and Actions.” A Table of Contents is here.
I’m surprised I didn’t trigger any of my statistical interlocutors on Tuesday with my post about randomized trials. Was I trolling? Maybe a little. I was hoping I’d get scolded about confounding. Because while it feels like you can predict the outcome of actions just like any other pattern, there really is a fundamental wrinkle introduced when we bring actions into the machine learning story.
Our base assumption in machine learning prediction is that future data looks like past data. That is, machine learning models prediction as a fundamentally passive act. Predictions don’t change the process that generates the data.
But if you implement a new policy based on past data, your future data will be decidedly different. I’m not a big fan of the term, but we can say that actions induce a “distribution shift,” if you will. Once you accept that acting fundamentally changes the probabilities, the floor sort of falls out from under all of our beautiful mathematical decision theory. I’m sure someone figured this out before Robert Lucas did, but this observation is commonly called the Lucas critique of statistical modeling.
So this lecture is a pause before we get into reinforcement learning, where these problems are even worse. Policies change the future, and you have to account for these changes when using predictive modeling. Using retrospective data where you didn’t explicitly intervene is always going to be problematic, not just because of confounding, but because the data-generating process changes once you act with intention. As we move from predictions to actions, we move from passive observation to active meddling. This means we need to figure out the impact of feedback.
Though not typically presented this way, feedback is the lens through which I view biases in retrospective analyses. If we have a complex system in which we observe two variables and hope that changing one changes the other, we need stronger evidence than simple covariation. If you have a retrospective list of actions and outcomes, you can build a prediction of an outcome under treatment and an outcome under no treatment. But we’re usually not happy with this. We say, “That’s just association.”
To use a famous example, you can look at past data and conclude that red wine lowers the incidence of heart disease. But if everyone started drinking red wine, I personally doubt the incidence would drop further. This is an example of confounding, in which unmodeled factors influence both the treatment (red wine consumption) and the outcome (heart disease). For example, socioeconomic status might raise the likelihood of drinking fine wine and lower the incidence of heart disease because of access to better preventative interventions.
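To make the confounding story concrete, here’s a minimal simulation sketch (the variable names and effect sizes are invented for illustration). Socioeconomic status drives both wine drinking and heart disease, wine has no effect on the outcome by construction, and yet the naive retrospective comparison of outcomes under treatment and no treatment from the previous paragraph makes wine look protective until you stratify on the confounder.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Latent confounder: socioeconomic status (binary, for simplicity).
ses = rng.binomial(1, 0.5, size=n)

# "Treatment": red wine consumption, more likely at high SES.
wine = rng.binomial(1, np.where(ses == 1, 0.7, 0.2))

# Outcome: heart disease, less likely at high SES (better preventative care)
# and, by construction, completely unaffected by wine.
heart = rng.binomial(1, np.where(ses == 1, 0.05, 0.15))

# Naive retrospective comparison: outcome under treatment vs. no treatment.
naive = heart[wine == 1].mean() - heart[wine == 0].mean()
print(f"naive wine effect on heart disease: {naive:+.3f}")  # looks protective

# The same comparison within SES strata: the apparent effect disappears.
for s in (0, 1):
    m = ses == s
    diff = heart[m & (wine == 1)].mean() - heart[m & (wine == 0)].mean()
    print(f"SES={s} within-stratum effect: {diff:+.3f}")  # roughly zero
```

Of course, stratifying only rescues us here because we wrote down the confounder ourselves. In a real retrospective analysis, the whole problem is that you don’t know what to stratify on.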
You could teach an entire course on the biases that can creep into the statistical analyses of interventions—Simpson’s Paradox, Berkson’s Paradox, Milton Friedman’s Thermostat, and so on. But I hope to only spend part of the lecture on this. Because retrospective analyses are not the only sorts of studies subject to bias. Randomized controlled trials have their own set of issues, especially when they are not fully blinded.
Because of these biases, I want to spend a little time at least banging my drum about bureaucratic statistics. Though we won’t be doing much on this for the rest of the class, it’s worth noting that most applications of randomized trials occur in regulatory settings. RCTs serve as approval mechanisms for policy changes. RCTs also just measure associations, but these associations are still useful for policymaking.
Mathematically modeling decisions that change the future is the big missing piece in the Meehlian problem of statistical vs clinical judgment. Sure, statistical tabulation is better at prediction than people. But when you broadly implement actuarial policies, you change the distribution and move from snapshot to process.
Modern statistics is ill-equipped to deal with process, but we don’t have a clear alternative class to offer. Once you become obsessed with the problem of process, you either become a complex systems nut or a cybernetics nut, and no one listens to either.1 Kevin Munger calls process theory an “antimeme,” an idea so incongruous with our common discourse that it can’t spread. Whereas people love to argue about statistics, as soon as we get stuck reasoning about process, everyone gets confused and has a hard time talking to each other.
Though we’re not going to get too deep into it this semester, today’s discussion foreshadows a course I’m going to teach in the spring about machine learning, dynamics, and control. I wrote a blog series about this a year and a half ago, and I want to spend a semester fleshing out these ideas, connecting them to concepts in stochastic optimization, dynamic programming, and feedback theory. My goal is to decrypt some of this language about process, causation, and pragmatism. You’ll get to watch the process of attempting (and probably failing) to turn an antimeme into a meme.
1. I’m lumping the control theorists mostly in bucket 2 (the cybernetics nuts), even though some definitely hang out in bucket 1 (the complex systems nuts).

