Statistical Fatalism
The tools of statistical optimization can't imagine a different future.
When Moritz and I conceived the structure of our machine learning class, we wanted to really hammer home the problems of time, feedback, and nonstationarity. But when assembling course content around our skeletal outline, we kept finding that existing toolkits for understanding the impact of actions in statistical systems skirted the problem entirely. To explain, let me start where we start in our book: causal inference, as narrowly defined in the statistics of the social sciences. Even in a fully design-based view, there is a fundamentally flawed axiom that is so subtly stated that you don’t see it until you’ve read 100 econometrics papers.
Recall the setup of a randomized experiment for the twentieth time this semester. We imagine a population in which each unit has a predetermined outcome under each treatment. We experiment on a sample of the units and then apply the better treatment to the remainder of the population. We estimate the efficacy of this procedure by comparing it to a scheme that knew the best single treatment in advance. This comparison explicitly assumes that the meaning of the treatment in the experiment is the same as its meaning after the experiment. It means that no action has any effect on future outcomes. This is unfortunately seldom true.
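To make the assumption concrete, here’s a minimal simulation sketch of that setup (invented numbers, not code from our course or book): potential outcomes fixed up front, an experiment on a sample, deployment of the better-looking arm, and a comparison against an oracle that knew the better arm all along.

```python
# A minimal sketch of the fatalistic potential-outcomes setup (all numbers
# invented for illustration).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                       # population size

# Fatalism: each unit's outcome under each treatment is fixed in advance.
y0 = rng.binomial(1, 0.30, n)    # outcome if untreated
y1 = rng.binomial(1, 0.45, n)    # outcome if treated

# Randomized experiment on a small sample of the population.
sample = rng.choice(n, size=500, replace=False)
assign = rng.binomial(1, 0.5, size=500).astype(bool)
est_treated = y1[sample[assign]].mean()
est_control = y0[sample[~assign]].mean()

# Apply the better-looking treatment to the rest of the population.
rest = np.setdiff1d(np.arange(n), sample)
best_arm = 1 if est_treated > est_control else 0
realized = (y1 if best_arm else y0)[rest].mean()

# Compare to a scheme that knew the best single treatment in advance.
oracle = max(y0[rest].mean(), y1[rest].mean())
print(f"deployed policy: {realized:.3f}, oracle: {oracle:.3f}")
# The comparison only makes sense because y0 and y1 are assumed identical
# before, during, and after the experiment: no action changes future outcomes.
```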
A prime example is in cancer screening trials. Since you can’t force someone to have a mammogram or colonoscopy, the treatment in a cancer trial is the offer of a cancer screen. This seems fine because we think that a patient-doctor discussion of a cancer screen has no impact on cancer outcomes. The treatment is the intention to treat, not the treatment itself. Nonetheless, if the trial manages to find a p-value less than 0.05 somewhere, editorials will conclude that screening prevents cancer, even though the treatment was an offer, not a screen. Influenced by key opinion leaders, the public, including doctors and potential patients, concludes that screens prevent cancer. The pressure to screen (from both doctors and patients) increases. After this cascade of post-trial information, the offering of a cancer screen, the original treatment, means something different. People who were previously skeptical of screening may be less skeptical after the media blitz. The trial didn’t measure the impact of screening on such medically hesitant individuals and therefore doesn’t provide an estimate of the treatment’s impact after the trial.
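Here’s a toy calculation, with made-up rates, of how the effect of an offer shifts when acceptance behavior shifts. It assumes, purely for illustration, that screening reduces risk by the same amount for everyone who accepts; as noted above, the trial never measured the effect on previously hesitant people, so even that assumption is something the original estimate can’t support.

```python
# Toy numbers only: the effect of *offering* a screen depends on who accepts.
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
base_risk, screened_risk = 0.010, 0.007   # hypothetical event rates

def effect_of_offer(acceptance_rate):
    accepts = rng.binomial(1, acceptance_rate, n).astype(bool)
    risk = np.where(accepts, screened_risk, base_risk)
    events = rng.binomial(1, risk)
    return base_risk - events.mean()      # risk reduction from making the offer

print("during the trial (40% accept):     ", effect_of_offer(0.40))
print("after the media blitz (80% accept):", effect_of_offer(0.80))
# Same nominal treatment, different effect, because different people say yes.
# And the per-screen benefit for the newly persuaded group is an assumption
# here; the trial never measured it.
```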
The assumption that actions have no impact is what Phil Dawid calls fatalism. The outcome of any unit under each potential treatment is fixed at the dawn of time, and this relationship is unaffected by any treatment applied to other units. This is an odd place to start from if we think our actions should be consequential.
Dawid defines fatalism to specifically call out the popular schools of causal inference. The example I gave above shows that causal inference’s gold standard, the randomized controlled trial, is typically modeled fatalistically.
Analyzing a single step in a simple RCT reveals a surprising well of complexity and many headaches for the policymaker. It’s much easier to build up a framework for approving interventions than to imagine what will happen if those interventions are applied at a population scale.
Fatalism assumes the absence of temporal dynamics. The meaning of treatments can’t change over time. It means your policy has no effect beyond the treatment of each unit in isolation. People will behave the same before you make a policy and after you make a policy. Most people who work on causal inference know none of this is true, of course. And any seasoned machine learning engineer knows this as well when maintaining systems to continually retrain their stable of prediction models.
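Here’s a small sketch of that retraining feedback loop under an invented strategic-response model: a threshold rule fit on data generated under the currently deployed rule looks great offline and then disappoints once deployed, because deployment changes the population.

```python
# A sketch of the retraining feedback loop (entirely invented setup).
import numpy as np

rng = np.random.default_rng(2)

def population(deployed_cutoff, n=20_000):
    # Hypothetical strategic response: people just below the deployed cutoff
    # inflate their observed score enough to clear it.
    skill = rng.normal(0.0, 1.0, n)
    games = (skill < deployed_cutoff) & (skill > deployed_cutoff - 0.5)
    score = np.where(games, deployed_cutoff + 0.01, skill)
    label = (skill > 0).astype(int)
    return score, label

def retrain(score, label):
    # "Retrain": pick the accuracy-maximizing cutoff on the current data.
    candidates = np.linspace(-1.0, 1.0, 401)
    accs = [((score > c).astype(int) == label).mean() for c in candidates]
    return candidates[int(np.argmax(accs))], float(np.max(accs))

deployed = 0.0
for step in range(4):
    score, label = population(deployed)        # data collected under current rule
    new_cutoff, offline_acc = retrain(score, label)
    score2, label2 = population(new_cutoff)    # the world reacts to the new rule
    deployed_acc = ((score2 > new_cutoff).astype(int) == label2).mean()
    print(f"round {step}: offline {offline_acc:.2f}, deployed {deployed_acc:.2f}")
    deployed = new_cutoff
# The retrained rule keeps looking fine on the data it was fit to and keeps
# disappointing in deployment, because deployment changes the data.
```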
Dawid is of course not the first person to identify this problem. Fifty years ago, economist Robert Lucas pointed out that you can’t use historical data to predict the impact of economic policy because of feedback effects.1 Here’s the direct quote from Lucas, which features prominently on Wikipedia:
“Given that the structure of an econometric model consists of optimal decision rules of economic agents, and that optimal decision rules vary systematically with changes in the structure of series relevant to the decision maker, it follows that any change in policy will systematically alter the structure of econometric models.”
You develop a model based on historical data to inform policy. When you implement the policy, the future data arises from a different world. Unless you can model how people will react to your policy change, you can’t predict its outcome. You’ll need to construct a more complex model of policy impacts. However, there is no way to collect reliable behavioral data to model people’s reactions at the macroeconomic scale. There’s no fix for this critique.
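A toy version of the critique, with made-up coefficients loosely in the spirit of an expectations-augmented Phillips curve: a regression fit to historical data promises a policy lever, and the lever vanishes once the policy is anticipated.

```python
# Toy Lucas-critique arithmetic (invented coefficients, for illustration only).
import numpy as np

rng = np.random.default_rng(3)
T = 2000

# Historical regime: inflation wanders; agents expect last period's value, so
# unemployment responds only to inflation *surprises*.
inflation = rng.normal(0.02, 0.01, T)
expected = np.concatenate(([0.02], inflation[:-1]))
unemployment = 0.06 - 2.0 * (inflation - expected) + rng.normal(0, 0.002, T)

# Fit a policy model on historical data: unemployment as a function of inflation.
slope, intercept = np.polyfit(inflation, unemployment, 1)
target_inflation = 0.05
print("historical model predicts:   ", intercept + slope * target_inflation)

# New regime: the policy is announced and anticipated, so there is no surprise
# and the historical trade-off disappears.
expected_new = np.full(T, target_inflation)
actual_inflation = np.full(T, target_inflation)
unemployment_new = 0.06 - 2.0 * (actual_inflation - expected_new) + rng.normal(0, 0.002, T)
print("unemployment under the policy:", unemployment_new.mean())
```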
Economists love to tell you about the Lucas Critique and how it’s vital to understand if you want to think deeply about economic policy. They then go back to their office to publish papers built upon the fatalistic toolkit of causal inference.
And so we find ourselves in our academic endeavors trapped by our weird optimization frame. We all know our toolkits trap us into a particular kind of invalid inquiry, but we’re unable to effectively communicate what else we should do. We muddle through using techniques we know are flawed, and end up studying only the small set of problems our techniques can engage with. That’s a fascinating feedback loop.
Variants of Lucas’ critique were levied by other economists even earlier. We’ve known this is a problem for a long time.


Aren't you arguing that the "perfect" should undermine the "good"?
I am currently reading Judea Pearl's "The Book of Why". The case of tobacco and lung cancer also addresses the issue of not being able to do an RCT on tobacco use. Yet they do a historical analysis of cancer sufferers and the history of tobacco use that shows the relationship.
Pearl makes a very good case for causality. AFAICS, the difficulty is the generalized building of models for all cases, rather than a few for certain specific cases of study. It is the bespoke expert-system model vs. ML rule-based systems like decision trees, which can be applied to any tabular data with known outcomes for each event.
Shouldn't the mantra be: "Imperfect accuracy [of the model] is better than being precisely wrong"?
TL;DR: I am a huge fan of ML and sequential design under performative settings. We need new instances of it more than ever in drug development and neurotech. It is most certainly not taken into account in evaluating any brain stimulation or neuromodulation trials.
> Analyzing a single step in a simple RCT reveals a surprising well of complexity and many headaches for the policymaker. It’s much easier to build up a framework for approving interventions than to imagine what will happen if those interventions are applied at a population scale.
This might be very much on the implementation side, but well before Pearl developed transportability, people started to think about other kinds of non-ideal experiments that have greater external validity than RCTs. The high-internal-validity RCT is there to establish whether a benefit even exists, while pragmatic trials measure real-world performance.
https://rethinkingclinicaltrials.org/chapters/design/experimental-designs-and-randomization-schemes/experimental-designs-introduction/
We could have a philosophical discussion of why one might ever want to care about internal validity at all. Do ideal, well-controlled experiments matter when what one really cares about is real-world performance? Nancy Cartwright's account of causality emphasizes the modularity of the real world. Some things just go together in nature, and it doesn't make sense to ask what the causal effect of intervening on them separately even means. For therapeutic development, it makes sense to triangulate evidence across different research designs, each of which trades off internal and external validity to cover some threat to the validity of the final scientific question.
> Fatalism assumes the absence of temporal dynamics. The meaning of treatments can’t change over time. It means your policy has no effect beyond the treatment of each unit in isolation. People will behave the same before you make a policy and after you make a policy. Most people who work on causal inference know none of this is true, of course. And any seasoned machine learning engineer knows this as well when maintaining systems to continually retrain their stable of prediction models.
In vaccine research, obesity research, and social network interventions, interference between units is very much a concern. Still a generally underrated topic: https://arxiv.org/pdf/1403.1239