Let’s kick off 2025 at argmin with some long form. I just finished a commentary for the journal Observational Studies on what I call Bureaucratic Statistics, synthesizing several themes I’ve been working through on this blog. Statistics is a vast field with many applications. We use it for visualization, giving us guidelines for summarizing large data sets with small, legible, numerical factoids. We use it for prediction, giving us algorithms to predict the future based on the belief that future events will look like past events. This piece focuses on a less heralded application: we use statistics for rulemaking.
Statistical testing and inference are ubiquitously employed by systems of regulation. Consider the crown jewel of statistics, the randomized controlled trial (RCT). People love to tell you that RCTs are the “gold standard” of causal inference, but causation is almost always a secondary concern when RCTs are used.
“The biggest success of the RCT has been in drug trials, providing evidentiary guidelines for pharmaceutical companies to demonstrate that their drugs are safe and effective. Randomized clinical trials have been more generally applied in medicine to establish guidelines for standard of care. More recently, technology companies have widely adopted RCTs to evaluate whether to deploy new software features. And controversially, statistical tests have transformed the publication of scientific literature, providing folk standards for peer review.”
“The RCT primarily serves as a mechanism of regulation, regulating which drugs come to market, which clinical practices become standard of care, which software goes into production, and which academic papers prestige journals publish.”
RCTs facilitate rulemaking. The field of Statistics sells itself short by fighting about “inference,” “truth,” and epistemology rather than embracing this fundamental role in bureaucratic governance.
“From its inception, Statistics has been the mathematics of bureaucracy. It provides a numerical foundation for governance by clear, transparent, aggregate rules. Statistics helps governments measure what experts on the ground see and create reasonable metrics for consensus to move forward with policy.”
My commentary is less of a “call to arms” and more “calling a spade a spade.” I am not advocating for new methods but rather arguing that statistical testing and causal inference are not only used in policymaking but are far better suited to ex ante policy than to ex post inference.
What do I mean by ex ante policy? All of the theorems we prove about causal inference or statistical tests don’t tell us anything about “inference” or truth. They are not designed for such affairs. The theoretical guarantees only hold before data is collected. Theorems about p-values and confidence intervals are all ex ante. So are the theorems about Bayesian decision making. They say, “We guarantee randomness probably won’t screw up your measurement or plan of action.” They don’t tell you the probability of your theory being “true” as a function of what you measure.
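To make the ex ante guarantee concrete, here’s a minimal simulation sketch. It’s my own toy example, not something from the commentary: a normal model where we build the textbook 95% confidence interval for a mean over and over. The 95% is a promise about the procedure, made before any data arrive.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 2.0, 50, 10_000
z = 1.96  # two-sided 95% normal critical value

covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    half = z * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - half <= mu <= x.mean() + half)

# The ~0.95 below is an ex ante property of the procedure.
# Any single realized interval either contains mu or it doesn't;
# no probability statement attaches to it after the fact.
print(f"coverage over {reps} repetitions: {covered / reps:.3f}")
```

Over repeated experiments, the procedure covers the truth about 95% of the time. But once you’ve computed a particular interval, it either contains the true mean or it doesn’t; the machinery offers no probability that your theory is “true” given what you measured.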
This ex ante/ex post distinction helps clear up much of the confusion surrounding statistical testing and decision making. We agree in advance that there’s a chance our measurement is wrong. If we’re unhappy with that, we can make the sample larger, propose multiple tests, or design elaborate procedural preregistration checklists to assuage doubt. The ex ante guarantees of statistics help us design regulatory procedures.
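Here’s the same kind of toy sketch for the first of those remedies, enlarging the sample. The effect size, noise level, and two-sample z-test below are illustrative choices of mine; the point is only that the ex ante chance of the test missing a real effect shrinks as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(1)
effect, sigma, z, reps = 0.5, 2.0, 1.96, 5_000

# Ex ante, a larger sample shrinks the chance that the test
# misses a real effect of this size.
for n in (25, 100, 400):
    rejections = 0
    for _ in range(reps):
        treated = rng.normal(effect, sigma, size=n)
        control = rng.normal(0.0, sigma, size=n)
        se = np.sqrt(treated.var(ddof=1) / n + control.var(ddof=1) / n)
        rejections += abs(treated.mean() - control.mean()) / se > z
    print(f"n = {n:4d}: chance of detecting the effect ≈ {rejections / reps:.2f}")
```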
Being clear about what we’re doing demystifies the process of evaluation. We use statistical tests for rulemaking! If all this math is doing is generating fancy tools that are part of an elaborate game, let’s say so.
“Statistical tests constrain outcomes in participatory systems. Engineers want to push features to get promoted; data science teams insist on AB tests to ensure these features don’t harm key metrics. Drug companies want to make a ton of money; clinical trials ensure drugs aren’t harmful and have a chance of being beneficial. Academics want to publish as many papers as possible to get their h-index to the moon; journals insist on some NHSTs to placate editors. The purpose of statistical tests is regulation.”
Since these rules are all artificial, there can’t be a single “right” way to evaluate interventions. All algorithmic rulemaking is necessarily value-laden and flawed. But I believe in participatory decision making. A key feature of rules is that they are changeable. Rules can be adapted so that processes better align with what we value. One redeeming aspect of bureaucracy is its potential to get stakeholders to agree to the game by articulating the rules cleanly. Calling statistical methods “regulations” rather than “rituals” (cf. Gigerenzer) centers our agency in using and changing them.
And people do propose changes with the understanding that statistical methods impact people. This paper by Abhijit Banerjee and collaborators looks at the various tradeoffs policymakers must grapple with when designing statistical evaluations. Another recent paper by Flora Shi and collaborators focuses on the tensions that statistical outcomes create for different parties. I’m not yet sure I endorse these proposals, but I like that they are designed with the impacted people in mind!
Read the whole thing here! I look forward to your feedback. To fend off some expected criticism, I want to emphasize that I don’t propose any new solutions. Partly, this is because I’m not a fan of policy in general. Policy is at best paternalistic and at worst authoritarian. We can’t “mechanism design” our way to utopia. But we can be honest about what we are doing and how our work impacts people. Maybe I’ll have to find a way to work at least a bit more of this criticism into follow-up writing.
I do promise some follow-up. Though there’s only one equation, this piece is still technical as it grapples with a dense academic literature on the philosophy of statistics. At some point this year, I intend to write something on this theme in more accessible prose.
If you're looking for an etymological connection between bureaucracy and Statistics, the word "state" is right there in the name! IIUC, etymologically "statistic" originally meant "having to do with the state", with "statistics" becoming shorthand for "statistic data" or "facts relevant to the state". (Which depending on whom you ask is pretty much all of them, hence the semantic broadening I guess.)
Great write-up. If you haven't read it, I can highly recommend Theodore Porter's "Trust in Numbers." It (among other things) compares the engineering statistics developed in France, where elites make all the decisions and thus generally just wanted a way to evaluate what to do, with the engineering statistics developed in the US, where democratic (well, congressional) oversight meant that the engineers needed a way to make their assessments hard for politicians to question. I think you would enjoy it based on this post.