Regulations, Rules, and RCTs.
Understanding the meaning of randomized trials through their use.
Yesterday, I discussed Neyman’s original view of the randomized controlled trial as a measurement device. This device uses random assignment to probe the difference in outcomes between two treatments. Neyman’s view of experimentation was shared by Gosset and Fisher. But things got muddled in the 1930s, and it’s not how experimentation is widely taught today.
Part of why everyone got confused is that the purpose of the measurement in a randomized trial sends a mixed message. What do you do with a measured difference in potential outcomes?
The haughty answer asserts you have determined causation. This, dear readers, is a mistake. If we are doing a field experiment the way Neyman intended, we are already pretty sure that our treatment causes something to happen. It’s not like we’re putting a new fertilizer on our field because we think it might spontaneously cause cows to emerge from the ground. We’re trying the new fertilizer because we think it’s going to cause the plants to grow. We just want to know if it’s better than the old fertilizer. We’re looking to measure an effect difference, not determine a cause.
Rather than dwelling on lugubrious questions of confounding, moderating, and mediating, let’s think about what we do with measurements. If, like Neyman, we are testing fertilizers, we’re just trying to decide which fertilizer to use next season. We’re measuring to make a decision. The decision will be informed by what we measure.
To make a decision, we’ll need to come up with some rules. We’ll probably consider the cost of the new fertilizer. If it’s more expensive than the old one, how much more yield do we need to make the cost worthwhile? If it’s cheaper, can we afford less yield and still make a profit? We can use our uncertain measurement from the field experiment as the basis of a cost-benefit analysis.
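To make this concrete, here’s a minimal sketch of such a decision rule in Python. Every number in it (the crop price, the fertilizer costs, the measured yield difference, its standard error, and the two-standard-error margin) is invented for illustration.

```python
# Hypothetical break-even analysis for switching fertilizers.
# Every number below is invented for illustration.

crop_price = 5.0   # dollars per bushel
cost_old = 40.0    # dollars per acre, old fertilizer
cost_new = 55.0    # dollars per acre, new fertilizer

# Extra yield (bushels per acre) needed to pay for the new fertilizer:
break_even_yield = (cost_new - cost_old) / crop_price  # = 3.0

# Uncertain measurement from the field experiment:
measured_diff = 5.0  # estimated yield difference, new minus old
std_err = 0.8        # standard error of that estimate

# One possible rule: switch only if the estimate clears the break-even
# point by a comfortable margin (here, two standard errors).
switch = measured_diff - 2 * std_err > break_even_yield
print(f"break-even: {break_even_yield:.1f} bu/acre, "
      f"measured: {measured_diff:.1f} ± {std_err:.1f}, switch: {switch}")
```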
Neyman emphasizes that a randomized trial not only lets you estimate the difference in effects between treatments, but also lets you estimate the actual value of the outcome in each arm. Somehow, this observation has been lost over the century. In the field experiment, we can estimate the yield under each fertilizer. Such measurements come with almost the same precision as the treatment difference itself. Using these estimates, you could put 5-sigma intervals on the profit associated with each fertilizer and use these to inform the decision.
With your 5-sigma error bars, you can compare the two fertilizers and get a range of possible revenue for each, all else being equal. And if we wanted, we could step back and think about other issues: the cost of switching fertilizers, the smell of the fertilizers, the ESG concerns. Lots of factors will go into the decision-making process. The measured treatment difference will only be one of them.
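Here’s a minimal sketch of those per-arm estimates and 5-sigma profit intervals in Python. The per-plot yields are simulated and the prices and costs are invented; nothing here comes from a real experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-plot yields from a randomized field experiment
# (bushels per acre); all numbers are simulated.
yield_old = rng.normal(50.0, 5.0, size=40)  # plots given the old fertilizer
yield_new = rng.normal(54.0, 5.0, size=40)  # plots given the new fertilizer

crop_price = 5.0                    # dollars per bushel (invented)
cost = {"old": 40.0, "new": 55.0}   # fertilizer cost, dollars per acre

for name, y in [("old", yield_old), ("new", yield_new)]:
    mean = y.mean()
    sem = y.std(ddof=1) / np.sqrt(len(y))  # standard error of the mean
    # 5-sigma interval on per-acre profit, all else held equal:
    lo = crop_price * (mean - 5 * sem) - cost[name]
    hi = crop_price * (mean + 5 * sem) - cost[name]
    print(f"{name}: yield {mean:.1f} ± {sem:.2f} bu/acre, "
          f"profit in [${lo:.0f}, ${hi:.0f}] per acre (5-sigma)")
```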
This is my way of arguing for a pragmatic, decision-centric use of randomized trials. What I described is very much not the same as Neyman-Pearson decision theory (sorry, Dr. Neyman, I was being so nice to you up until now). NP theory is beautiful mathematics, but it’s too loaded to be used on its own for most decision-making. NP theory requires specifying priors on hypotheses that are usually not what we care about (how would you frame H0 and H1 in the fertilizer problem?), and setting these priors is usually ad hoc. But these are secondary worries for me. Formulating policy as minimizing a single cost function is a myopic and dangerous way to make most decisions.
My pragmatic, decision-centric description of randomized controlled experiments is how they are most impactfully used today. Randomized experiments are more effective as parts of regulatory mechanisms than as instruments of scientific discovery. The FDA sets some rules (probably too leniently) about what burden of proof a drug company must meet for a new drug. A tech company sets some threshold of lift that a new feature must demonstrate in an A/B test before the new code ships. Randomized trials are means of informing these policies. We only get confused about metaphysics when we use these trials for anything else.
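As a sketch of what such a shipping rule might look like: the lift threshold, the sigma margin, and the conversion counts below are all invented, and no company’s actual policy is implied. The point is that the trial feeds a preset rule rather than settling a metaphysical question.

```python
import math

def ship_decision(conv_a, n_a, conv_b, n_b, min_lift=0.002, n_sigma=3):
    """Ship variant B only if its measured lift over A clears a preset
    threshold by n_sigma standard errors. The threshold and margin are
    policy choices set in advance, not statistical facts."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    lift = p_b - p_a
    return lift - n_sigma * se > min_lift

# Invented A/B test counts:
print(ship_decision(conv_a=4_800, n_a=100_000,
                    conv_b=5_400, n_b=100_000))  # True with these numbers
```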