The national academy of spaghetti on the wall
Randomized policy optimization with genetic algorithms. AKA Science.
The randomized experiment provides a clever way to determine the impact of a single action. But at some point, we need to think beyond binary actions and connect the individual binary tests to more complex systems and policies. Today, let’s loop back and think about how to apply randomized experiments to the policy optimization problem.
A policy is an intricate collection of data processing rules. Policies summarize a situation and declare an appropriate action toward a desired outcome. In our complex and ever-changing world, finding the best policy for all situations is, of course, impossible. But what if we just try to improve the policies that we already have? If we can find a small change to our current policy prescriptions that improves some outcome, we could change our policy to reflect this improvement. If we continually search for these improvements and work hard to demonstrate their value, we may head in a better direction over time.
Think about your annual visit to the optometrist. To fit your prescription, your optometrist uses their fancy machine to nudge the current settings in a random direction. If the letters look better, you keep the new setting. If worse, you go back to where you started. Within a few minutes, you'll have a perfect prescription for new glasses.
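It's worth seeing how little machinery this loop needs. Here's a minimal sketch in Python; the quadratic "blurriness" objective and the ideal lens settings are invented stand-ins for the patient squinting at the chart.

```python
import random

def blurriness(prescription):
    """How blurry the chart looks (lower is better). A made-up quadratic
    standing in for the patient's 'better or worse?' feedback."""
    ideal = [1.25, -0.75]  # the unknown best lens settings
    return sum((p - t) ** 2 for p, t in zip(prescription, ideal))

def fit_prescription(start, steps=100, step_size=0.25):
    current = list(start)
    for _ in range(steps):
        # Nudge the current settings in a random direction.
        candidate = [p + random.uniform(-step_size, step_size) for p in current]
        # Keep the change only if the chart looks better.
        if blurriness(candidate) < blurriness(current):
            current = candidate
    return current

print(fit_prescription([0.0, 0.0]))  # lands near [1.25, -0.75]
```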
For broader policies, we could perhaps gauge "better" or "worse" using randomized experiments. Imagine applying this iterative improvement scheme to the standard of care in medicine. An intrepid physician could propose some random new surgery for a niche condition. They would gather a bunch of patients, carefully chosen to be representative, assign them at random to treatment and control, and carefully measure the outcomes after the surgery. If they find outcomes improved with p < 0.05, they can publish a paper in the New England Journal of Medicine and lobby their friends on advisory boards about their amazing new procedure. The new surgery would then be adopted as the standard of care.
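As a toy illustration, here's a sketch of using the randomized experiment itself as the "better or worse" test. Everything here is invented: the outcome distributions, the effect size, the sample size, and the crude one-sided z-test standing in for the trial's real analysis.

```python
import random
from statistics import mean, stdev

def randomized_trial(control_arm, treatment_arm, n=200):
    """Simulate n patients per arm and compare mean outcomes with a
    crude one-sided two-sample z-test."""
    control = [control_arm() for _ in range(n)]
    treated = [treatment_arm() for _ in range(n)]
    se = (stdev(control) ** 2 / n + stdev(treated) ** 2 / n) ** 0.5
    z = (mean(treated) - mean(control)) / se
    return z > 1.645  # roughly p < 0.05, one-sided

def standard_of_care():
    return random.gauss(0.5, 1.0)  # invented recovery-score distribution

def proposed_surgery():
    return random.gauss(0.7, 1.0)  # a modest, invented improvement

if randomized_trial(standard_of_care, proposed_surgery):
    print("Write it up for NEJM and lobby the advisory boards.")
else:
    print("No significant effect; back to the spaghetti pot.")
```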
Is this that far off from what we do? I'd argue no. And conceptually, it's not that ridiculous of an idea. In the mathematical theory of policy optimization, iteratively testing random improvements can crawl toward optimality. This algorithm is, at worst, a genetic algorithm or evolutionary strategy. I prefer the term pure random search, proposed by Rastrigin in 1963. As we'll see in a few weeks, a good chunk of what people call reinforcement learning is just pure random search dressed up in confusing terminology and mathematical formulae.
Even though it looks slow and inefficient, it's not too hard to convince yourself that this sequence of random randomized experiments will converge to a local optimum. Folks in the online learning space have even computed convergence rates for this method. They have also shown that if you make a bunch of mistakes (by, say, accepting a bunch of false positives because your rejection threshold is too lenient), the algorithm eventually compensates for the bad decisions.
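You can get a feel for that robustness with a quick simulation. In this sketch (all numbers invented), every comparison is corrupted by heavy measurement noise, so the loop accepts plenty of false positives, yet the iterate still drifts toward the optimum.

```python
import random

def loss(x):
    return (x - 3.0) ** 2  # toy objective; the optimum sits at x = 3

def noisy_random_search(steps=5000, noise=2.0):
    x = 0.0
    for _ in range(steps):
        candidate = x + random.uniform(-0.5, 0.5)
        # Each comparison is a noisy 'experiment', so the accept
        # decision is frequently a false positive.
        if loss(candidate) + random.gauss(0, noise) < loss(x) + random.gauss(0, noise):
            x = candidate
    return x

print(noisy_random_search())  # typically lands in the neighborhood of 3
```

The noisier the comparisons, the wider the iterate wanders around the optimum, but the pull toward improvement survives the bad accepts.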
For a lone investigator in a garage, this sequence of random random experiments is not the most efficient way to make progress. But what if I had a million copies of me all competing to find the best policy improvement? All million clones in their own garages could independently try a bunch of their own random ideas and then blog about their results. Sometimes the results would be so good that the herd of experimenting drones would all swarm around the good idea in the hope of striking gold on a nearby improvement, bringing home bragging rights. To some, this looks like an inefficient but massively parallel genetic algorithm. To others, it looks like science.
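In code, the million-garage version barely differs from the lone tinkerer's loop. Here's a sketch of what optimizers might call a (1+λ) evolutionary strategy; the objective, the number of labs, and the step size are, again, made up for illustration.

```python
import random

def loss(policy):
    """Invented objective: squared distance to some unknown best policy."""
    best = [1.0, -2.0, 0.5]
    return sum((p - b) ** 2 for p, b in zip(policy, best))

def swarm_search(n_labs=50, generations=40, step=0.3):
    consensus = [0.0, 0.0, 0.0]
    for _ in range(generations):
        # Every lab independently tries its own random tweak of the consensus.
        candidates = [[p + random.uniform(-step, step) for p in consensus]
                      for _ in range(n_labs)]
        # The herd swarms to the lab that blogged the best result.
        best = min(candidates, key=loss)
        if loss(best) < loss(consensus):
            consensus = best
    return consensus

print(swarm_search())  # converges near [1.0, -2.0, 0.5]
```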
And, again, I’m not even trolling. If we loop back to frictionless reproducibility, we might attribute the rapid progress in machine learning to the ease of running small policy improvements on clearly defined metrics. I think there is more truth to this than not. I’d argue that fields in the throes of replication crises should worry more about data and code sharing than about forced, arbitrary epistemic rigor.
There are downsides to swarming randomized experimentation. It can and certainly does get stuck at local optima. The unfortunate experience shared by every bench scientist and every policy wonk is that most interventions simply do not work. But people are stubborn, prideful, and reluctant to admit defeat. Hence, scientific communities can chase illusory advantages for far longer than might seem reasonable.
In the human-facing sciences, the random random experiment paradigm also runs into ethical roadblocks. Endless, mindless experiments on human populations are neither feasible nor ethical. If we’re going to experiment in a medical context, there had better be equipoise: genuine disagreement about whether an intervention has a beneficial effect. And as we move into fuzzier spaces like development economics, with incredibly weak interventions, outcomes that defy quantification, and power calculations calling for millions of subjects, perhaps we have moved outside the useful scope of the great genetic algorithm of science. For many societal problems, we should agree to settle for other means of sensemaking beyond mindless datafication. Controlled experiments are powerful. But they are not the only means of making sense of the world.