Learning From the Mess
What the vitamin story tells us about reproducibility, discovery, and human nature.
This is Part 6 (of 7!) of a blogged essay “Steampunk Data Science.” A table of contents is here.
Having followed the tumultuous thirty-year journey from Eijkman’s chickens to Davis’ rats, let’s return to where we started: the question of reproducible, rigorous research. If there was an era of gold standard reproducible research, the early 20th century wasn’t it. I described multiple examples of important work that was not only irreproducible but flat-out wrong.
McCollum’s first paper on nutrition couldn’t have been more wrong. Led astray by the work of Pavlov, McCollum was convinced that “the psychic influence of palatability is one of the most important factors in nutrition.” McCollum considered the possibility of vitamins, which he described as “certain organic complexes in the food given, which the body was not able to supply through its synthetic power from the materials at hand,” to be completely ruled out by his experiments. He would completely change his mind in the course of only a couple of years.
Was this paper published by McCollum bad for the scientific discovery of vitamins? The reality is the opposite. The fact that McCollum was dead wrong inspired further investigations, and the breakthroughs occurred in figuring out why he was wrong. Mendel’s team at Yale was inspired by McCollum’s synthetic diets and intrigued by his findings on palatability. In their replication attempts, they not only disproved McCollum’s hypothesis but also strengthened the case for the existence of essential amino acids. This work by Mendel’s team subsequently inspired McCollum and Davis’ investigations into the extracts of milk and egg yolks, resulting in their discovery of Vitamin A.
At the opposite end of the spectrum was German physician Wilhelm Stepp, who had conducted experiments in which he removed the ether-soluble contents of bread and fed the extracted bread to mice. He claimed that without the ether-soluble contents, the mice quickly perished, and that when the ether-soluble materials were added back as a supplement, the mice thrived. This sounds a lot like McCollum and Davis’ experimental setup, but Stepp’s results were deemed “far from conclusive” by Mendel and Osborne. His data was fishy. George Wolf and Kenneth Carpenter reanalyzed Stepp’s experiments in light of our contemporary understanding and found Stepp’s mice died far too quickly for the cause to be Vitamin A deficiency. What exactly Stepp had done remained unclear, and his work was not reproducible. But he had the right answer! He was clearly on the right track to finding Vitamin A.
We can and do learn from a lack of reproduction. Failure to reproduce tells us something about why our earlier assumptions were wrong, and digging into reproduction failures leads us to new discoveries. Nothing in the evidence points to malicious fraud or scientific misbehavior by those involved in the search for vitamins. It’s not clear how many of these errors would have been corrected by better statistical or scientific methodology. We should not expect science to be perfect and should be open to learning from the mess.
And what about rigorous tools and research practices? Might these have accelerated our understanding of nutrition? Here, the evidence again points to no. The discovery of vitamins required a remarkably diverse set of investigative tools. As is always the case, well-controlled experiments designed to deliberately refute hypotheses were only one of many methods used to generate evidence. Natural experiments, such as Eijkman’s work with chickens and Vorderman’s prison observations, provided the initial clues that brown rice contained essential vitamins. The case studies initiated by Mendel, Osborne, and Ferry were experiments on single animals. They applied varied interventions over the course of the rat’s life, probing varied inputs into its diet, scouring for clues as they compared to a baseline. The individual case series of McCollum and Davis provided the definitive evidence that a simple organic compound could start and stop growth. Each of these methods provided a piece of the puzzle, but the researchers were learning how to do nutrition research as they went.
And though it was clear to Christiaan Eijkman that white and brown rice were different from each other, it took thirty years for that difference to be given a name. Sure, we can point back to single experiments that are as clear as day in hindsight, but the vitamins weren’t “discovered” until they were named. It took Funk’s bold survey article to name the problem (deficiency diseases) and the cure (vitamines). Only after Funk did everyone converge on the answer. The clean articulation of the problem and solution, of the cause and the effect, marked the actual discovery.
Perhaps the only pattern I can extract from the scientific processes here is that everyone involved was driven by a definitive purpose. The discovery of vitamins arose out of a deliberate, interventionist mindset. The researchers in Wisconsin wanted to identify the best diet for raising cattle. The researchers in the Dutch colonies sought a cure for beriberi. Nutrition research wasn’t aimed at breaking down the world for understanding, but rather at identifying interventions. They were trying to figure out cause and effect so that they could do something. The entire purpose was intervening, whether to save farmers money or cure terrible diseases. In finding what worked for their problems, they also discovered new chemistry and biology.
Would vitamins have been discovered sooner if the nutrition scientists had a more rigorous set of scientific tools? Could we imagine a counterfactual acceleration had they had access to computers loaded with spreadsheets and statistical software? We could also ask this question differently without assuming the present was wiser than the past. Was this discovery made possible by a rigorous practice that we can learn from?
The vitamin saga suggests the answer to all of these questions is probably no! If anything, the confines of scientific rigor of the day, like Koch’s rigid postulates for determining disease etiology, would have left us stuck with germ theory. I worry that contemporary quests for standardization and formalization of research practice lose sight of the value of creative experimentation and investigation. We can’t let reproducibility checklists stifle creative exploration.
What I also love about this story is how there’s no single hero. I understand the motivational power of the simple great-man science stories, like John Snow and his Broad Street Pump, or Alexander Fleming and his Petri dishes. But historians of science have been scolding us for decades that most of these scientific beatifications are vastly oversimplified. The muddled mess of vitamin discovery is more the rule than the exception. Though there were multiple Nobel Prizes, no one person can claim the discovery.
On the other hand, it’s easy to find lots of petty fights. Babcock chided Atwater about his protein theories. McCollum was humiliated by Mendel’s work, which proved his palatability hypothesis erroneous. In his autobiography, McCollum admits embarrassment and revenge were among his motivations for studying milk extracts. Harry Steenbock felt he should have been an author on McCollum and Davis’ Vitamin A paper and held a grudge for years. He even wrote a letter to Science magazine in 1918, accusing McCollum of academic misconduct when McCollum moved his lab from Wisconsin to Johns Hopkins.
The process of searching for vitamins was a mess. But we learned from the mess. And when we did, we found undeniable effects that completely transformed our understanding of food and our ability to treat disease.

