All published results are wrong, but some are useful

But all observational studies are wrong.

Nov 02, 2023

Though I could teach a semester course on methods for observational causal inference, a lecture is more than enough. I was mostly blissfully unaware of these methods until five years ago. I got more interested in understanding them when working on a collaboration in remote sensing. I learned from my students and collaborators about what social scientists might accept as “causal” explanations from data-driven methods. For instance, Josh Blumenstock told me there were only four methods anyone needed to know, as these were the only ones accepted by the holy council of econometrics: panel analysis, difference in differences, regression discontinuity, and instrumental variables. The causal canon was amusing but also baffling. Did “controlling” for a variable really just mean adding it to a regression analysis? No way…

It stopped being amusing during the pandemic when these shitty observational designs were used as justification for terrible policies. Governments and institutions argued for school closures, isolation policies, and infinite, indiscriminate boosting based on these insane methods that had never found anything interesting. If a study could get released with tortured data that agreed with someone’s party line, then it would be held up as “The Science.” We were all told that we had to follow it.

But science is a fucking mess. Scientists have no idea what they’re doing. And observational studies in science are the bottom of the barrel. I became obsessed with understanding when we decided that finding random data under some rock and pretending we did an experiment was acceptable methodology. We all know that correlation doesn’t imply causation, but if you write enough robustness checks, apparently, it does? No, it doesn’t.

You’d think that, given the ubiquity of these statistical techniques, there would be some home runs out there proving their worth. What is the track record of observational causal methods? What are the greatest hits of observational studies? People always point to Doll and Hill’s British Doctors Study, which provided evidence that smoking caused cancer. This seems to be the exception that proves the rule. More commonly, we run into situations like the Women’s Health Initiative, which studied the benefits of hormone replacement therapy in post-menopausal women. The WHI ran a careful clinical trial, finding that the risks outweighed the benefits and contradicting the conclusions of observational studies.

Another example commonly touted as a success story is the instrumental variables analysis used by Angrist and Lavy to show that smaller classes improved student performance. But this work was just confirming what had already been demonstrated by a randomized trial in Tennessee.

I’m sure there are readers out there who could provide more favorable examples in the comments. I’m willing to consider the evidence. But I think I’ve been after this for long enough to know I haven’t missed many big ones.

And I’m not a lone crank arguing about observational regression studies. Many amazing critiques have come before me. I’d highly recommend reading David Altman, Lorraine Daston, Angus Deaton, Gerd Gigerenzer, Edward Leamer, Paul Meehl, and Theodore Porter. Perhaps I should make a syllabus. I have a particular soft spot for David Freedman, a Berkeley statistician who passed away a few years before I arrived on campus. Back in the 1990s, Freedman was already making the same arguments I’m making now, and his criticism has never been addressed. Instead, it has been ignored. We keep deriving more methods and publishing more observational studies.

So what’s the solution? Should we move to more randomized trials? I’m sympathetic to this viewpoint, but it is only part of the solution. Most randomized trials are also wrong. It’s not hard to do a shitty experiment. Choosing some random intervention and trying it and then torturing the data until p is less than 0.05 doesn’t necessarily improve our situation. Randomized trials themselves are susceptible to all sorts of biases.
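To make the “torturing the data” point concrete, here is a minimal simulation sketch. It assumes a randomized trial in which the intervention does nothing at all, 20 independent outcome measures with unit-variance Gaussian noise, and a simple two-sample z-test. All function names and parameters are hypothetical, chosen only for illustration. It compares how often a prespecified outcome clears p < 0.05 with how often the best-looking of the 20 outcomes does.

```python
import math
import random


def two_sided_p(treated, control):
    """Two-sample z-test p-value, assuming outcomes have known unit variance."""
    n, m = len(treated), len(control)
    diff = sum(treated) / n - sum(control) / m
    z = diff / math.sqrt(1 / n + 1 / m)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def null_trial(rng, n_per_arm=30, n_outcomes=20):
    """Simulate one trial where the intervention has no effect on any outcome."""
    p_values = []
    for _ in range(n_outcomes):
        # Both arms are drawn from the same distribution: the true effect is zero.
        treated = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        p_values.append(two_sided_p(treated, control))
    return p_values


def false_positive_rates(n_trials=1000, seed=0):
    """Return (rate for a prespecified outcome, rate for the best of all outcomes)."""
    rng = random.Random(seed)
    honest = 0
    tortured = 0
    for _ in range(n_trials):
        p_values = null_trial(rng)
        honest += p_values[0] < 0.05      # only look at the prespecified outcome
        tortured += min(p_values) < 0.05  # report whichever outcome looks best
    return honest / n_trials, tortured / n_trials


honest_rate, tortured_rate = false_positive_rates()
print(f"prespecified outcome: {honest_rate:.2f}")
print(f"best of 20 outcomes:  {tortured_rate:.2f}")
```

Under these assumptions, the prespecified outcome should come up “significant” about 5% of the time, while the best of 20 independent outcomes should do so roughly 1 − 0.95^20 ≈ 64% of the time, even though nothing works. Real outcomes are usually correlated, so the exact inflation in practice will differ.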

My take, and it will piss off the scientists, is that the answer is greater acceptance and appreciation of engineering. What do I mean by engineering? At some point, we have to do things rather than studying them in isolation. Engineering is this doing. It tends to be more goal-directed, and it might cut corners to hit these goals. But careful engineering can prove a theory true. Sometimes, we find out what’s true by finding out what works.

One of my favorite, though dark, examples pitting science against engineering comes from the theory of relativity. We’re taught in physics classes that a definitive experiment along the way was the Michelson-Morley experiment. However, revisiting the evidence shows that their complicated interferometer might not have ever been working. There are two plausible explanations for the outcome of their experiment: there is no ether through which light travels or their experimental apparatus was broken. But in the end, it doesn’t matter if their experiment worked. The atom bomb confirmed the theory of relativity to anyone who doubted.

In the human-facing sciences, we seldom have such, um, explosive evidence. But in medicine we do commonly find interventions with undeniable efficacy: hand washing, vitamins, antibiotics, certain cancer treatments, covid vaccines, etc. These are among countless interventions where the p-value is zero. And we figured out that they worked by engineering medical systems around these interventions.

Scientists love to be dismissive and pooh-pooh some endeavors as “just engineering.” What if engineering is actually what we’re after? What if this science stuff is just how we spin our wheels on the way? Experimentation, titration, and hypothesis invalidation are as crucial to engineers as to scientists. But it’s through the implementation itself that we figure out what works.

Ryan

Nov 7, 2023

Preach!

Sarah Dean

Nov 3, 2023

To your point about potential flaws in randomized studies, the ultimate takeaway for hormone replacement therapy is not so easily inferred from the results of WHI's randomized control trial. A recent NYT piece argues against the now common wisdom that the costs of treatment outweigh the benefits -- both because the scientifically measured "benefits" are narrow in scope compared with women's experiences, and because the focus on /post/ menopausal women misses a key demographic:

> The study itself was designed with what would come to be seen as a major flaw. W.H.I. researchers wanted to be able to measure health outcomes — how many women ended up having strokes, heart attacks or cancer — but those ailments may not show up until women are in their 70s or 80s. The study was scheduled to run for only 8½ years. So they weighted the participants toward women who were already 60 or older. That choice meant that women in their 50s, who tended to be healthier and have more menopausal symptoms, were underrepresented in the study.

https://www.nytimes.com/2023/02/01/magazine/menopause-hot-flashes-hormone-therapy.html

arg min
