Isn't the right question which false discoveries were prevented through the application of statistics? That is, the purpose of statistics is to prevent (or at least reduce to, say, 5%) the publication of false papers. Looking at published papers to assess this is not very informative. Even so, in my experience, because standard statistical practice fails to capture many sources of variation, it fails at well above the nominal 5% rate.
More generally, as you have emphasized multiple times, passing an NHST only tells us there might be a non-random signal present. It doesn't tell us what the signal implies about our beliefs.
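A minimal sketch of the kind of failure I mean (a made-up batch effect, not any particular study): the group labels carry no real effect, but a standard two-sample t-test that ignores the batch structure rejects far more often than the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_batches, per_batch = 2000, 4, 10
rejections = 0

for _ in range(n_sims):
    # Each group's 40 samples come from 4 batches with their own random offsets.
    x = np.concatenate([rng.normal(rng.normal(0, 1), 1, per_batch) for _ in range(n_batches)])
    y = np.concatenate([rng.normal(rng.normal(0, 1), 1, per_batch) for _ in range(n_batches)])
    # The naive analysis treats all 40 samples in each group as independent.
    _, p = stats.ttest_ind(x, y)
    rejections += p < 0.05

print(f"empirical type I error rate: {rejections / n_sims:.2f}")  # typically far above 0.05
```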
I would certainly agree that this is a good question.
I don't think statistics is merely about some kind of false discovery control; it is also about rescuing weak signals in intelligent and efficient ways.
Statistics has been at the heart of developing reliable procedures for learning from horribly noisy measurements, with errors cutting in both directions. The entire saga of going from candidate gene studies to GWAS, however inefficient, is a pretty good positive example of how good statistical thinking won that field.
With the requisite caveat that no one should read my comment here as condoning the ubiquitous and dreadful misunderstandings about heritability.
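To put a rough number on the statistical thinking involved, a back-of-the-envelope sketch using the standard textbook figures (not tied to any particular study):

```python
# Roughly a million (approximately independent) variants are tested in a GWAS.
# At the candidate-gene-era threshold of p < 0.05, chance alone would hand you
# tens of thousands of "hits", which is why the field adopted the
# Bonferroni-style genome-wide significance threshold of 5e-8.
n_tests, alpha = 1_000_000, 0.05
print(f"expected chance hits at p < {alpha}: {n_tests * alpha:,.0f}")
print(f"genome-wide significance threshold: {alpha / n_tests:.0e}")
```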
If you were to read Box's Improving Almost Anything, for instance, it has very little about making particular discoveries and is almost entirely about improving a complex system by continually designing experiments and learning from them.
The problem with NHST was that it made scientists think statistical rituals would eliminate having to work hard to reduce the gap between the scientific question of interest and some easy null hypothesis that is convenient to test. It turns out science is rarely organized well to bridge that gap. For instance, a lot of network scientists thought it was meaningful to show that biological networks are more small-world than random networks. But they never thought to ask whether Erdos-Renyi random graphs are a relevant null distribution for biological networks ultimately constructed from high-dimensional covariance matrices.
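A toy sketch of that null-model point (my own construction with networkx, not any published analysis): a graph thresholded from a correlation matrix is highly clustered almost by construction, so beating an Erdos-Renyi null on clustering, one ingredient of "small-worldness", tells you very little.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n_obs, n_blocks, block_size = 200, 6, 10
n_vars = n_blocks * block_size

# Each block of variables is driven by one shared latent factor plus noise.
factors = rng.normal(size=(n_obs, n_blocks))
data = np.repeat(factors, block_size, axis=1) + 0.8 * rng.normal(size=(n_obs, n_vars))

# Threshold the correlation matrix into an adjacency matrix.
corr = np.corrcoef(data, rowvar=False)
adj = (np.abs(corr) > 0.5) & ~np.eye(n_vars, dtype=bool)
g = nx.from_numpy_array(adj.astype(int))

# Erdos-Renyi graphs matched on edge density serve as the "null".
p_edge = nx.density(g)
er_mean = np.mean([nx.average_clustering(nx.erdos_renyi_graph(n_vars, p_edge, seed=s))
                   for s in range(50)])
print(f"clustering of thresholded-correlation graph: {nx.average_clustering(g):.2f}")
print(f"mean clustering of density-matched ER nulls: {er_mean:.2f}")
```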
Ben:
I've written a response on my blog errorstatistics.com:
https://errorstatistics.com/2024/10/22/response-to-ben-rechts-post-what-is-statistics-purpose-on-my-neyman-seminar/
Thanks for the very thoughtful and ever thought-provoking response, Deborah. I've replied with a follow-up here: https://www.argmin.net/p/a-use-theory-of-testing
"What are the grand discoveries that we wouldn’t have made without an understanding of null hypothesis testing?"
Ok, I'll take the bait. Why wouldn't the discovery of the Higgs Boson count?
The Higgs is a fun example.
- We have an "object" that has been "observed" at exactly one location on earth.
- This observation was made in an experiment where no single person understands the entirety of the procedure.
- It requires over 6 years of graduate study to fully understand what was supposed to be seen in the most ideal experimental situation.
- Under these idealized conditions, which no one fully understands, the CERN folks tell us that the p-value of seeing what they saw if the Higgs wasn't there is less than 0.0000006 (see the sigma-to-p-value conversion sketched below).
So this leaves me with a couple of questions: What does it mean that the Higgs Boson was discovered? And why would we be in a different situation without that p-value?
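For concreteness, here is where numbers like that come from, assuming the usual convention of turning a Gaussian z-score into a one-sided tail probability (my quick check, not CERN's calculation):

```python
# Convert the particle physics "n sigma" convention into one-sided tail
# probabilities of a standard normal; 5 sigma is the usual discovery bar.
from scipy.stats import norm

for sigma in (3, 5):
    print(f"{sigma} sigma -> one-sided p-value {norm.sf(sigma):.1e}")
# 3 sigma -> one-sided p-value 1.3e-03
# 5 sigma -> one-sided p-value 2.9e-07
```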
If you believe Bob Cousins of CMS, then neither ATLAS nor CMS would have claimed discovery had they not each reached 5 sigma independently - https://youtu.be/Vdib55prOJs?t=4107.
If opacity poses a problem for the LHC, many other discoveries are threatened by the same line of argument. I do agree, though, that "statistical discoveries" lack some "spraying for existence" quality. But I'm not sure if that means we all should become pragmatists.
Yes, 100%. I'm not advocating for all of us to become pragmatists. But I also think that pragmatism too often gets short shrift in the history and philosophy of science. Philosophy of engineering is underexplored!
And I do think fundamental particle physics gets a weird pass in the sciences because it is riding on the prestige of prewar physics and the atomic bomb. But the more I engage with the particularities of what they are trying to convince us all of, the more problems I see with their overall narrative.
For anyone interested in learning more about Stafford Beer's approach to cybernetics and his role in the Allende government's socialist experiments in Chile, I'd heartily recommend Cybernetic Revolutionaries by Eden Medina: https://mitpress.mit.edu/9780262525961/cybernetic-revolutionaries/
I've noticed that very few deep learning papers bother doing any sort of statistical testing. Moreover, they rarely even report confidence intervals. What do you think is the role of statistical testing for the development of novel deep learning methods? For example, say you're trying to design a new optimizer or architecture?
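One way I could imagine doing it (a hypothetical sketch; train_and_eval and all the numbers below are stand-ins, not anyone's actual pipeline): run both methods across a handful of seeds, pair the runs by seed, and report a bootstrap confidence interval on the mean improvement rather than a single best run.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a real training run; a real train_and_eval would return, say,
# test accuracy for a given optimizer and seed. Here it is faked for illustration.
def train_and_eval(optimizer, seed):
    offset = 0 if optimizer == "baseline" else 10_000
    noise = np.random.default_rng(seed + offset).normal(0, 0.01)
    return (0.760 if optimizer == "baseline" else 0.768) + noise

seeds = range(10)
baseline = np.array([train_and_eval("baseline", s) for s in seeds])
proposed = np.array([train_and_eval("proposed", s) for s in seeds])
diffs = proposed - baseline  # paired by seed

# Percentile bootstrap over seeds for the mean improvement.
boot = [rng.choice(diffs, size=len(diffs), replace=True).mean() for _ in range(5000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"mean improvement {diffs.mean():+.3f}, 95% bootstrap CI [{lo:+.3f}, {hi:+.3f}]")
```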