If you're looking for an etymological connection between bureaucracy and Statistics, the word "state" is right there in the name! IIUC, etymologically "statistic" originally meant "having to do with the state", with "statistics" becoming shorthand for "statistic data" or "facts relevant to the state". (Which depending on whom you ask is pretty much all of them, hence the semantic broadening I guess.)
Exactly! It's right there in the name!
Maybe I should write a post with a reading list on this topic... a lot has been written about using the visualization tools of statistics to create bureaucratic legibility of social processes. (e.g., Desrosières, Daston, Gigerenzer, Porter, Scott,...). I'm after something that happened later, which was using statistical tests and causal inference methods to *affect* social processes (through evaluation and regulation).
Great write-up. If you haven't read it, I can highly recommend Theodore Porter's "Trust in Numbers." It (among other things) compares the engineering statistics developed in France, where elites make all the decisions and thus generally just wanted a way to evaluate what to do, with the engineering statistics developed in the US, where democratic (well, congressional) oversight meant that the engineers needed a way to make their assessments hard for politicians to question. I think you would enjoy it based on this post.
A classic! I'm wondering if I need to write an argmin reading list post/series about some of my favorites of this genre. I love Daston, Desrosières, Graeber, Gigerenzer, Porter, and Scott. I also really like some of the economic critiques of contemporary economics, be they by Deaton, Leamer, Lucas, Kay, or King. Maybe I just need to make an annotated bibliography.
Thank you for taking the time to reply to my comments. Let me say that I agree with many of the points in your longer article about the goals of regulations: constraining individual biases and desires, ensuring error control, and promoting fairness, transparency, and responsibility.
I bring up two main problems I find in your recent reply.
(1) I’m not sure how you’ve addressed my point. I had written: “[You] claim ‘The ex ante frame obviates heated debates about inferential interpretations of probability and statistical tests, p-values, and rituals’. I don’t see how it sidesteps the debates about how methods serve the function of learning from data, and about which methods ought to constrain statistical inferences and subsequent policy.”
In places, you yourself seem to agree with my position:
“Framing statistical testing as policymaking leads to many questions for methodologists of statistics and causal inference. What statistical rules should we advocate for? Do current designs, whether they be panels, regression discontinuities, instrumental variables, or difference-in-differences, suffice for policymaking?”
But it’s hard to know how to evaluate your suggestion, a moment later, that these questions boil down to “what makes our methods most suited to being rules in an elaborate bureaucratic game?”
(2) While the conception of statistics as regulation is intended to give statistics a positive portrayal it currently lacks, many of your remarks suggest that statistics is disconnected from grounding the causal and other evidential claims required to warrant regulatory policies.
First, some positives that would accrue:
“Rulemaking is certainly not the singular valuable application of statistics, but it has been a revolutionary and uncelebrated use. ... Statistics sells itself short by not embracing this role.”
“The biggest success of the RCT has been in drug trials, providing evidentiary guidelines for pharmaceutical companies to demonstrate that their drugs are safe and effective. Randomized clinical trials have been more generally applied in medicine to establish guidelines for standard of care. More recently, technology companies have widely adopted RCTs to evaluate whether to deploy new software features.”
But then:
“In all of these cases, causation is a secondary concern. The RCT primarily serves as a mechanism of regulation, regulating which drugs come to market, which clinical practices become standard of care, which software goes into production, and which academic papers prestige journals publish.” That is, one of the prime roles of the RCT is facilitating rulemaking.
You give the impression that the ability of statistics to regulate is somehow distinct from its ability to supply the evidence required for good regulation. But the two are intimately connected.
As Fisher says, “The purpose of randomisation . . . is to guarantee the validity of the test of significance, this test being based on an estimate of error made possible by replication” (Fisher 1935, Design of Experiments, p. 26).
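To make Fisher's point concrete, here is a minimal randomization (permutation) test on simulated data; this is a toy sketch, and the numbers are illustrative, not from any real experiment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcomes from a small randomized two-arm experiment.
treated = rng.normal(loc=0.5, scale=1.0, size=20)
control = rng.normal(loc=0.0, scale=1.0, size=20)
observed_diff = treated.mean() - control.mean()

# Re-randomize the group labels to build the null distribution that the
# randomization itself justifies.
pooled = np.concatenate([treated, control])
null_diffs = []
for _ in range(10_000):
    perm = rng.permutation(pooled)
    null_diffs.append(perm[:20].mean() - perm[20:].mean())

# One-sided p-value: how often random relabeling beats the observed effect.
p_value = np.mean(np.array(null_diffs) >= observed_diff)
print(f"observed diff: {observed_diff:.3f}, permutation p-value: {p_value:.4f}")
```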
Perhaps you are merely describing the psychology of some researchers, driven by perverse incentives. Maybe you are describing what Gigerenzer calls statistical “ritual”—the thoughtless and fallacious application of statistical significance tests. But these would scarcely give statistics a positive spin. Thus, I say it is a mistake to identify your "regulation" with his "ritual", as you at times recommend. His "ritual" is intended to be pejorative. Your "regulation" need not be, if properly developed.
In many places you appear to agree as to the evidential, inferential, and epistemic value of statistical tests:
“Observing a higher correlation would be unlikely if the treatment truly had no effect. ... The value of such regulatory rules goes well beyond the associated statistical guarantees. As Bradford Hill repeatedly noted, randomization removes potential biases and confounders in the trial.”
“Ex ante statistical guarantees additionally assure stakeholders that—at least most of the time—our study should produce a trustworthy answer.”
Providing a trustworthy answer most of the time is an epistemological value which, when satisfied, can serve policy regulation.
“While statistics has admirable aspirations to help answer questions about ex post inference, it’s hard to find grand scientific discoveries solely enabled by RCTs or other causal inference methods. Scientific inference is rarely numerical and always cultural and heuristic.”
There’s no reason to suppose that a method must have been necessary for a discovery in order to regard it as evidentially important.
This comment relates to earlier blogposts growing out of my Berkeley Neyman Seminar. Notably, my blogpost on you: https://errorstatistics.com/2024/10/22/response-to-ben-rechts-post-what-is-statistics-purpose-on-my-neyman-seminar/ and one on Philip Stark and the corruption of statistics: https://errorstatistics.com/2024/11/06/has-statistics-become-corrupted-philip-starks-questions-and-some-questions-about-them/
The emphasis on RCTs as the "crown jewel" of statistics may unintentionally understate the contributions of the econometric approach, which was driven to develop sophisticated tools (instrumental variables, difference-in-differences, structural modeling, etc.) to extract causal insights from observational data. Econometric approaches are no less critical for rulemaking and governance. While RCTs offer strong ex ante guarantees, econometrics often serves as the primary method for policy evaluation in real-world settings, relying on robustness checks, sensitivity analyses, and validation of the data-generating process (DGP) to justify the ex post validity of models.
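As a concrete sketch of one of these tools, here is a minimal difference-in-differences estimate on simulated panel data; the variable names and parameters are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
treated = rng.integers(0, 2, size=n)  # group indicator
post = rng.integers(0, 2, size=n)     # before/after the policy change
true_effect = 2.0
y = (1.0 + 0.5 * treated + 1.5 * post
     + true_effect * treated * post + rng.normal(size=n))

df = pd.DataFrame({"y": y, "treated": treated, "post": post})
# The coefficient on the interaction term is the DiD estimate of the effect.
fit = smf.ols("y ~ treated + post + treated:post", data=df).fit()
print(fit.params["treated:post"])  # recovers roughly 2.0
```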
I’m admittedly biased here (a function of being a somewhat lapsed practitioner with domain expertise in my field), but econometricians prioritize caution, rigor, and alignment with theory, and honest brokers often recommend no action if the evidence contradicts their priors or lacks robustness. Statisticians tend to prioritize adaptability and actionable insights and are often more accepting of higher levels of uncertainty in order to offer alternative paths forward. Mark Haines of CNBC used to jokingly refer to certain forms of analysis as “The Church of What’s Working Now,” and some of that may apply here. The environment of our specific problem provided ample opportunities for extensive out-of-sample testing during the model development phase, allowing us to maintain a reasonable level of confidence in the rigor and robustness of the analysis.
To me, the bottom line is that striking the right balance between exploration and rigor is essential for meaningful and reliable insights in both analysis and policy.
You “introduce the concept of ‘ex ante policy,’ describing statistical rules and procedures designed before data collection to govern future actions” and claim “The ex ante frame obviates heated debates about inferential interpretations of probability and statistical tests, p-values, and rituals”. I don’t see how it sidesteps the debates about how methods serve the function of learning from data, and about which methods ought to constrain statistical inferences and subsequent policy.
My point is illustrated in the (2021) editorial of mine that you cite:
“While it is well known that stopping when the data look good inflates the type I error probability, a strict Bayesian is not required to adjust for interim checking because the posterior probability is unaltered. Advocates of Bayesian clinical trials are in a quandary because “The [regulatory] requirement of Type I error control for Bayesian [trials] causes them to lose many of their philosophical advantages, such as compliance with the likelihood principle” (Ryan et al., 2020: 7).” https://conbio.onlinelibrary.wiley.com/doi/full/10.1111/cobi.13861
Thus, the foundational questions reappear in the form of which regulations are warranted for the goals at hand, and why. We still need to ask what warranted scientific inference requires.
The authors, researchers in radiation oncology, go on to say: If we “simply report the posterior probability of benefit, then we could potentially avoid having to specify the type I error of a Bayesian design”. If they strongly believe in the effect, they claim, the need to control type I errors is of no interest to Bayesians. That is what led to the remark of mine you cite.
“It may be retorted that implausible inferences will indirectly be blocked by appropriate prior degrees of belief (informative priors), but this misses the crucial point. The key function of statistical tests is to constrain the human tendency to selectively favor views they believe in.” (Mayo 2021)
My remark was intended in this context, where the constraint had to do with Type I error control. (It would have been too vague on its own.) A discussion of Ryan et al. is in my blogpost, “Should Bayesian clinical trialists wear error statistical hats?” https://errorstatistics.com/2021/08/21/should-bayesian-clinical-trialists-wear-error-statistical-hats/
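For concreteness, the Type I error inflation from interim checking quoted above is easy to reproduce in a toy simulation: a nominal 5% two-sided z-test with hypothetical interim looks, all parameters illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 5_000
looks = [25, 50, 75, 100]  # hypothetical interim analysis points
z_crit = 1.96              # nominal two-sided 5% threshold, applied at every look

rejections = 0
for _ in range(n_trials):
    # The null hypothesis is true: the data have mean exactly 0.
    data = rng.normal(0.0, 1.0, size=max(looks))
    for n in looks:
        z = data[:n].mean() * np.sqrt(n)  # z-statistic at the interim look
        if abs(z) > z_crit:               # stop early: "the data look good"
            rejections += 1
            break

# Prints roughly 0.12, well above the nominal 0.05.
print(f"Type I error with interim peeking: {rejections / n_trials:.3f}")
```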
I see the role of clinical trials differently, especially for pharmaceuticals. The primary role of a drug trial is regulating safety. We can contort the question of safety into one of scientific validity, but we can also just view it as quality control. And once we view it as quality control, the role of statistics becomes one of simple decision making (we don't even need Neyman-Pearson).
I can make the same argument about efficacy trials. Most of the work in the trial design goes into articulating the protocol, endpoint, and reporting structure. After that, the statistical test is a formality. Plenty of tests would suffice, and the associated errors don't matter much. They are an articulation of what the government demands from new products and are, again, a question of quality control, not theory testing.
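A minimal sketch of this quality-control reading, assuming a hypothetical fixed, pre-registered approval threshold (not any agency's actual rule):

```python
import numpy as np

def approve(treatment_outcomes, control_outcomes, z_threshold=1.96):
    """Pre-registered rule: approve iff the primary endpoint clears the bar."""
    t = np.asarray(treatment_outcomes, dtype=float)
    c = np.asarray(control_outcomes, dtype=float)
    diff = t.mean() - c.mean()
    se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
    return diff / se > z_threshold

# Simulated trial data; any of several similar rules would serve the same
# regulatory function.
rng = np.random.default_rng(3)
print(approve(rng.normal(0.4, 1.0, 200), rng.normal(0.0, 1.0, 200)))
```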
And in fact, the very structure of clinical trials can inhibit our ability to learn from data. Overly rigid protocols with highly specified endpoints strangle our ability to understand the full potential of treatments in the context of creative care.
Have you read the full FDA statistical and clinical review packets for 5-10 of the controversial drugs that have been discussed? Have you read how guidance documents have evolved over a 20-year period? It is extremely enlightening and is not at all well captured by what you articulate here.
The process is really about ensuring that clinicians and drug developers aren't fooling themselves and their patients into thinking that the purported evidence supports the marketing claim. The reviewers make some pretty nuanced judgment calls, and they do a lot of investigative/detective work to incorporate all prior empirical evidence. From what I have seen, they have a nose for making the right calls when judging whether the evidence is poor or adequate.
Nobody is really obsessed with the formality of statistical tests; those are minor in the grand scheme of things. They are the final guardrails, but not where the real action happens. As you say, there are plenty of equivalent, good-enough ways of doing this.
- They will call out flawed ways of handling missing data, especially if the results are sensitive to how this is dealt with.
- They will both entertain novel endpoints and delve into all the weaknesses in how they were chosen and what the alternatives might demonstrate.
- These sorts of issues are informed by sound statistical thinking, but they aren't really about p-value regulations.
Operationally, the statistical testing components of clinical trial reporting serve as quality control. But the entire sequence of evidential rigor articulated in the FDA's guidance documents does serve the goals of theory testing, of testing hypotheses held by drug hunters and pharmaceutical scientists. It is an important kind of feedback that makes those scientists go back and re-think. It probably also helps lower-level scientists convince executives that better science is necessary rather than optional, and to secure the resources for it.
A big part of the biotech VC community believes that Richard Pazdur, who heads the FDA's Oncology Center of Excellence, has played a large role in holding pharma companies to high scientific standards in evaluating cancer therapies, more so than other divisions. And the result is that we have seen a huge rise in more effective therapies over the last 20 years. Demanding better experimental design, measurements, statistical evidence, etc., has been a big component of this. It is why clinicians do better science in cancer drug evaluation than in other diseases. One could argue this has raised the scientific validity of a whole lot of cancer drug development and clinical research.
"Partially, this is because I’m not a fan of policy in general. Policy is at best paternalistic and at worst authoritarian. We can’t “mechanism design” our way to utopia."
What does this actually mean in practice? Libertarianism?
There's a huge gap between out-of-touch, data-driven, paternalistic technocracy and libertarianism. The lesson of the Biden years is that the former has run its course. For what comes after that, left-leaning folks really need to do some soul searching.
If you want a steelman defense of bureaucracy, here's the last paragraph of my commentary:
"No one wants to be called a bureaucrat. It takes on a disparaging connotation, especially in academia, where besieged professors are downtrodden by exponentially growing paperwork. And yet, bureaucracies enable massive systems of governance to function. These systems do not always function well, but they operate at astounding scales. It’s hard to take pride in bureaucracies, but what if we embraced the admirable goal of creating well-run systems of participatory decision making at a global scale?"