Aren't you arguing that the "perfect" should undermine the "good"?
I am currently reading Judea Pearl's "The Book of Why". The tobacco and lung cancer case also addresses the issue of not being able to run an RCT on tobacco use. Yet researchers did historical analyses of cancer patients and their tobacco-use histories that showed the relationship.
Pearl makes a very good case for causality. AFAICS, the difficulty is building models in a generalized way for all cases, rather than a few bespoke ones for specific cases of study. It is the bespoke expert-system model versus ML with rule-based systems like decision trees, which can be applied to any tabular data with known outcomes for each event.
Shouldn't the mantra be: "Imperfect accuracy [of the model] is better than being precisely wrong"?
There's a variant of Godwin's Law in causal inference: "As an online discussion of causal inference grows longer, the probability of someone bringing up smoking causing cancer approaches one."
LOL! Very good. Have you sent that to Pearl for a future text revision? ;-)
However, we do want to understand the how of phenomena, which means mechanisms of action, which means causality. When I was involved in bioinformatics and drug actions on physiology, we used known metabolic and signaling pathways to understand when a drug acted on a target and what its effects were on gene expression in that pathway, and we coupled that with knowledge of clinical pathologies. This went beyond statistical associations. Pharma companies wanted to understand the how of a new drug candidate, not just a statistical analysis of the data.
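In practice, a lot of that pathway-level reasoning reduces to something as simple as an over-representation test: given the genes whose expression changes after drug exposure, ask whether a known pathway contains more of them than chance would predict. A rough sketch, with all counts invented for illustration:

```python
from scipy.stats import hypergeom

# All counts are hypothetical, for illustration only.
M = 20_000  # genes measured in the expression experiment
n = 150     # genes annotated to the signaling pathway of interest
N = 400     # genes differentially expressed after drug exposure
k = 18      # DE genes that fall inside the pathway (~3 expected by chance)

# One-sided over-representation test: P(X >= k) under the hypergeometric null,
# equivalent to a one-sided Fisher's exact test on the 2x2 table.
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"pathway enrichment p-value: {p_value:.2g}")
```

The pathway annotation, i.e. the mechanistic prior, is what turns a list of differentially expressed genes into a statement about mechanism rather than a bare association.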
When Chris Anderson argued that "big data" would supplant most science, he was essentially arguing that the statistical properties of variables would expose mechanisms. That might work in some instances, but not generally. AI built on LLMs is making similar claims for that shiny, abundant future, but it will run into the same problems unless it can extract causal mechanisms.
We do see one pathology of statistics, especially in medical papers, where p-hacking is prevalent and papers are written around the "significant" p-values mined from the experimental data. Statistics are a very useful tool, but they need a causal hypothesis to make sense of the analysis and, in particular, to suggest the next question to test.
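To make the p-hacking point concrete, here is a toy simulation with pure noise, i.e. no real effects anywhere, showing how many "significant" associations fall out just from testing enough variables:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_patients, n_variables = 100, 200

# Pure noise: the outcome is unrelated to every measured variable.
outcome = rng.normal(size=n_patients)
variables = rng.normal(size=(n_patients, n_variables))

p_values = [pearsonr(variables[:, j], outcome)[1] for j in range(n_variables)]
hits = sum(p < 0.05 for p in p_values)
print(f"'significant' at p < 0.05: {hits} of {n_variables}")
# Roughly 10 hits are expected by chance alone, each one a publishable
# "finding" if the paper is written around it after the fact.
```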
TL;DR: I am a huge fan of ML and sequential design under performative settings. We need new instances of it more than ever in drug development and neurotech. Performativity is most certainly not taken into account in evaluating any brain stimulation or neuromodulation trials.
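To unpack "performative" a bit: deploying a model or policy changes the population it is later evaluated and retrained on. A minimal sketch, loosely in the spirit of the performative-prediction literature; the feedback term and all numbers are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def deploy_and_observe(theta, n=20_000):
    # Structurally y = 2*x + noise, but the population also reacts to the
    # deployed coefficient theta (the -0.8*theta term): behaviour shifts
    # once a policy or risk score is actually in use.
    x = rng.normal(loc=1.0, scale=1.0, size=n)
    y = 2.0 * x - 0.8 * theta + rng.normal(scale=0.5, size=n)
    return x, y

theta = 0.0
for step in range(10):
    x, y = deploy_and_observe(theta)        # data generated under the current model
    theta = np.sum(x * y) / np.sum(x * x)   # naive refit: least squares for y ~ theta*x
    print(f"retrain {step}: theta = {theta:.3f}")
# Repeated retraining settles at a "performatively stable" coefficient (~1.43),
# not at the structural coefficient 2.0, because each refit only ever sees data
# already distorted by the previously deployed model.
```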
> Analyzing a single step in a simple RCT reveals a surprising well of complexity and many headaches for the policymaker. It’s much easier to build up a framework for approving interventions than to imagine what will happen if those interventions are applied at a population scale.
This might be very much on the implementation side, but well before Pearl developed transportability, people had started to think about other kinds of non-ideal experiments with greater external validity than RCTs. The high-internal-validity RCT is there to establish whether a benefit even exists, while pragmatic trials measure real-world performance.
https://rethinkingclinicaltrials.org/chapters/design/experimental-designs-and-randomization-schemes/experimental-designs-introduction/
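A toy illustration of that division of labour (all numbers invented): both designs below are randomized and internally valid, but they enroll different populations and so estimate different things:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_trial(age):
    # Randomize 1:1 within the enrolled cohort; the individual treatment
    # effect declines with age (all numbers hypothetical).
    effect = 3.0 - 0.05 * (age - 40)
    treated = rng.integers(0, 2, size=age.size)
    outcome = 0.02 * age + effect * treated + rng.normal(scale=1.0, size=age.size)
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Explanatory RCT: narrow eligibility (ages 40-55), maximal internal validity.
ate_explanatory = run_trial(rng.uniform(40, 55, size=20_000))
# Pragmatic trial: enroll the population that would actually be treated (18-90).
ate_pragmatic = run_trial(rng.uniform(18, 90, size=20_000))

print(f"explanatory RCT estimate: {ate_explanatory:.2f}")  # ~2.6
print(f"pragmatic trial estimate: {ate_pragmatic:.2f}")    # ~2.3
# Both are unbiased for their own enrolled cohort; they just answer
# different questions about different populations.
```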
We could have a philosophical discussion about why one would ever want to care about internal validity at all. Do ideal, well-controlled experiments matter when what one really cares about is real-world performance? Nancy Cartwright's account of causality questions the modularity of the real world: some things just go together in nature, and it isn't clear what the causal effect of intervening on them separately would even mean. For therapeutic development, it makes sense to triangulate evidence across different research designs, each of which trades off internal and external validity to cover some threat to the validity of the final scientific question.
> Fatalism assumes the absence of temporal dynamics. The meaning of treatments can’t change over time. It means your policy has no effect beyond the treatment of each unit in isolation. People will behave the same before you make a policy and after you make a policy. Most people who work on causal inference know none of this is true, of course. And any seasoned machine learning engineer knows this as well when maintaining systems to continually retrain their stable of prediction models.
In vaccine research, obesity, and social network interventions, interference between units (a violation of the usual no-interference assumption) is very much a concern. Still a generally underrated topic: https://arxiv.org/pdf/1403.1239
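A toy example of why it matters (all risks invented): if treatment also protects a unit's untreated contacts (the classic vaccine spillover), then the difference in means from an individually randomized trial no longer answers the treat-everyone-versus-treat-no-one policy question:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Each unit has one contact; treatment is independently randomized 50/50.
z_self = rng.integers(0, 2, size=n)
z_contact = rng.integers(0, 2, size=n)

def infection_risk(own, contact):
    # Hypothetical risks: being treated protects you (direct effect) and
    # having a treated contact also protects you (spillover).
    return 0.20 - 0.10 * own - 0.06 * contact

y = rng.binomial(1, infection_risk(z_self, z_contact))

naive = y[z_self == 1].mean() - y[z_self == 0].mean()
total = infection_risk(1, 1) - infection_risk(0, 0)
print(f"difference in means:              {naive:+.3f}")  # ~ -0.10, direct effect only
print(f"treat-all vs treat-none contrast: {total:+.3f}")   # -0.16, direct plus spillover
# Under interference, the individually randomized comparison answers a
# narrower question than the population-scale policy contrast.
```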