Relevant to the theme: https://first10em.com/clinical-decision-rules/
Of course I lack the medical knowledge to say whether or not it's accurate, fair, etc.
I totally agree that these simple scorecards carry a lot of richness for data-driven decision-making in the wild. (I'm partial to nomograms myself, https://en.wikipedia.org/wiki/Nomogram, which are visual calculators.) There's real value in understanding how these seemingly pragmatic issues are actually first-order important in real-world decision-making.
Just as one example, recent work from Liu-Shahn-Robins-Rotnitzky specifically models the statistical restrictions imposed by assuming that screening has no direct effect on treatment: https://projects.iq.harvard.edu/files/applied.stats.workshop-gov3009/files/efficient_estimation_of_optimal_regimes_under_a_no_direct_effect_assumption.pdf
As a different on-the-ground example, the Sepsis Watch project and deployment is a huge implementation effort involving organizational change and a (pre-post) clinical trial. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391165/
It's still not clear how to appropriately evaluate these kinds of operational interventions. I'd love to learn more about regulatory discussions and other examples on this front, but I think interest is converging from multiple points of view (stat/ML, biostats, HCI/AI), given the importance of these real-world systems.
Ben: This is an incredibly interesting post, thanks. Even if it seems scary, I suppose the medical establishment needs some transparent (?) mechanism for making decisions. In the HEART setting, would it help if that functional form were clearly available, or if they released their code and model, in lieu of a neural network? 🧐 Looking forward to more such posts.
I edited the post to directly link to a HEART score calculator:
https://www.mdcalc.com/calc/1752/heart-score-major-cardiac-events
It really is just adding up five numbers from a qualitative questionnaire. The interesting one is the EKG read: there you need a clinician skilled in reading EKGs. But otherwise, everything else is open code and model!
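For concreteness, here is what "open code and model" amounts to: a sketch of the HEART sum in Python, with component cutoffs taken from the published score (see the mdcalc link above). The function and parameter names here are mine, not part of any official implementation.

```python
def heart_score(history, ekg, age_years, n_risk_factors, troponin_ratio,
                known_atherosclerotic_disease=False):
    """HEART score: five components, each 0-2 points, summed to 0-10."""
    # History: 0 = slightly, 1 = moderately, 2 = highly suspicious
    assert history in (0, 1, 2)

    # EKG: 0 = normal, 1 = non-specific repolarization disturbance,
    # 2 = significant ST deviation. This is the component that needs a
    # skilled clinician's read, so it is an input here, not computed.
    assert ekg in (0, 1, 2)

    # Age: 0 if under 45, 1 if 45-64, 2 if 65 or older
    age = 0 if age_years < 45 else (1 if age_years < 65 else 2)

    # Risk factors: 0 = none, 1 = one or two, 2 = three or more
    # (or any known history of atherosclerotic disease)
    if known_atherosclerotic_disease or n_risk_factors >= 3:
        risk = 2
    elif n_risk_factors >= 1:
        risk = 1
    else:
        risk = 0

    # Troponin relative to the assay's normal limit:
    # 0 = at/below normal, 1 = 1-3x normal, 2 = above 3x normal
    trop = 0 if troponin_ratio <= 1 else (1 if troponin_ratio <= 3 else 2)

    return history + ekg + age + risk + trop

# Totals of 0-3 are conventionally treated as low risk,
# 4-6 as moderate, and 7-10 as high.
print(heart_score(history=1, ekg=0, age_years=58,
                  n_risk_factors=2, troponin_ratio=0.5))  # → 3
```

That really is the whole "model": ordinal bins and a sum, with all the clinical judgment pushed into how the inputs get scored.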
I’m looking forward to your reflections here. I just started reading Nancy Cartwright through an indirect recommendation that ultimately traces back to you, so I’m naturally wondering how she would think about all of these small, hard-to-evaluate heuristics. But I’m still in the middle of the introduction to The Dappled World, so I don’t yet know what she means when she calls for “what can be accomplished if improvements are guided by what is possible not what is longed for.”
I’m also left wondering what I should compare our current patchwork of medical ML rules to: an ideal doctor, or the median one?
Interesting post. I thought the most "AI-infused" real-world contexts at present were ad serving and credit scoring, not healthcare (yet)? Both of these are opaque. You seem concerned about evaluation. In those other contexts the evaluation is never made public, I think, but it is still (or because of that?) huge business. Healthcare will probably end up in the same bucket (really, it is already there). Evaluation seems surprisingly irrelevant to rules and recommendations, in healthcare and elsewhere.