Selecting the wrong subset
Small lists of features bake in bad decisions
Commenter Cosma pointed me to an excellent rant about the pitfalls of clinical decision rules by ER doc Justin Morgenstern. “Clinical Decision Rules Are Ruining Medicine” is 10,000 words, but it’s worth a read if you are interested in decision making in healthcare or in “algorithmic decision systems” more broadly.
Morgenstern makes an important point that I haven’t raised here yet. Clinical decision rules are nothing more than a narrow summary of conventional wisdom. And they can be nothing more than that. But by summarizing narrow conventional wisdom into a simple calculator, the rules end up hindering flexibility and ignoring expertise.
How does a rule summarize conventional wisdom? A research team first assembles a set of features that doctors all agree are important for a decision. In the HEART score I discussed last time, the ingredients were a patient’s history, EKG, age, risk factors, and troponin level. The qSOFA score for predicting sepsis combines a patient’s mental status, respiratory rate, and blood pressure. Every risk score takes a small list of clinically relevant features like this. Why are these features included and not others? These selection decisions are always at the discretion of the research team.
Once a team chooses its list of explanatory features, it combines the features in some way into a quantitative score. HEART just scales each feature from 0 to 2 and then adds these up. qSOFA looks for abnormalities in two out of three of the features. More sophisticated methods use regression on historical patient data to find better weightings of these features. And clinicians are working on “AI” methods that might take an EKG and run it through a neural network as part of the scoring. But in all of these cases, the features are the only thing that matters. Once the features are chosen, improving the combination of factors yields only diminishing returns in sensitivity and specificity.
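To make concrete how little machinery sits on top of the chosen features, here is a minimal Python sketch of the two combination rules just described. It assumes each HEART ingredient has already been graded 0, 1, or 2 by a clinician and that each qSOFA criterion has already been judged abnormal or not. The function names, dictionary keys, and argument names are my own illustrative choices, not from any published calculator, and the sketch deliberately leaves out the clinical thresholds that produce those grades.

```python
from typing import Mapping

# The five HEART ingredients named above. The key names are hypothetical.
HEART_COMPONENTS = ("history", "ekg", "age", "risk_factors", "troponin")


def heart_score(points: Mapping[str, int]) -> int:
    """Add up five component grades, each assumed to be 0, 1, or 2."""
    total = 0
    for name in HEART_COMPONENTS:
        value = points[name]
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be graded 0, 1, or 2, got {value!r}")
        total += value
    return total


def qsofa_flag(
    altered_mental_status: bool,
    abnormal_respiratory_rate: bool,
    abnormal_blood_pressure: bool,
) -> bool:
    """Return True when at least two of the three criteria are abnormal."""
    abnormal_count = sum(
        [altered_mental_status, abnormal_respiratory_rate, abnormal_blood_pressure]
    )
    return abnormal_count >= 2


if __name__ == "__main__":
    patient = {"history": 1, "ekg": 0, "age": 2, "risk_factors": 1, "troponin": 0}
    print(heart_score(patient))
    print(qsofa_flag(True, False, True))
```

Everything interesting happens before these functions are called: deciding which five or three inputs exist at all, and how a messy patient encounter gets collapsed into each one.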
Why are we including only these features in a score? What about other information that a doctor knows is clinically relevant? What about nuance near the boundary of some of the rules? Is there something very different about people who are 45 and 46 years old? The score excludes all of this nuance. Scores only consider a subset of what goes into a clinical decision. Morgenstern notes:
“it is going to be very difficult for a subset of clinical judgement to beat total clinical judgement.”
Of course, he’s correct. Morgenstern takes this a step further, arguing the reliance on a small set of obvious features renders “many decision rules [as] just insulting.” He uses the qSOFA score as an example. If a patient is in the ICU with an altered mental state and low blood pressure, they obviously need close attention. I’m not a physician, but I could have told you that.
Morgenstern is highlighting a broader problem in the dichotomy between decisions for populations and decisions for individuals. Guidance for a population, like tax brackets, will necessarily be a crude compromise because policy aims to serve broader societal values, and there are not enough resources to make everyone happy. But in hospital care, we attend to the outcomes of individuals and have to appreciate how every patient encounter deviates from normal. Clinical decision rules mistakenly apply the large-scale, blurry statistics of populations to individuals, and Morgenstern argues this is ruining medicine.