Clinical versus Statistical Prediction (II)
Meehl's Philosophical Psychology, Lecture 10, part 2.
This post digs into Lecture 10 of Paul Meehl’s course “Philosophical Psychology.” Technically speaking, this lecture starts at minute 74 of Lecture 9. The video for Lecture 10 is here. Here’s the full table of contents of my blogging through the class.
One of the more common misreadings of Meehl is that he thought you could somehow do away with clinicians altogether. This was not his position, and as we’ll see more in Lecture 11, Meehl did not believe that all decisions could be made statistically. His aim was to determine the scope of statistical judgment and when it might be useful. There was a significant set of decisions where he deemed statistics superior. By being precise about this subset, he thought that he could both improve care and simplify the life of the clinician, allowing them room to automate part of their job. Today, let’s home in on the sorts of predictions Meehl thought were best decided by statistical methods.
Actions
Meehl first clarifies that the goal is predicting the outcomes of interventions. He is not interested in diagnostic tests. He is not asking about the construct validity of testing for diseases. (He has written other papers about that topic!) Here, he wants to understand how to predict the consequences of actions.
All of the example questions he asks are attempting to predict how an action will affect a particular person. If granted admission, will a person succeed in law school? If released from prison, will a person recidivate? If a depressed person isn’t hospitalized, will they commit suicide? If a person receives shock therapy, will their depression be relieved?
These sorts of questions are about the impact of single actions. They also have yes or no answers. Meehl focuses on questions with a small list of possible outcomes. For open-ended questions, Meehl thought clinical expertise was indispensable. It was only for problems with simple multiple-choice answers that he thought statistical decision-making could play a role.
Data
To make the decision, Meehl assumes the clinician has the same data as the statistical rule. He belabors the distinction between the kind of data and the mode of combining it. As long as the statistical formula and the clinician are presented with the same information, the data can be anything: interviews, life history data, a mental test, other biometrics.
Obviously, such data has to be transformed into a machine-readable format somehow. Here’s another place the clinician may be indispensable. A clinician may be required to observe a patient’s behavior or facial expressions and write down appropriate diagnostics. Today, this could perhaps also be done with statistical machine learning. In his 1989 lectures, he notes that character recognition was still barely functional. He doesn’t rule out the possibility of more sophisticated pattern recognition methods being used if computers improve. (Spoiler alert: they did).
Regardless, he just wants the computer and the clinician to be using the same data. The controversy is about the mode of combination not the data types.
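To make the data/combination distinction concrete, here is a minimal sketch of a mechanical mode of combination. The variables (a standardized test score, an interview rating, a prior-history score) and the equal weights are my own illustrative choices, not Meehl’s; the point is only that the inputs are the data, while the fixed sum-and-cutoff is the mode of combining them.

```python
def mechanical_combination(test_z, interview_z, history_z, cutoff=0.0):
    """Combine standardized scores with equal weights and apply a cutoff.

    The three inputs are the data; the fixed sum-and-threshold is the
    mode of combination. A clinician given the same three numbers could
    instead combine them however their judgment dictates.
    """
    composite = test_z + interview_z + history_z
    return composite > cutoff

# A hypothetical applicant: strong test score, weak interview, modest history.
print(mechanical_combination(1.2, -0.3, 0.5))  # True: composite 1.4 exceeds the cutoff
```

A clinician handed the same three scores might weigh the interview heavily and reject; the formula always combines them the same way. That invariance is exactly what the controversy is about.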
Mechanical and actuarial rules
Meehl defines two forms of algorithmic decision rules. First, there are mechanical rules, which we would now simply call algorithms; the terms are synonymous. A mechanical rule is a well-defined, step-by-step process for translating data into a decision, one that can be implemented on a computer.
Actuarial rules are a special kind of mechanical rule. They are algorithms that make decisions based on rates of past occurrences. These are the statistical prediction methods. A decade ago we called these prediction methods machine learning. Today we call them AI.
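Here is a minimal sketch of an actuarial rule in this sense: it decides purely from rates of past occurrences, predicting for each profile whichever outcome was most frequent among past cases with that profile. The law-school example and the data are hypothetical and mine, not Meehl’s.

```python
from collections import Counter, defaultdict

def fit_actuarial_rule(cases):
    """Build an actuarial rule from past cases.

    Each case is a (profile, outcome) pair. The fitted rule predicts,
    for a new instance of a profile, the most frequent past outcome
    observed for that same profile -- a decision based purely on rates
    of past occurrences.
    """
    tables = defaultdict(Counter)
    for profile, outcome in cases:
        tables[profile][outcome] += 1

    def rule(profile):
        return tables[profile].most_common(1)[0][0]

    return rule

# Hypothetical past cases: (test-score band, GPA band) -> succeeded in law school?
past = [
    (("high", "high"), True), (("high", "high"), True),
    (("high", "low"), True), (("high", "low"), False),
    (("low", "high"), False), (("low", "high"), True), (("low", "high"), False),
    (("low", "low"), False), (("low", "low"), False),
]
predict = fit_actuarial_rule(past)
print(predict(("high", "high")))  # True: most past applicants with this profile succeeded
```

Modern machine learning models are, in this taxonomy, just more elaborate actuarial rules: they too turn tabulated past outcomes into predictions by a fixed procedure.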
Actually, now that I think about it, we’re in the goofy phase of the hype cycle where all mechanical rules are now annoyingly called AI. So I’m going to use Meehl’s terms of mechanical and actuarial to keep things clear. Let me still emphasize that Meehl’s clinical-statistical question asks when AI is better than people at making decisions. There’s a large academic community that still argues the answer is never. As we’ll see, Meehl does not agree.
Clinical judgment
Meehl’s definition of clinical judgment is vaguer. He says it is any “informal” judgment made by a human specialist: whatever process occurs in a person’s head. Clinical rules are those made by clinicians based on intuitive assessments of the data. These are decisions that clinicians can’t cleanly explain and hence can’t be formalized as algorithms.
The clinical-statistical question
With all of this setup, we can now pose Meehl’s central question:
Given a decision problem with a small set of possible outcomes and an appropriate, fixed collection of data, do actuarial rules or clinical judgment provide more accurate judgments about the future?
For this narrow but broadly applicable question, Meehl came down solidly on one side: Statistical prediction would never be worse than clinical prediction.
If you had asked me a year ago, I’d have vehemently disagreed. But I’ve come around. Meehl provides compelling empirical evidence in his 1954 book. And 70 years of studies have backed him up. You’d be hard-pressed to find a result in social science as robust as statistical decisions outperforming clinical judgment. After grappling with the evidence and the counterarguments, I now totally agree with Meehl. Tomorrow, let me try to convince you, too. I will present the empirical evidence, Meehl’s philosophical arguments, and what I consider to be a simple but deceptively subtle explanation. It’s through the subtlety that we might find some resolution.