One of my favorite ways to troll on Twitter was to ask rhetorical questions designed to needle certain audiences. Last week I asked, “What is the evidence that superforecasting isn't bullshit?”
But one of the things I loved about Twitter was the hive mind. It’s what I’ll miss most once the Chief Executive Edgelord autonomously drives the place into the side of a truck. Twitter had tons of smart people who had looked at every possible topic out there. And I got a bunch of great information about why superforecasting is indeed essentially bullshit.
In case you don’t know, Superforecasting is a term coined by Phil Tetlock to describe a “scientific” way of forming predictions that are better than those of experts. In the replies to my thread, Vaden Masrani pointed me to his informative podcast on Tetlock’s book. And Twitter person @BarneyFlames pointed me to a great blog post on the topic. Let me quickly summarize the main points of this post, which you should read, because it’s excellent!
Tetlock’s original research showed that “pundits” are wrong most of the time. He was dead on here.
But then Tetlock took this idea too far and thought he could design systems to predict better than “experts.”
Tetlock’s claims are usually based on the Brier score. To compute a Brier score, predictors must return a probability for every yes-or-no question (!!! this is where everything goes to hell !!!). The Brier score is then the mean squared error between the predicted probabilities and the true 0-1 outcomes.
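To make that concrete, here’s a minimal sketch in Python (the `brier_score` function and the toy numbers are mine, just for illustration):

```python
# Brier score: mean squared error between forecast probabilities and the
# realized 0/1 outcomes. Lower is better; 0 is perfect, and always
# answering 50% gets you 0.25.

def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    assert len(probabilities) == len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)

# Three yes/no questions, forecast at 90%, 20%, and 50%; outcomes: yes, no, yes.
print(brier_score([0.9, 0.2, 0.5], [1, 0, 1]))  # ≈ 0.1
```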
But the Brier score is deeply flawed and depends heavily on the corpus of prediction questions (this should sound familiar to my machine learning evaluation friends). If most of the events in the corpus are “Will the sun rise tomorrow?” and one of the questions is “Will AGI kill us on Thursday?” then getting the AGI question wrong will barely move your score.
Indeed, just predicting that things are likely to stay the same will make you a superforecaster.
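Here’s a toy illustration of that point, with numbers I made up: stuff the corpus with 99 easy “will the sun rise?” questions and one rare event. A forecaster who always bets on the status quo completely blows the rare-event question and still posts a near-perfect score.

```python
def brier_score(probabilities, outcomes):
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)

# 99 "will the sun rise tomorrow?" questions (it did), plus one rare event
# that actually happened.
outcomes = [1] * 99 + [1]

# The status-quo forecaster: 99% "yes" on the easy questions, 1% on the rare event.
status_quo = [0.99] * 99 + [0.01]

# ≈ 0.0099 -- looks "super," despite missing the one question that mattered.
print(round(brier_score(status_quo, outcomes), 4))
```

The one question anyone actually cared about contributes almost nothing to the average.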
But the events that matter for prediction are the rare events. These are the ones that require significant planning, resources, and action. But Brier scores can’t tell you who is going to be good at predicting rare events.
Moreover, even Tetlock admits that prediction accuracy decays quickly with the time horizon. Predicting the uncertain, unknowable future is impossible whether or not you are willing to assign subjective probabilities to your predictions.
We should be mad at Tetlock for popularizing the idea of answering yes-no questions with probabilities. Subjective probabilities mean nothing. They are a convenient way for sophists to never be wrong. You don’t even have to hedge your bets. You just say “I’m 95% confident X will not happen” and when X happens you say “Well, I said there was a chance.” There are other ways to state and formalize uncertainty that don’t allow for these silly debate tactics.
Probably the most damning thing I’ve read is how badly superforecasters missed the covid pandemic. On February 20, 2020, Tetlock’s superforecasters predicted there was a 3% chance of more than 200K covid cases by March 20, 2020. Oops.
But they said 3%. So they said there was a chance, right? So clearly they weren’t wrong.
This gets to the crux of it for me. People only care about predictions when the outcome has a big impact. People are most impressed when you predict something that no one else saw coming. Being good on average is neither impressive nor valuable if all you are selling is your prediction skills. “Edge cases” are what doom our prediction-based systems, not average cases. We’re incapable of engineering prediction-based autonomous systems that have to account for rare events. Rumsfeld’s unknown unknowns always come to get us in the end.
Nice. Would be interested in something similar regarding Thinking, Fast and Slow
Interesting post. I have a few thoughts. Useful predictions are often conditional (e.g., MPC arguably combines unpredicted disturbances as conditions for optimizing prediction-based future action). Many complex systems are also often "reflexive", meaning that the prediction itself changes the future "probabilities" (sort of like closed-loop vs. open-loop, but more philosophical). Some predictions can lose their value if broadcast (zero-day exploits are arguably in this category), and some hidden predictions may (perhaps) become more certain if other predictions (which are not believed) are broadcast instead (disinformation?).

It is not clear to me whether your post limits itself to a particular type of prediction in some "stable", not-very-complex environment. But then one example is about the pandemic, which is not a stable, simple system. There were early predictions that respiratory pandemics cannot realistically be stopped in the long run. Time scales matter. The heat death of the universe is a certain, faraway prediction, many would agree.

Also, how would you formally state your prediction that Twitter is being driven into the ground? When will you evaluate it? What are your objective metrics? Nothing goes as planned, but error correction based on feedback might.