20 Comments
Missan Belat:

Nice. Would be interested in something similar regarding Thinking, Fast and Slow

Ben Recht:

Indeed! I have added this to my todo list.

Erik:

Interesting post. I have a few thoughts. Useful predictions are often conditional (e.g., MPC arguably combines unpredicted disturbances as conditions for optimizing prediction-based future action). Many complex systems are also "reflexive," meaning that the prediction itself changes the future "probabilities" (sort of like closed-loop vs. open-loop, but more philosophical). Some predictions lose their value if broadcast (zero-day exploits arguably fall into this category), and some hidden predictions may (perhaps) become more certain if other predictions (which are not believed) are broadcast instead (disinformation?).

It is not clear to me whether your post limits itself to a particular type of prediction in some "stable," not-very-complex environment. But then one of your examples is the pandemic, which is not a stable, simple system. There were early predictions that respiratory pandemics cannot realistically be stopped in the long run. Time scales matter: the heat death of the universe is a certain, faraway prediction many would agree on.

Also, how would you formally state your prediction that Twitter is being driven into the ground? When will you evaluate it? What are your objective metrics? Nothing goes as planned, but error correction based on feedback might.

Ben Recht:

As you say here, context matters. I talk about Nancy Cartwright in the next blog, and she has a great example: Using F=ma and a = g, we can predict how a bowling ball falls when dropped from a tower. But we can't use these laws to predict where a dollar bill dropped from the same tower will land.

Might lumping MPC together with pandemic prediction only lead us to confusion rather than clarity?

Erik:

Thanks for writing your blog, and for reading my comment! I'd say I would not want MPC anywhere near pandemic stuff (I think it might be tempting when you are young and apple-cheeked and believe SIR models are more than toy systems, suitable for central planning using that feedback-control thing you just learned about in school). I just tried to illustrate various types of predictions and contexts, thinking out loud. I tend to get confused with the word "prediction" flying around by itself so much these days.

But what about your Twitter prediction? Can it be formalized?

Ben Recht:

I certainly didn't mean for my Twitter prediction to be formal. The outage on July 1 and the clear lying about it by Musk were just the final straw in months of bad management for me.

For whatever it's worth, I'd never bet against Musk. 1. I'm super risk averse in betting (Mr. Index Fund here), and 2. that guy has a cult behind him and is able to become the richest person in the world no matter how many laws he breaks or lies he tells.

My subjective probability on everything Elon is in a perpetual state of disgusted disbelief.

Erik:

Got it. Thanks for sharing. Elon surely is a colorful figure who makes a difference in all sorts of ways. Routine landing of rockets on barges (with MPC I'm sure) is pretty awesome though.

Akhil Bagaria:

"There are other ways to state and formalize uncertainty that don’t allow for these silly debate tactics" - which uncertainty quantification methods are you thinking of here?

Ben Recht:

This is a quote of mine that I'm going to need several blogs to unpack. I promise to do so!

But roughly speaking I mean things like deterministic models and also blanket statements like "I don't know." More to come.

Vaden Masrani:

Great post! Gotta add "longtermism" to your list of things to debunk. Superforecasting over billion-year timescales? Yes please.

(Also podcast link is broken I believe)

Ben Recht:

Updated the link. It's a great podcast.

I worry debunking longtermism is akin to debunking a millenarian religion...

Vaden Masrani:

Hehehe, that's what I enjoyed most about it... Although I do think there is a sort of "professional obligation" for statisticians etc. to weigh in on these topics, because all this stuff is getting off the ground through sophists using probability theory to mislead the public (who generally don't have the background to tell when Bayes theorem is being used inappropriately).

Ben Recht:

"Bayes theorem is being used inappropriately" is the working title for a future post.

Dylan Gorman:

The only way I can really wrap my mind around probabilities of one-off events is to frame it as "how much money could you make in a prediction market?"

Philipp Renz:

Fully agree with that: "They are a convenient way for sophists to never be wrong. You don’t even have to hedge your bets. You just say “I’m 95% confident X will not happen” and when X happens you say “Well, I said there was a chance.”"

But not with this: "We should be mad at Tetlock for popularizing the idea of answering yes-no questions with probabilities. Subjective probabilities mean nothing." - I think it's very reasonable to answer yes-no questions with probabilities. I think you would deem it perfectly reasonable for me to say that a coin toss comes up heads with 50% probability. But that's not worth a lot without a convincing evaluation of the predictions. I think one should rather be mad at them for being pretentious, while not really showing any impressive performance.

I wouldn't say that the Brier score is deeply flawed. Could you elaborate on why you think so? I feel that it's a meaningful measure of prediction accuracy, as it rewards you for giving predictions close to the correct answer. But of course that doesn't help if one evaluates it on easy or irrelevant questions.
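For concreteness, the Brier score under discussion is just the mean squared error between stated probabilities and 0/1 outcomes. A minimal sketch, with made-up numbers, of the corpus-dependence point: one badly missed rare event barely dents the score when averaged over many easy questions.

```python
def brier(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# 99 easy questions ("will the sun rise tomorrow?") forecast at 0.99, all true,
# plus one rare event forecast at 0.05 that actually happened. Numbers invented
# purely for illustration.
easy_heavy = brier([0.99] * 99 + [0.05], [1] * 99 + [1])
rare_alone = brier([0.05], [1])

print(round(easy_heavy, 4))  # → 0.0091: looks like a superb forecaster
print(round(rare_alone, 4))  # → 0.9025: the badly missed call, seen in isolation
```

The averaged score is dominated by the easy questions, so it certifies "accuracy" while hiding the complete miss on the one question that mattered.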

Peter:

>To compute a Brier score, predictors must return a probability for every yes or no question (!!! this is where everything goes to hell !!!).

Are you against probabilistic forecasting in general?

>But the Brier Score is deeply flawed and depends heavily on the corpus of prediction questions. […] If most of the events in the corpus are “Will the sun rise tomorrow?” and one of the questions is “Will AGI kill us on Thursday?” Then getting the AGI question wrong is not likely to affect your score.

Sure. But the claim about superforecasters is not "there are people who get an average Brier score of 0.1, let's completely trust their AI forecasts", it's about the top 2% (IIRC) performing way better than chance on the next set of (order of) 100 questions, defying the regression to the mean effect that you'd expect if probabilistic predictions were just random guesses. This suggests that there is some sort of latent variable—let's call it "forecasting skill". Surely, all else equal, you'd trust forecasts by someone who appears to have more of this latent variable more than someone with less?
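The null hypothesis here can be sketched in a few lines. This toy simulation (my construction, not Tetlock's actual methodology) assumes pure random guessing on fair yes/no questions: the round-one "top 2%" look impressive, then regress to the chance-level Brier score of about 1/3 on the next set of questions.

```python
import random

def brier(forecasts, outcomes):
    # mean squared error between stated probabilities and 0/1 outcomes
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

rng = random.Random(0)
n_forecasters, n_questions = 1000, 100

def round_scores():
    # every forecaster guesses a uniform random probability: no skill at all
    outcomes = [rng.random() < 0.5 for _ in range(n_questions)]
    return [
        brier([rng.random() for _ in range(n_questions)], outcomes)
        for _ in range(n_forecasters)
    ]

scores1 = round_scores()
scores2 = round_scores()

# select the "top 2%" by round-one Brier score (lower is better)
top = sorted(range(n_forecasters), key=lambda i: scores1[i])[: n_forecasters // 50]
round1_top = sum(scores1[i] for i in top) / len(top)
round2_top = sum(scores2[i] for i in top) / len(top)

print(round1_top)  # noticeably below 1/3: pure selection effect
print(round2_top)  # back to ~1/3, the chance level for random guessing
```

Superforecasters beating this regression-to-the-mean pattern on fresh questions is exactly the evidence for a "forecasting skill" latent variable.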

Also, from a very qualitative point of view, it becomes pretty obvious that there is something like forecasting skill once you compare the written reasoning of good forecasters to that of the not-so-good ones.

(If the whole point is that superforecasters are still human, occasionally badly wrong on important questions, and part of a marketing stunt (I mean, it's literally a term coined by a project with a for-profit spin-off), then I wholeheartedly agree. But in that case your post is pretty misleading, and, going by the comments, I'm not the only one who interpreted it very differently.)

Will:

I am open to superforecasters being bullshit, but this blog post is nowhere near sufficient to convince me of that. Tetlock does not just use Brier scores; he also looks at how far from 50% you're willing to venture. And as for your dismissal that predicting things will stay the same would be sufficient to be a superforecaster: my understanding is that a big part of the job of forecasting is deciding which outcome is more in line with "things staying the same" (i.e., what the most suitable reference class is).

I assume you're a smart thinker about statistics, but this post reads as a lazy dismissal of careful work on unclear grounds.

static:

This argument structure is pretty bad. You admit method x is better than current method y, but since method x isn't perfect, it shouldn't be used.

On your COVID point: on Feb 20, 86% predicted that the count on March 20, 2020 would be between 100,000 and 200,000. The actual reported number didn't cross 200,000 until March 19 (209k), so the most likely scenario was 1-2 days off, a month out. That doesn't seem that bad, nor an example of a rare event or unknown unknown.

Ben Recht:

1. "You admit method x is better than current method y, but since method x isn't perfect, it shouldn't be used." I don't know what you mean.

2. Your defense of the covid prediction is a great example of what I mean by superforecasters never being wrong.

3. Why do you comment anonymously?

static:

1. You said the superforecasting approach is superior to other prediction techniques and offer no better option, yet your conclusion is don't use it, leaving people in the position of not using prediction techniques.

2. I am not defending it; I am pointing out that you misleadingly labeled it an example of a rare event or unknown unknown, and decontextualized it so as not to show that the majority prediction range was extremely close to the actual range. Your description of the situation intentionally created a false impression. This has absolutely nothing to do with superforecasters never being wrong, and everything to do with your hyperbolic rhetorical style.

3. Because I don't like people harassing me personally or making ad hominem attacks or making assumptions about my positions based on my identity. It works better this way.
