26 Comments
Feb 8 · Liked by Ben Recht

Hi Ben,

Such intervals seem quite useful to me for formulating (robust) optimisation problems. One specific example is scheduling problems, where you have a set of events and a partial order relation over them. The intervals would provide lower and upper bounds on the time elapsed between events (e.g., "Start cleaning house", "Finish cleaning house").

I would love to have a close look at those prediction bands in the context of lookahead and bandit algorithms for approximate dynamic programming (MCTS and beyond).
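
The scheduling formulation above can be sketched as a simple temporal network, under the (strong) assumption that each prediction band is treated as a hard bound `lo <= t_j - t_i <= hi`. The function name `tighten_bounds`, the event indices, and the numbers are hypothetical, chosen only for illustration. Running all-pairs shortest paths over the constraint graph tightens every bound and detects inconsistent schedules.

```python
import math

def tighten_bounds(n_events, constraints):
    """Tighten interval constraints between events.

    constraints: list of (i, j, lo, hi), meaning lo <= t_j - t_i <= hi.
    Returns a matrix d where d[i][j] is the tightest implied upper bound
    on t_j - t_i, or None if the constraints cannot all hold.
    """
    d = [[0 if i == j else math.inf for j in range(n_events)]
         for i in range(n_events)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)   # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)  # t_i - t_j <= -lo
    # Floyd-Warshall: propagate bounds through intermediate events.
    for k in range(n_events):
        for i in range(n_events):
            for j in range(n_events):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative cycle shows up as a negative diagonal entry.
    if any(d[i][i] < 0 for i in range(n_events)):
        return None
    return d

# Hypothetical events: 0 = start cleaning, 1 = finish cleaning, 2 = guests arrive.
constraints = [
    (0, 1, 60, 90),   # cleaning takes 60 to 90 minutes
    (1, 2, 15, 120),  # guests arrive 15 to 120 minutes after cleaning ends
    (0, 2, 0, 100),   # guests arrive within 100 minutes of the start
]
d = tighten_bounds(3, constraints)
upper_cleaning = d[0][1]       # tightened upper bound on cleaning time
lower_arrival = -d[2][0]       # tightened lower bound on arrival after start
```

With these numbers, the arrival deadline forces cleaning to finish within 85 minutes instead of 90. Treating the bands as hard bounds ignores their miscoverage probability, so this is only a starting point for a robust formulation.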

Feb 8 · Liked by Ben Recht

I want "honest" unbiased prediction bands around the following.

1) Predictions for personalized treatment decisions, i.e., the expected benefit of treatment A over treatment B given person-specific covariates. This comes up a lot in cancer and psychiatry. Most "ML predictions" are population quantities that might be totally inadequate for use in clinical decision making.

2) Change in polygenic risk scores. In this case, if someone claims to be able to rank embryos for screening, I want to see the prediction bands for the expected risk reduction from picking embryo A vs. embryo B, or some relaxed version of it (top 10% quantile over the next 10% quantile).

3) For drug development, you need good prediction bands around the expected future performance of a large number of candidates from high-throughput screens. It is quite challenging to evaluate the accuracy of the intervals for each individual candidate here, as most candidates have rarely been measured in expensive high-fidelity experiments.

I really wish reviewers for Nature journals actually understood why this is practically necessary for all the applications they are prematurely excited about.

Feb 9 · Liked by Ben Recht

The comment about things falling apart when the "DGP" changes: what does that have to do with a prediction interval? Any old ML prediction, interval-ed or not, suffers from this breakdown, no? Also, this breakdown itself could be what the prediction (interval) is constructed for (say, monitoring for change detection, trying to invalidate itself). On a different note, quantile regression is really fascinating, isn't it?

Feb 8 · Liked by Ben Recht

> prediction bands to estimate the efficacy of a drug. But I’m confused here because that’s what randomized trials are for

I'm guessing what people want here is prediction bands for conditional average treatment effects for the drug or some such thing.

> In order for the math to be valid, both the future outcomes and the future features must occur in the same way they occurred in the past

Yes, I think that for them to be useful, one needs to design studies/experiments to make this actually true. It is very hard to convince people to collect enough data to estimate uncertainty accurately, though.

Feb 8 · Liked by Ben Recht

How about weather forecasting as an application where prediction intervals for nonparametric regression with marginal guarantees can be useful? The residuals are plausibly exchangeable, and it's not a setting where we are using the forecasts to influence the outcome.
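
To make the marginal guarantee concrete, here is a minimal split-conformal sketch. It assumes exactly what the comment posits: held-out forecast residuals that are exchangeable with future ones. The function name `split_conformal_interval` and the temperature numbers are hypothetical, and the point forecast can come from any model.

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_new, alpha=0.1):
    """Symmetric bands with marginal coverage >= 1 - alpha, assuming the
    calibration residuals and the future residual are exchangeable."""
    scores = np.sort(np.abs(np.asarray(residuals_cal, dtype=float)))
    n = scores.size
    # Finite-sample rank of the calibration score used as the half-width.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points, only the trivial band is valid.
    q = scores[k - 1] if k <= n else np.inf
    y_pred_new = np.asarray(y_pred_new, dtype=float)
    return y_pred_new - q, y_pred_new + q

# Hypothetical held-out errors (observed minus forecast), in degrees Celsius.
residuals = [-1.0, 2.0, -3.0, 4.0, 5.0, -6.0, 7.0, -8.0, 9.0]
lower, upper = split_conformal_interval(residuals, [10.0, 12.0], alpha=0.5)
```

The guarantee is marginal: it averages over days and does not promise correct coverage on, say, stormy days specifically.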

Feb 19 · Liked by Ben Recht

Google Maps gives you estimated time intervals rather than point estimates if you plan a car trip in the future (e.g. same time but tomorrow). This probably reflects uncertainty in the traffic.

As a user I find this pretty useful if you have to make a decision between several modalities (e.g. public transportation vs bike vs taxi).

Feb 9 · Liked by Ben Recht

Any active decision-making process where you get to choose where to gather more information? E.g., Bayesian optimisation/Kriging to determine where to drill for gold based on a predictive model of the reef.


You might want to check out the recent work on conformal decision theory (https://conformal-decision.github.io/) if you haven't seen it. I'm not sure how well it meshes with the case you're making here, but it seemed roughly on theme: to make contextually relevant risk-relevant decisions, you're better off calibrating the decisions directly and skipping the coverage sets.

Feb 8 · Liked by Ben Recht

The probability of an event and the confidence interval of a prediction are actually two sides of the same coin. Take a simple linear model with normal errors: the confidence level at which a prediction falls at or below some threshold corresponds to the probability returned by the equivalent probit regression.
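
One way to spell out this correspondence, as a sketch that assumes the coefficients $\beta$ and noise scale $\sigma$ are known (so estimation error is ignored) and writes $t$ for the threshold and $z = \mathbf{1}\{y \le t\}$ for the thresholded outcome (notation introduced here, not in the original post):

```latex
\begin{align*}
  y &= x^\top \beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2) \\
  \Pr(y \le t \mid x) &= \Phi\!\left(\frac{t - x^\top \beta}{\sigma}\right) \\
  % one-sided prediction bound at level p
  u_p(x) &= x^\top \beta + \sigma\, \Phi^{-1}(p), \qquad
  u_p(x) \le t \iff \Pr(y \le t \mid x) \ge p \\
  % probit regression on z = 1{y <= t}
  \Pr(z = 1 \mid x) &= \Phi\!\left(\gamma_0 + x^\top \gamma\right), \qquad
  \gamma_0 = \frac{t}{\sigma}, \quad \gamma = -\frac{\beta}{\sigma}
\end{align*}
```

So the largest level $p$ at which the one-sided prediction bound still sits at or below $t$ is exactly the probit probability of the event $y \le t$.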
