19 Comments
Feb 11 · Liked by Ben Recht

Predict_addict’s head must be exploding reading this.

Apr 1 · Liked by Ben Recht

Really enjoyed this post! Argmin is now my must-read!

Very interesting post, Ben.

I do not share your concerns with respect to the centrality of the IID assumption. I'd say that having data that are at least conditionally independent and identically distributed is a prerequisite for any kind of empirical modelling.

Empirics, and the inductive reasoning you rely on when you build a quantitative model to predict something, require some regularities in the data-generating process that you can exploit to produce these forecasts.

In the easiest cases you have "pure" IID-ness; if you need to add "structure" to your modelling, that's where the conditional part comes in.

I get your point about "proving the wrong theorems", but I do not think I share it. In fact, I think that property is one of the most interesting selling points of CP. At the same time, I agree with you that some additional work on the size guarantees one gets when using CP is in order. My feeling (though I really should study a bit more... :) ) is that exploring the intersection between scoring rules à la Gneiting (either univariate or multivariate) and nonconformity measures (NCMs) could be an interesting piece of research.

The part I do share, and find quite fascinating about your post (and, to be fair, did not grasp after reading Vovk's paper), is that there are apparently some parts of CP that are, for lack of a better word, "asymptotic". That hinders (and should be raised as a caveat for) the use of CP in my own use cases, where CP is used to produce prediction regions with coverage guarantees from simple "statistical" models, but on high-dimensional/complex outputs.

I believe this latter point is very interesting from a theoretical perspective, but having seen where part of CP research is going (namely, providing UQ for very complex models in a big-data setting; see e.g. all the work by the Stanford guys), it is probably a bit orthogonal to what current research is doing.

In any case, happy to discuss and share views... or alternative approaches, since it is not so easy to find non-parametric methodologies for distributional prediction out there.

This post is far from being objective.

"First and foremost, all of the beautiful rigor relies on the assumption that past data is exchangeable with the future. We are drawing logical conclusions under a fanciful, unverifiable hypothesis."

There are ways of verifying exchangeability. If someone uses a tool wrongly, that's not the tool's fault.

"Second, we should be worried that you get the same probabilistic guarantee regardless of the scoring function. Conformal prediction almost invites you to use garbage prediction functions."

You get the same guarantee, but the prediction regions will be useless. Why should we be worried? One could say the same for all machine learning techniques: they provide you with predictions even if you feed them garbage data.

"Conformal prediction does not tell you the probability the future is in the prediction set is 95%. It tells you that the probability that the algorithm returns a prediction set containing the future is 95%. These Monte Carlo guarantees are very different!"

Could you clarify what the difference between these two events is? They sound exactly the same to me. Is the stochasticity due to different sources in the two cases? If so, what are they?
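
To make my question concrete, here is a small simulation sketch of how I currently read the second statement (assuming split conformal with absolute-residual scores; the data and "model" below are made up for illustration):

```python
# A minimal simulation sketch (my reading of the guarantee, not the post's code).
# Split conformal with absolute-residual scores on synthetic 1-D data.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05          # target miscoverage level
n_cal = 200           # calibration points
n_trials = 2000       # repetitions of the whole procedure

def run_once():
    # "Model": predict with the true mean function; noise is Gaussian.
    x_cal = rng.uniform(-1, 1, n_cal)
    y_cal = np.sin(3 * x_cal) + rng.normal(0, 0.3, n_cal)
    scores = np.abs(y_cal - np.sin(3 * x_cal))           # nonconformity scores
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))           # conformal quantile index
    q = np.sort(scores)[k - 1]
    # One fresh test point; did the interval [prediction - q, prediction + q] cover it?
    x_new = rng.uniform(-1, 1)
    y_new = np.sin(3 * x_new) + rng.normal(0, 0.3)
    return abs(y_new - np.sin(3 * x_new)) <= q

coverage = np.mean([run_once() for _ in range(n_trials)])
print(f"coverage over repeated calibration sets and test points: {coverage:.3f}")
# The printed frequency is averaged over fresh draws of BOTH the calibration
# set and the test point: a statement about the procedure, not about any one
# fixed prediction set.
```

The frequency this prints is an average over regenerating the calibration data as well as the test point, which is how I read the "Monte Carlo" wording. Is that the distinction you mean?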

"Conformal prediction almost invites you to use garbage prediction functions. You’ll get the same coverage guarantees for a carefully fine-tuned transformer model as you’ll get for a literally random function."

This is true, but your prediction intervals will vary in size depending on the performance of the underlying model.
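
A minimal sketch of that point (my own toy illustration, not from the post), assuming split conformal with absolute-residual scores: a model close to the truth and a "garbage" predictor that ignores the input both land near the nominal coverage, but the interval widths differ enormously.

```python
# Split conformal with absolute-residual scores: same coverage target,
# very different interval widths for a decent vs. a garbage predictor.
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1
n_cal, n_test = 500, 5000

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.2, n)
    return x, y

def conformal_quantile(residuals):
    # Finite-sample-corrected empirical quantile of the calibration scores.
    k = int(np.ceil((len(residuals) + 1) * (1 - alpha)))
    return np.sort(residuals)[k - 1]

x_cal, y_cal = make_data(n_cal)
x_test, y_test = make_data(n_test)

predictors = {
    "decent model ": lambda x: np.sin(3 * x),      # close to the truth
    "garbage model": lambda x: np.zeros_like(x),   # ignores the input entirely
}

for name, f in predictors.items():
    q = conformal_quantile(np.abs(y_cal - f(x_cal)))
    covered = np.mean(np.abs(y_test - f(x_test)) <= q)
    print(f"{name}: empirical coverage ~ {covered:.3f}, interval width = {2 * q:.2f}")
# Both predictors hit roughly 90% coverage; only the width reveals which
# underlying model is any good.
```

So the coverage guarantee alone cannot distinguish the two; the width of the prediction intervals is what reflects the quality of the underlying model.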
