Yes, and conformal prediction is even worse because you get the probability of a conjunction: the guarantees are about the past samples AND the next sample.
With 95% probability, you draw a training sample whose conformal prediction set then contains the next sample; the probability is over both draws at once (sketched below).
That's way more confusing and open to misinterpretation than confidence intervals.
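For what it's worth, here is a minimal split-conformal simulation of my own (synthetic data, not from the thread) showing what that marginal guarantee does and doesn't say: coverage comes out near 95% only on average over redraws of the calibration set AND the test point together, not conditionally on the calibration set you actually got.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05

def one_trial(n_cal=200):
    # Toy "model": predict the known mean 0; absolute residuals are the scores.
    scores = np.abs(rng.normal(size=n_cal))
    # Conformal quantile with the finite-sample (n + 1) correction.
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))
    qhat = np.sort(scores)[k - 1]
    # Does the prediction set [-qhat, qhat] cover the next draw?
    return abs(rng.normal()) <= qhat

# Averaging over fresh (calibration set, next point) pairs gives ~0.95;
# any single calibration set's conditional coverage can be off from that.
coverage = np.mean([one_trial() for _ in range(20_000)])
print(f"coverage over (calibration, next point) pairs: {coverage:.3f}")
```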
I've got to say that I've rarely run into any kind of prediction-interval-like construct that was what I wanted, as the person interpreting it.
A tolerance interval would usually at least be an improvement imo, with a vaguely PAC-style interpretation: there's a 95% chance this interval really does contain 95% of the population. You still don't know whether you got one of the 5% bad intervals, of course, so it still has the verification problem. But at least you don't additionally have the strange conjunction.
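To make that PAC-style reading concrete, here is a sketch of mine (not from the comment) using the classic distribution-free construction: take the interval to be [min, max] of an i.i.d. sample from any continuous distribution; the population fraction it covers is then Beta(n-1, 2) distributed, so you can solve for the sample size giving "95% chance of covering 95%".

```python
from scipy.stats import beta

p, conf = 0.95, 0.95  # target: P(interval covers >= 95% of population) >= 95%

def confidence(n):
    # Coverage of [X_(1), X_(n)] is Beta(n - 1, 2) for any continuous F,
    # so this is P(coverage >= p) when the interval comes from n samples.
    return beta.sf(p, n - 1, 2)

# Smallest n achieving the (0.95, 0.95) guarantee:
n = 2
while confidence(n) < conf:
    n += 1
print(n, confidence(n))  # n = 93, just over 0.95 -- the classic answer
```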
fwiw, you can get those kinds of guarantees using conformal prediction, but the sample complexity is prohibitively high: https://www.argmin.net/p/cover-songs
Indeed! The only way I can wrap my head around its popularity is that the people using it also don't understand confidence intervals, or it's just some easy-to-run code that lets you claim some form of "AI Safety" via uncertainty quantification.
Given what you say, I can't see why you pooh-pooh severity and error-statistical reasoning. Knowing that a procedure performs well in general is scarcely irrelevant post-data. Severity gives an explicit post-data interpretation. Take CIs. The data warrant inferring that a parameter exceeds the lower CI bound because, were the parameter value less than the lower bound, then with high probability we would have observed a smaller test statistic than we did. It's analogous for the upper bound. This is what all statistical falsification is about, and really all warranted error-prone reasoning. Knowing the capabilities of our methods enables us to learn what is and is not well warranted in the case at hand.
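A small simulation of my own makes that lower-bound reasoning concrete (normal mean, known sigma, made-up numbers): place the true mean right at the lower bound, and a sample mean as large as the observed one would be rare, which is the post-data warrant for inferring the parameter exceeds it.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 1.0, 25
se = sigma / np.sqrt(n)
xbar_obs = 0.40                  # hypothetical observed sample mean
lower = xbar_obs - 1.645 * se    # one-sided 95% lower confidence bound

# Were mu at (or below) the lower bound, we would almost always have
# observed a smaller sample mean than we actually did -- hence the
# post-data warrant for inferring mu > lower.
sims = rng.normal(loc=lower, scale=se, size=200_000)
print(f"P(Xbar < observed | mu = lower bound) ~ {np.mean(sims < xbar_obs):.3f}")  # ~0.95
```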
"venerate the scourge of Ronald Fisher" 🤣💀
This meager guarantee is also all you get from conformal prediction.