4 Comments
Nelson Elhage

I'm confused. You observe the (trivial) result that you can't possibly get a low Brier score in the adversarial context, but then you go on to observe that you can frame Brier scores as a calibration problem, and that you can always minimize calibration errors with this One Neat Trick. But obviously you can't do that for Brier scores, so there must be more assumptions buried in the result (I freely admit I haven't looked at the paper).

So, I am indeed left feeling a bit like you're trying to pull a fast one...

Zoë Ruha Bell

Could you say a bit more about how this compares and contrasts with frameworks like outcome indistinguishability by Dwork, Kim, et al.? https://arxiv.org/abs/2011.13426

Ben Recht

Defensive forecasting's generalized calibration subsumes multicalibration and outcome indistinguishability. See Lemma 4.1: https://arxiv.org/abs/2506.11848

Zoë Ruha Bell

Sure, I guess I was asking more about the conceptual difference: if generalized calibration is more general, what is that more general notion conceptually, and what is the special case captured by outcome indistinguishability?