11 Comments
Harsha V:

The problem with trying to get a well-calibrated forecast is that the edits to the forecast that make it better calibrated (even using a strategy like Foster's or Hart's) might destroy the forecast's utility. It's trivial to get a well-calibrated day-ahead rain forecast for Seattle -- always predict 40%. The trick is to get a calibrated forecast that is also "sharp".
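
To make that concrete, here's a quick Python sketch (synthetic numbers; the 40% base rate is just the made-up figure from above): the constant forecast checks out as calibrated, but its per-day entropy is near the maximum, so it says nothing about any particular day.

import numpy as np

rng = np.random.default_rng(0)
rain = rng.random(365) < 0.40      # synthetic outcomes: rain on ~40% of days
forecast = np.full(365, 0.40)      # the trivial constant forecast

# Calibration: among days where we said "40%", it rains about 40% of the time.
print("empirical rain frequency:", rain.mean())
print("forecast probability:", forecast[0])

# Sharpness: the forecast's entropy is close to the maximum log(2),
# i.e., it never commits to anything.
p = forecast[0]
entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
print("per-day forecast entropy:", entropy, "(max is", np.log(2), ")")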

Ben Recht:

what does "sharp" mean?

Harsha V:

I used "sharp" in the technical sense that statisticians use. As forecasters, we would like to produce the sharpest, or "lowest entropy," forecast distributions that are also calibrated with respect to point outcomes. Cf. https://sites.stat.washington.edu/raftery/Research/PDF/Gneiting2007jrssb.pdf

Ben Recht:

But that paper never defines sharpness! It uses the word without ever defining what it means.

Harsha V:

Ha. You're right. I had just skimmed the abstract.

I'm not a statistician, but I'm required to dabble in statistics for my work, so I've always taken sharpness to mean something like: the forecast distributions should have low entropy.

If the forecaster produces a sequence of forecast distributions Q_i for some sequence of random variables X_i which have god-given distributions P_i (I share your disdain for "true" distributions), then we want a good forecast to have small mean_i KL(Q_i || P_i).

(I deliberately used the reverse of the usual order of P_i and Q_i in the KL divergence.)

mean_i KL(Q_i || P_i) = mean_i(H(Q_i)) - mean_i(sum_j Q_ij log P_ij)

To have a small left hand side, we need H(Q_i) to be small (sharpness) and the sum_j Q_ij log P_ij to be large (calibration).

This is all hand-wavy of course but perhaps could be made more rigorous.

Harsha V:

Nvm. I think that argument has sign errors.
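
For the record, the identity with the signs straightened out is KL(Q_i || P_i) = -H(Q_i) - sum_j Q_ij log P_ij, so the entropy term enters with a minus sign. A quick numerical check with made-up distributions:

import numpy as np

Q = np.array([0.7, 0.2, 0.1])   # forecast distribution (made up)
P = np.array([0.5, 0.3, 0.2])   # "god-given" distribution (made up)

kl = np.sum(Q * np.log(Q / P))
neg_entropy = np.sum(Q * np.log(Q))    # this is -H(Q)
cross = np.sum(Q * np.log(P))          # this is sum_j Q_j log P_j

print(kl, neg_entropy - cross)         # both come out to about 0.085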

Grant Reinman:

I queried ChatGPT about forecast sharpness and it generated this response:

Sharpness in forecasting refers to the concentration or precision of a predictive distribution, without considering its accuracy relative to actual outcomes. A sharp forecast is one that assigns high probability to a narrow range of values. Importantly, sharpness is a property of the forecast alone, while calibration measures how well the forecast aligns with observed outcomes.

Quantifying Sharpness

Sharpness is typically measured through metrics that capture the spread or concentration of the predictive distribution. Common methods include:

1. Standard Deviation (SD):

• For a univariate normal distribution, sharper forecasts have smaller standard deviations.

• Example: A forecast with a predicted mean temperature of 70°F ± 2°F is sharper than one predicting 70°F ± 10°F.

2. Interquartile Range (IQR):

• Measures the range between the 25th and 75th percentiles. Narrower IQR values indicate sharper forecasts.

3. Entropy (for probabilistic forecasts):

• Entropy quantifies the uncertainty of a distribution. Lower entropy indicates a sharper forecast.

H(p) = -∫ p(x) log p(x) dx

4. Proper Scoring Rules with Emphasis on Sharpness:

• While strictly proper scoring rules (e.g., the Continuous Ranked Probability Score, or CRPS) reward both calibration and sharpness, comparing forecasts with similar calibration allows CRPS to isolate sharpness.

5. Variance-Based Measures:

• For multivariate distributions, the determinant of the covariance matrix can serve as a measure of volume, with smaller determinants corresponding to sharper forecasts.

Balancing Sharpness and Calibration

• Sharpness alone is not enough; a sharp forecast that is poorly calibrated (overconfident) is misleading. The Brier score and CRPS combine sharpness and calibration effectively.

• The decomposition of CRPS can isolate sharpness from calibration error, making it a powerful tool for evaluation.

References for Further Reading

1. Gneiting, T., & Katzfuss, M. (2014). “Probabilistic Forecasting.” Annual Review of Statistics and Its Application, 1, 125–151.

• This is a foundational paper on probabilistic forecasting that discusses sharpness and calibration in detail.

2. Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). “Probabilistic Forecasts, Calibration and Sharpness.” Journal of the Royal Statistical Society: Series B, 69(2), 243–268.

• This paper formally defines sharpness and discusses proper scoring rules like CRPS.

3. Wilks, D. S. (2019). Statistical Methods in the Atmospheric Sciences.

• A comprehensive text that covers forecast evaluation, including sharpness, in the context of meteorology.
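
To put numbers on a couple of these measures, here's a small Python sketch (synthetic forecast samples, values made up): smaller SD, IQR, and entropy all point to the sharper forecast.

import numpy as np

rng = np.random.default_rng(0)
sharp = rng.normal(loc=70, scale=2, size=10_000)     # "70°F ± 2°F" forecast
diffuse = rng.normal(loc=70, scale=10, size=10_000)  # "70°F ± 10°F" forecast

for name, sample in [("sharp", sharp), ("diffuse", diffuse)]:
    sd = sample.std()
    iqr = np.percentile(sample, 75) - np.percentile(sample, 25)
    # differential entropy of a fitted normal: 0.5 * log(2 * pi * e * sigma^2)
    ent = 0.5 * np.log(2 * np.pi * np.e * sample.var())
    print(name, "SD =", round(sd, 2), "IQR =", round(iqr, 2), "entropy =", round(ent, 2), "nats")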

John Quiggin:

In Bayesian decision theory, you are playing a game with (non-adversarial) Nature, and there is normally a unique optimal calibration. If there's an adversary, you need game theory and a Bayes-Nash equilibrium, which need not be unique.

Ben Recht:

But how does a coherent Bayesian know if nature is stochastic or adversarial?

John Quiggin:

For a Bayesian, the stochasticity of Nature is axiomatic.

John Quiggin:

For a Bayesian, the stochasticity/disinterestedness of Nature is axiomatic. If the universe has purposes (friendly or adversarial), it must be treated as another agent, and game theory is applicable.
