You know exactly what I’d say: Probability 1 and Probability 2 are both called “probability” because they obey Kolmogorov’s axioms. But the axioms won’t tell you how to bridge these concepts; that’s a pragmatic, problem-dependent affair.
Degrees of belief obey Kolmogorov's axioms if you assume they satisfy countable additivity (de Finetti wanted only finite additivity), are sharply defined (imprecise probability is an alternative), and are real-valued (hyperreal credences are an alternative).
I agree, but these assumptions are doing a lot of work!
There’s a way to reconcile de Finetti coherence with Kolmogorov’s axioms, as shown by Vivek Borkar and Sanjoy Mitter (https://www.mit.edu/~mitter/publications/102_ondefinetti_elsev.pdf). As for hyperreals, I’m just a humble electrical engineer from a backwoods Midwest university.
Thank you for pointing out the Borkar and Mitter paper! And I'm sorry for my initial kneejerk pushback.
No worries, that’s what comment sections are for!
Or/also if you equate propositions with measurable sets in the sample space.
Which is exactly what I’m advocating here: https://realizable.substack.com/p/probabilities-coherence-correspondence
I just always find it weird that Probability 1 only obeys the Kolmogorov axioms if you agree to Dutch Bookery or Cox Axioms. It's always a jump to force them to match P2.
More important is what is your hottest take on Megadeth?
Bayes sells … but who’s buying?
Not all instances of Probability 1 require Dutch-booking — e.g., I don’t need it to set up a probability space with a Brownian motion defined on it. It’s a Probability 1 construct, but no betting is involved.
I know this is slippery, but isn't Brownian motion Probability 2? It's object-linguistic. The descriptive model is Probability 2, the predictive one Probability 1? What do you think?
The way I see it, the probability model of the ideal, mathematical Brownian motion is metalinguistic. I map Probability 2 to Slepian’s Facet A and Probability 1 to his Facet B: https://realizable.substack.com/p/the-two-facets-of-david-slepian.
I'm not disagreeing, but in this conception doesn't that mean that the probability of a coin flip also starts off as Probability 1 and is mapped to Probability 2?
The proportions of heads and tails in a sequence of flips are Probability 2.
Rust in p's.
I don't think calibration is trivial, and I think it's a good frequentist perspective with which to think about probability. It's true that in Ricky's example, marginal calibration is entirely uninformative. But if the sequence of predictions is calibrated not just marginally, but also on the subsequence of odd indices and the subsequence of even indices, then the predictions must be perfect. As you start asking for calibration on more subsequences, or conditional on more features of the data, you capture more structure.

Already in 1985, Dawid thought about the consequences of asking for calibration on different subsequences, and about using this as a foundation for probability: https://projecteuclid.org/journals/annals-of-statistics/volume-13/issue-4/Calibration-Based-Empirical-Probability/10.1214/aos/1176349736.full Among other things, two computable sequences of forecasts that are calibrated on every computable subsequence must agree almost everywhere (everywhere except on a finite set of predictions), so as long as the "true data generating process" turns out to be computable, computably calibrated forecasts must recover it. And another of Ricky's results (with Smorodinsky and Sandroni), which can also be derived from Blackwell approachability, is that there is a computable algorithm whose forecasts are guaranteed to be calibrated on any countably infinite collection of subsequences: https://pubsonline.informs.org/doi/abs/10.1287/moor.28.1.141.14264?casa_token=EPfn-cDedJUAAAAA:UV1W7dx73P2LsJKZTfD8h3sLPiRrpZj5hrNjVXrpG_BVe_ng7ETUE6RnYgBm9K8fMV32G37lMg So Dawid's vision is sort of realizable (at least in the infinite data/compute limit :-)
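Just to make Ricky's alternating example concrete (this is my own toy illustration, not taken from either paper): a constant forecast of 1/2 on the sequence 0, 1, 0, 1, ... is perfectly calibrated marginally, but asking for calibration on the even and odd subsequences separately exposes it, and only the perfect forecaster survives. A minimal sketch:

```python
import numpy as np

def calibration_error(preds, outcomes):
    """Largest gap, over distinct forecast values p, between p and the
    empirical frequency of 1s on the rounds where p was forecast."""
    preds, outcomes = np.asarray(preds, float), np.asarray(outcomes, float)
    return max(abs(outcomes[preds == p].mean() - p) for p in np.unique(preds))

T = 1000
outcomes = np.arange(T) % 2              # the alternating sequence 0, 1, 0, 1, ...
constant = np.full(T, 0.5)               # always forecast 1/2
perfect = outcomes.astype(float)         # forecast the truth

even, odd = np.arange(T) % 2 == 0, np.arange(T) % 2 == 1
for name, preds in [("constant 1/2", constant), ("perfect", perfect)]:
    print(name,
          "marginal:", calibration_error(preds, outcomes),
          "evens:", calibration_error(preds[even], outcomes[even]),
          "odds:", calibration_error(preds[odd], outcomes[odd]))
# constant 1/2 -> 0.0 marginally, but 0.5 on both subsequences;
# perfect      -> 0.0 everywhere.
```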
Of course, talking about computability doesn't give you much insight into practical prediction methods, but the recent literature on multicalibration revisits these ideas with a computational/finite-data statistical lens. In general, computationally efficient variants of multicalibration can't promise to recover "truth" in the worst case, but you can ask what you've learnt if you produce predictions that are calibrated conditional on different kinds of structure. You can, e.g., view calibration as a boosting procedure and then ask where you need to be calibrated to recover the structure in the data (https://arxiv.org/abs/2301.13767), or ask what guarantees you get in the worst case given that you are calibrated with respect to a certain family of functions/subsequences (https://arxiv.org/abs/2109.05389).
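In the same spirit, the multicalibration-style check is just calibration within each subgroup defined by the features you care about. A rough sketch, with groups and rates made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                          # hypothetical binary feature
outcome = rng.binomial(1, np.where(group == 1, 0.8, 0.2))

base_rate_forecast = np.full(n, outcome.mean())        # calibrated only marginally
group_forecast = np.where(group == 1, 0.8, 0.2)        # calibrated within each group

def worst_gap(preds, y):
    """Max gap between a forecast value and the empirical rate it receives."""
    return max(abs(y[preds == v].mean() - v) for v in np.unique(preds))

for name, preds in [("base-rate", base_rate_forecast), ("group-aware", group_forecast)]:
    gaps = [worst_gap(preds[group == g], outcome[group == g]) for g in (0, 1)]
    print(name, "per-group calibration gaps:", np.round(gaps, 3))
# The base-rate forecaster is marginally calibrated but far off within each
# group; the group-aware forecaster is (approximately) calibrated on both.
```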
I agree that frequency seems to be the basic notion behind probability for technical purposes. However, I wonder if belief can become important when it comes to individuals actually using probability in their own lives, for example as a way to update P(intervention works on me) as we try something new.
Of course, one option that reduces to frequency in this case is something like your N-of-1 paper. But running such an algorithm on *all* interventions (e.g., new sports regimen, a diet tweak, etc.) seems a bit overkill? I think the 'belief' interpretation can offer a more relaxed formalism that can be more intuitive to use across a wider range of applications than just new medications.
For instance, the Bayesian view suggests initializing $P_0(W)$, where $W$ stands for "the intervention works on me," to the historical frequency of success of the intervention in a population (or relevant subpopulation), and then updating according to Bayes' rule. Writing $E_t$ for "an effect is observed on day $t$," the update is $P_{t+1}(W) = \frac{P(E_t \mid W)\,P_t(W)}{P(E_t \mid W)\,P_t(W) + P(E_t \mid \neg W)\,(1 - P_t(W))}$. In my opinion the appeal is that the Bayesian framework only needs some specification of $P(E_t \mid W)$ and $P(E_t \mid \neg W)$, with really minimal requirements other than that they are non-negative and sum to less than 1. And it seems like people could at least intuit some rough lower/upper bounds on these (plus, if you only request upper bounds, there isn't even a need for the summing-to-less-than-1 constraint).
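For what it's worth, here's how that update might look in code; all the numbers (the 40% prior from a hypothetical population and the two likelihoods) are made up, just to show the mechanics:

```python
def bayes_update(p_works, p_effect_if_works, p_effect_if_not, effect_observed):
    """One day's update of P(intervention works on me).

    p_effect_if_works / p_effect_if_not are the assumed chances of seeing an
    effect on a given day under each hypothesis (hypothetical numbers here).
    """
    if effect_observed:
        num = p_effect_if_works * p_works
        den = num + p_effect_if_not * (1 - p_works)
    else:  # no effect today: use the complementary likelihoods
        num = (1 - p_effect_if_works) * p_works
        den = num + (1 - p_effect_if_not) * (1 - p_works)
    return num / den

p = 0.40  # start from a (hypothetical) population success rate
for day, effect in enumerate([True, True, False, True], start=1):
    p = bayes_update(p, p_effect_if_works=0.7, p_effect_if_not=0.2, effect_observed=effect)
    print(f"day {day}: P(works on me) = {p:.3f}")
```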
Also, one slightly off-topic question: in the lecture on probability in our class, I once jokingly said that maybe we'd want to have an incoherent betting system because that would allow for win-wins. If I remember correctly, you said there was some paper discussing that? If so, do you remember what it was? (That does sound very unlikely to exist, so it's very likely I simply misremembered or misunderstood.)
The presented retrospective evaluation of the Bayesian monk prediction might be fine for showing that calibration is not enough, but it is otherwise not very helpful, neither for understanding the problem of assigning probabilities to single non-repeatable events nor for understanding the subjective Bayesian approach to that problem. I'm not a Bayesian, but I am a huge fan of both David Spiegelhalter and David Mermin. See here (https://www.physicsforums.com/threads/qm-eigenstates-and-the-notion-of-motion.1050354/post-6862464) for what I wrote in a related discussion:
"It is not at all clear what a 30% probability for rain tomorrow means. ... How can you ... decide whose predictions are better or worse? ...
"The Art of Statistics" by David Spiegelhalter is a nice book, which also gives an overview about how this type of problem gets approached in practice. One nice way is to compute their Brier score. I find this nice, because it is about the simplest score imaginable, and it has the nice property that predictions of 0% or 100% probability are not special in any way. This reminds me on David Mermin's QBist contribution, that prediction with 0% or 100% probability in quantum mechanics are still personal judgments, and don't give any more objectivity than any other prediction."