Statistics gives us a framework for working with uncertainty about reality. The Gaussian distribution originated with Gauss wanting to quantify measurement errors in observations of celestial objects. In that context, we already had a way to check against reality---celestial motion can be verified. Unfortunately, many other phenomena to which a probability distribution is assigned cannot be repeated or verified. The truth might be that statisticians cannot solve that problem. Introducing differentials, asymptotic analysis, or measure theory does not solve this fundamental problem; it can even make the problem worse by introducing more assumptions. We cannot pretend to know what we can't know by invoking sigma-algebras...

My point is just that we know the standard deviation will decrease like 1/sqrt(n), so we can use this to guide how we allocate samples going forward. If the first and third decks' empirical red proportions are separated by multiple standard deviations at some point, then we could stop sampling cards from deck 1 and focus just on decks 2 and 3. So this is some justification for studying the standard deviation or confidence intervals of estimates.
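A quick sketch of that heuristic, treating each deck draw as an independent Bernoulli sample with the observed proportions (an assumption added here purely for illustration; the deck names are hypothetical):

```python
import math

def prop_std(p_hat: float, n: int) -> float:
    """Approximate standard error of an empirical proportion, sqrt(p(1-p)/n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Observed after 100 draws from each deck: 10, 20, 50 reds.
n = 100
decks = {"deck 1": 0.10, "deck 2": 0.20, "deck 3": 0.50}
for name, p in decks.items():
    print(f"{name}: p_hat = {p:.2f}, std ≈ {prop_std(p, n):.3f}")

# Separation rule: how many combined standard deviations apart
# are deck 1 and deck 3 already?
gap = 0.50 - 0.10
combined = math.sqrt(prop_std(0.10, n) ** 2 + prop_std(0.50, n) ** 2)
print(f"gap / combined std ≈ {gap / combined:.1f}")  # roughly 6.9
```

With the gap already several standard deviations wide, deck 1 can plausibly be dropped and the remaining budget split between decks 2 and 3.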

Yes, the notion that we *can* randomly sample does a lot of work there, but this is a good argument for intervals.

This kind of “thought experiment” is used to support many statistical arguments, and it oversimplifies the problem. This sort of deck argument is applied to estimate animal populations, which is fine. But don’t forget that most animals are not decks of cards—they move, are born, and die. They are also not celestial objects whose trajectories can be predicted by some Newtonian law. An outbreak of disease could destroy the power of the interval. Variance and mean give us little information about a dynamical system, even if we assume we know the pdf’s formula.

I suggest another kind of thought experiment as an educational supplement. One day, God grants part of his power to a statistician. The power is not as awesome as that of a god who knows the real law—the explicit form of all functions (including random variables) whose measurable spaces contain the entire multiverse.

With such power, the statistician knows the pdfs of all quantities in the universe. One day a student comes to the statistician with three fairies. The fairies can switch between two colors: red and non-red. Over 10 minutes, the statistician notices:

fairy 1 has been red for 1 minute.

fairy 2 has been red for 2 minutes.

fairy 3 has been red for 5 minutes.

The statistician uses his super pdf power to explain to the student which fairy most likes to display red, along with the frequentist interpretation of the confidence interval.

“Sir, I am confused. Why don’t you just ask them?” the student replies.

For your question in the appendix see Laplace's rule of succession:

https://en.m.wikipedia.org/wiki/Rule_of_succession
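For reference, the rule of succession estimates the next-trial success probability as (k+1)/(n+2) after k successes in n trials; a minimal sketch (the example numbers are illustrative, not from the post):

```python
from fractions import Fraction

def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate of the success probability given the data so far."""
    return Fraction(successes + 1, trials + 2)

# e.g. 5 red cards seen in 20 draws:
print(rule_of_succession(5, 20))  # 3/11
```

Note that with no data at all it returns 1/2, the uniform-prior starting point.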

I agree with the post. Things get more interesting when you have multiple hypotheses. Suppose you have three very large decks of cards. You want to determine which deck has the highest proportion of reds. You draw 100 cards from each deck and see 10, 20, 50 reds, respectively. Now you are allowed to draw more cards from each deck, but you have a budget of 300 total. How should you allocate your budget?

I still need more information before I can answer your question, no?

My first guesstimate, about what could be a Trick Question regarding allocating the budget of 300 new card draws, would be that the ULTIMATE sizes of the three new samples from the three "very large card universes" should be inversely proportional to the 10/20/50 red counts that we already have in hand. How about 120 extra for the 10 pile to give us 130 total, 110 for the 20 pile to give 130 total, and 70 for the third pile to give 120 total?

I'm no card-carrying Bayesian, but I must say that the *brilliant* 2021 text of Aubrey Clayton, "Bernoulli's Fallacy -- Statistical Illogic and the Crisis of Modern Science", should be on the book shelves of all persons reading this interesting post. I would argue that it's one-half of maybe THE most important reading assignment every serious thinker should tackle (the other half being Deb Mayo's 1996 text, "Error and the Growth of Experimental Knowledge").

Maybe I should reread, but I found Clayton's fixation on the Frequentist vs Bayesian arguments caused him to not see that everyone is wrong about probability. That, to me, is a harder but more necessary case to articulate. Mathematical probability is rigorous, but there isn't a "right way" to apply it to reality in all cases. I'll keep working on fleshing out this view here.

But I haven't read that particular book by Mayo. Added it to my list.

Deborah Mayo is a definite genius, and she has written two books subsequent to the one mentioned. Each is exquisitely written, in my opinion. She has a busy website that is a good "meeting place" for folks pumped up about the niche (?) field of Philosophy of Science. I agree that Professor Clayton seemed a wee bit too exercised in places about F vs. B wrangling that is now, what, at least 100 years old? However, I am not actually credentialed to make a truly authoritative evaluation (PhD in organic chemistry, then MD and Board Certified in General Surgery). As my sister told me, "Oh, you are *just* a general surgeon".

Your sister is mean! I'd also argue that these probabilistic arguments are too important to defer to statisticians. Most of them certainly don't know more metaphysics than a fifth grader.

And I agree with your assessment of Mayo. Her blog is indeed full of great discussions.

Why would confidence intervals make no sense in the case of 21 cards? You can give a confidence interval for the total number of red cards; for example, this basic one works. Let k be the number of red cards in the first 20. If k < 10, the "interval" is {k}; if k = 10, it is {10, 11}; if k > 10, it is {k + 1}. The probability that the true number of red cards lies inside this interval is minimized when there are 10 or 11 red cards in total, giving a 10/21 probability of failure. In other cases it does much better: for example, when there are 5 red cards in total, the probability of predicting the correct number is 16/21, and when there are 6 red cards in total, it is 15/21.
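These coverage numbers can be checked directly: if R of the 21 cards are red and the one unseen card is uniformly random, then k equals R-1 with probability R/21 and R otherwise. A brute-force sketch of that check (my own code, not part of the original comment):

```python
from fractions import Fraction

def interval(k: int) -> set:
    """The commenter's rule: the predicted set for the total red count."""
    if k < 10:
        return {k}
    if k == 10:
        return {10, 11}
    return {k + 1}

def coverage(R: int, total: int = 21) -> Fraction:
    """P(true R lands in the interval), with the hidden card uniform."""
    p_hidden_red = Fraction(R, total)
    cov = Fraction(0)
    if R >= 1:  # hidden card is red: we saw k = R - 1
        cov += p_hidden_red * (1 if R in interval(R - 1) else 0)
    # hidden card is non-red: we saw k = R
    cov += (1 - p_hidden_red) * (1 if R in interval(R) else 0)
    return cov

for R in (5, 6, 10, 11):
    print(R, coverage(R))
# The minimum over all R is 11/21 (failure 10/21), attained at R = 10 and 11.
```

The worst-case coverage of 11/21 slightly exceeds 1/2, matching the remark below that a single-number predictor can do no better than a fair coin's 50%.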

One interesting remark: if you want your confidence interval to consist of only a single number, i.e., you want to make a point prediction, and you also want a coverage probability of at least 50% for every possible choice of the parameter, then no matter what comes up you need to flip a fair coin and report one of the two consistent outcomes. There is nothing better than this.

Another fun post! "How many red cards do you think were in total in the deck?"

And then, "Beyond these facts, _I don’t know_ how the deck was assembled." I think we can question this "IDK". You say "we can all more or less agree this is a reasonable model". So, it's a thought experiment where we try to check what our reasonable intuitions are. My intuition is that there are two possibilities. Either the deck is approximately 50%-50% (e.g., we found a big pile of cards at the cottage and my cousin "expertly shuffled it"), or it's not (e.g., my statistician friend is having fun with me; last year my cousin removed half of all cards of one color for children's arts and crafts). I guess my reasoning is Bayesian. Personally, I find the first possibility quite plausible. I'm tempted to at least consider the possibility that the pile started out approximately 500-500. If that's true, I know that 5-15 is unlikely. Without calculations I wouldn't know how unlikely, but I'd know it's somewhat unlikely (apparently ~4% for <6 of either color). In real life, I think it's very reasonable to put a big prior on 500-500. So, either around 500 or around 250. I think of this "thought experiment" linguistically and socially before thinking about the math. In other words, are the cards drawn from a hypergeometric distribution, or from a normal pile of 30 old decks at the cottage? Now that I think of it, having 1000 cards at the cottage is unlikely; maybe at the community center where they play bridge.
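The ~4% figure can be sanity-checked under the 500-500 assumption: drawing 20 cards from a 1000-card pile with 500 red, the chance of seeing fewer than 6 of either color is a two-tailed hypergeometric probability. A quick stdlib-only sketch:

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k reds in n draws from a pile of N cards, K of them red)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Pile of 1000 cards, 500 red; draw 20.
# Fewer than 6 of either color means <= 5 reds or >= 15 reds.
both_tails = sum(hypergeom_pmf(k, 1000, 500, 20)
                 for k in list(range(0, 6)) + list(range(15, 21)))
print(f"{both_tails:.3f}")  # roughly 0.04
```

The result is close to the binomial(20, 1/2) answer, since drawing 20 from 1000 barely depletes the pile.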