Statistics gives us a framework for working with uncertainty in reality. The Gaussian distribution originated with Gauss, who wanted to measure the errors in observations of celestial objects. In that context, we already have a way to check against reality: celestial movement can be verified. Unfortunately, many other phenomena for which a probability distribution is assumed cannot be repeated or verified. The truth might be that statisticians cannot solve that problem. Introducing differential calculus, asymptotic analysis, or measure theory cannot solve this fundamental problem; it can even make the problem worse by introducing more assumptions. We cannot use sigma-algebras to pretend that we know what we can't know...

Mar 28

My point is just that we know the standard deviation will decrease like 1/sqrt(n), so we can use this to help guide how we allocate samples going forward. If, at some point, the empirical red proportions of the first and third decks are separated by multiple standard deviations, then we could stop sampling cards from deck 1 and focus on decks 2 and 3. So this is some justification for studying the standard deviations or confidence intervals of estimates.
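A minimal Python sketch of this elimination idea, assuming a normal approximation for each deck's proportion and a hypothetical cutoff of 3 standard errors for "multiple stds" (the names `proportion_and_se`, `separation` and `THRESHOLD` are illustrative, not from the post). It uses the counts from the three-deck example in this thread (10, 20 and 50 reds out of 100 draws each):

```python
import math

def proportion_and_se(reds, n):
    """Empirical red proportion and its estimated standard error sqrt(p(1-p)/n)."""
    p = reds / n
    return p, math.sqrt(p * (1 - p) / n)

def separation(deck_a, deck_b):
    """How many standard errors apart two decks' empirical proportions are
    (normal approximation, independent samples)."""
    pa, sa = proportion_and_se(*deck_a)
    pb, sb = proportion_and_se(*deck_b)
    return abs(pa - pb) / math.sqrt(sa ** 2 + sb ** 2)

# (reds, draws) per deck, taken from the three-deck example in this thread
decks = {1: (10, 100), 2: (20, 100), 3: (50, 100)}
THRESHOLD = 3.0  # hypothetical reading of "separated by multiple stds"

# Keep sampling only the decks that are not clearly worse than the current leader.
leader = max(decks, key=lambda d: decks[d][0] / decks[d][1])
still_in_play = [d for d in decks
                 if d == leader or separation(decks[d], decks[leader]) < THRESHOLD]

# The standard error shrinks like 1/sqrt(n): quadrupling n halves it.
for n in (100, 400, 1600):
    print(n, proportion_and_se(n // 2, n)[1])
```

With these counts, decks 1 and 2 are already roughly 6.9 and 4.7 standard errors below deck 3, so only deck 3 survives the cutoff; the final loop just shows the standard error halving each time n is quadrupled.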

Apr 25

For your question in the appendix, see Laplace's rule of succession:

https://en.m.wikipedia.org/wiki/Rule_of_succession
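If the appendix question is the probability that the one remaining (21st) card is red (an assumption; the appendix is not quoted here), the rule gives (k + 1)/(n + 2) after k reds in n draws under a uniform prior. A small Python sketch comparing that formula with an exact enumeration over a finite deck, assuming a uniform prior on the number of red cards R and drawing without replacement; `finite_deck_next_red` is a hypothetical helper, and k = 5 is only an illustrative count:

```python
from fractions import Fraction
from math import comb

def rule_of_succession(k, n):
    """Laplace: P(next draw is red) after k reds in n draws, uniform prior."""
    return Fraction(k + 1, n + 2)

def finite_deck_next_red(k, n, deck_size):
    """Exact P(next card is red) for a deck of `deck_size` cards, with a uniform
    prior on the number of red cards R in 0..deck_size, after drawing n cards
    without replacement and seeing k reds."""
    num = den = Fraction(0)
    for r in range(deck_size + 1):
        # Hypergeometric likelihood up to a constant factor (the uniform prior cancels).
        like = comb(r, k) * comb(deck_size - r, n - k)
        if like == 0:
            continue
        num += like * Fraction(r - k, deck_size - n)  # reds left / cards left
        den += like
    return num / den

print(rule_of_succession(5, 20), finite_deck_next_red(5, 20, 21))
```

For a 21-card deck the posterior puts weight only on R = k and R = k + 1, and the enumeration reproduces (k + 1)/22 exactly; the same agreement holds for larger decks.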

Mar 28

I agree with the post. Things get more interesting when you have multiple hypotheses. Suppose you have three very large decks of cards, and you want to determine which deck has the highest proportion of reds. You draw 100 cards from each deck and see 10, 20, and 50 reds, respectively. Now you are allowed to draw more cards from each deck, but you have a total budget of 300 draws. How should you allocate your budget?
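One possible starting point, sketched in Python under stated assumptions: many Bayesian allocation rules (Thompson-style sampling, for instance) begin from the posterior probability that each deck is the best one. This sketch assumes independent uniform Beta(1, 1) priors and treats each very large deck as sampling with replacement; `prob_best` is a hypothetical helper, and it does not by itself decide how to spend the 300 draws.

```python
import random

def prob_best(counts, n_samples=20_000, seed=0):
    """Monte Carlo estimate of P(deck i has the highest red proportion).

    Assumes independent Beta(1, 1) priors and sampling with replacement,
    so the posterior for deck i is Beta(1 + reds_i, 1 + blacks_i)."""
    rng = random.Random(seed)
    wins = [0] * len(counts)
    for _ in range(n_samples):
        # One joint posterior draw; credit the deck whose sampled proportion is largest.
        draws = [rng.betavariate(1 + reds, 1 + n - reds) for reds, n in counts]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]

counts = [(10, 100), (20, 100), (50, 100)]  # (reds, draws) from the comment
print(prob_best(counts))
```

With 10, 20 and 50 reds, nearly all the posterior mass says deck 3 is best, so the allocation question becomes more delicate when the observed counts are closer together.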

Mar 28

I'm no card-carrying Bayesian, but I must say that the *brilliant* 2021 book by Aubrey Clayton, "Bernoulli's Fallacy: Statistical Illogic and the Crisis of Modern Science", should be on the bookshelves of everyone reading this interesting post. I would argue that it's one half of maybe THE most important reading assignment every serious thinker should tackle (the other half being Deb Mayo's 1996 book, "Error and the Growth of Experimental Knowledge").

Why do confidence intervals make no sense in the case of 21 cards? You can give a confidence interval for the total number of red cards; for example, this basic one works. Let k be the number of red cards among the first 20. If k < 10, the "interval" is {k}; if k = 10, it is {10, 11}; if k > 10, it is {k + 1}. The probability that the number of red cards lies inside this interval is minimized when there are 10 or 11 red cards in total, in which case the probability of failure is 10/21. In other cases it is much better: for example, when there are 5 red cards in total, the probability of predicting the correct number is 16/21, and when there are 6, it is 15/21.
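The coverage figures above can be checked exactly. A short Python sketch, assuming (as in the comment) that the hidden 21st card is equally likely to be any card in the deck; `interval` and `coverage` are illustrative names:

```python
from fractions import Fraction

DECK = 21  # total cards; we see the first 20, one stays hidden

def interval(k):
    """The confidence set for the total number of red cards R,
    given k reds among the first 20 cards."""
    if k < 10:
        return {k}
    if k == 10:
        return {10, 11}
    return {k + 1}

def coverage(r):
    """Exact P(R in interval(k)) when the deck has r red cards."""
    p_hidden_red = Fraction(r, DECK)      # then we saw k = r - 1 reds
    p_hidden_black = 1 - p_hidden_red     # then we saw k = r reds
    total = Fraction(0)
    if r >= 1 and r in interval(r - 1):
        total += p_hidden_red
    if r <= DECK - 1 and r in interval(r):
        total += p_hidden_black
    return total

worst = min(coverage(r) for r in range(DECK + 1))
print(worst, [r for r in range(DECK + 1) if coverage(r) == worst])
print(coverage(5), coverage(6))
```

The worst case is coverage 11/21 (failure 10/21) at 10 or 11 red cards, matching the numbers in the comment.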

One interesting remark: if you want your confidence interval to consist of a single number, i.e., you want to make a prediction, and you also want a coverage probability of at least 50% for every possible choice of the parameter, then no matter what comes up you need to flip a fair coin and name one of the two outcomes consistent with the data. No procedure does better than this.
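A short derivation of this claim, assuming the prediction is restricted to the two values consistent with the data, k or k + 1 (the notation q_k is introduced here for the sketch):

```latex
q_k = \Pr(\text{answer } k+1 \mid k \text{ reds in the first 20}), \quad \text{otherwise answer } k.

\operatorname{cov}(R) = \frac{R}{21}\, q_{R-1} + \frac{21-R}{21}\,(1 - q_R), \qquad R = 0, \dots, 21

\text{(the first term vanishes at } R = 0 \text{, the second at } R = 21\text{).}

R = 0:\quad 1 - q_0 \ge \tfrac12 \;\Rightarrow\; q_0 \le \tfrac12.

q_{R-1} \le \tfrac12 \;\Rightarrow\; (21-R)(1 - q_R) \ge \tfrac{21}{2} - R\, q_{R-1} \ge \tfrac{21-R}{2}
\;\Rightarrow\; q_R \le \tfrac12 \qquad (1 \le R \le 20).

R = 21:\quad q_{20} \ge \tfrac12; \qquad
q_R \ge \tfrac12 \;\Rightarrow\; R\, q_{R-1} \ge \tfrac{21}{2} - (21-R)(1 - q_R) \ge \tfrac{R}{2}
\;\Rightarrow\; q_{R-1} \ge \tfrac12.

\text{Hence } q_k = \tfrac12 \text{ for all } k, \text{ and then } \operatorname{cov}(R) = \tfrac{R}{42} + \tfrac{21-R}{42} = \tfrac12 \text{ for every } R.
```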

Another fun post! "How many red cards do you think were in total in the deck?"

And then: "Beyond these facts, _I don’t know_ how the deck was assembled." I think we can question this "IDK". You say "we can all more or less agree this is a reasonable model", so it's a thought experiment in which we check what our reasonable intuitions are. My intuition is that there are two possibilities. Either the deck is approximately 50%-50% (e.g., we found a big pile of cards at the cottage and my cousin "expertly shuffled it"), or it's not (e.g., my statistician friend is having fun with me, or last year my cousin removed half of all the cards of one color for children's arts and crafts). I guess my reasoning is Bayesian. Personally, I find the first possibility quite plausible, so I'm tempted to at least consider that the pile was initially about 500-500. If that's true, I know that 5-15 is unlikely. Without calculations I wouldn't know how unlikely, but I'd know it's somewhat unlikely (apparently ~4% for fewer than 6 of either color). In real life, I think it's very reasonable to put a big prior on 500-500. So: either around 500 or around 250. I think of this "thought experiment" linguistically and socially before thinking about the maths. In other words: are the cards drawn from a hypergeometric distribution, or from an ordinary pile of 30 old decks at the cottage? Now that I think of it, having 1000 cards at the cottage is unlikely; maybe at the community center where they play bridge.
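The ~4% figure can be checked directly. A small Python sketch, assuming a 1000-card pile with exactly 500 red cards and 20 cards drawn, computed both without replacement (hypergeometric) and with replacement (binomial) as an approximation; the helper names are illustrative:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, total, reds, draws):
    """P(exactly k reds) when drawing `draws` cards without replacement."""
    return Fraction(comb(reds, k) * comb(total - reds, draws - k), comb(total, draws))

def binom_pmf(k, n, p):
    """P(exactly k reds) in n independent draws with red probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(one color appears fewer than 6 times in 20 draws); by symmetry, twice P(reds <= 5).
tail_hyper = 2 * sum(hypergeom_pmf(k, 1000, 500, 20) for k in range(6))
tail_binom = 2 * sum(binom_pmf(k, 20, Fraction(1, 2)) for k in range(6))

print(float(tail_hyper), float(tail_binom))
```

Both come out close to 4%; the binomial value is exactly 43400/2^20, and drawing without replacement makes the tail slightly thinner.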
