I want to take a brief break from my 1920s time machine and consider whether we’ve progressed much in our present statistical thinking. Let’s do a thought experiment to find out. I present you with a deck of 1,000 ordinary playing cards and promise it has been expertly shuffled. Beyond these facts, I don’t know how the deck was assembled. You draw the first twenty cards and see five red cards and fifteen black cards. How many red cards do you think are in the deck in total?
I asked my fifth-grade son this question at the breakfast table yesterday. He immediately answered 250. I asked why and he responded that one quarter of the cards were red, so probably one quarter of the cards in the deck were red. I asked him why he could make that inference. He told me I gave him limited information, and that’s the best guess he could make with what I told him. He said “it’s the best estimate.”
Which parts of inferential statistics are more rigorous than this? How would we answer today if we wanted to be maximally statistical about it? Let me try to be a statistician for a minute (I acknowledge I’m not a good one) and construct some good-faith arguments. And statisticians, please correct me where you see flaws in my methods.
I can’t tell you exactly how many red cards are in the deck. But given what I know, I think it’s reasonable to assume that the card draw is a sample from a hypergeometric distribution. I have a population of size 1,000. There is an unknown number of red cards in the deck, call this number H. I have done 20 draws and observed 5 red cards. So the likelihood model should be
P[ 5 red cards | H ] = C(H,5) C(1000-H,15) / C(1000,20),

that is, the hypergeometric(1000, H, 20) probability of drawing exactly five red cards in the hand.
I think we can all more or less agree this is a reasonable model for generating the hand I’ve described. Now what is H? The maximum likelihood estimate is certainly 250. You can check this numerically, prove it for the hypergeometric distribution, or go with my kid’s intuition. So far so good.
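If you want to take the “check this numerically” route, here is a quick sketch using scipy’s hypergeometric distribution (my code; the variable names are mine):

```python
# Quick numerical check of the maximum likelihood estimate of H.
# scipy parameterizes hypergeom.pmf(k, M, n, N) as: k red cards observed,
# M = deck size, n = red cards in the deck, N = cards drawn.
import numpy as np
from scipy.stats import hypergeom

M, N, observed_red = 1000, 20, 5
H_values = np.arange(M + 1)
likelihood = hypergeom.pmf(observed_red, M, H_values, N)
print(H_values[np.argmax(likelihood)])  # 250
```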
But then you might say, “Well, come now, that’s just an estimate. That estimate probably has variability. You should give me an estimate of how uncertain you are.” But in order to do this, how can I capture the variability of the potential draws?
I could report a confidence interval or a credible interval. Oh, here I go again. I find it interesting that the Frequentist and Bayesian answers here are basically the same and both are pretty useless. The 95% confidence interval is [120,488] and the 95% credible interval is [114,469]. People will fight to the death over the right one to use. Bayesians and Frequentists agree about their likelihoods but argue about what to do with them, even if the answers are effectively indistinguishable. More valuable to sickos like me, the five-nines confidence/credible intervals (99.999% coverage) are [25,740] and [24,720]. Given the information we have, intervals only tell us we should draw more cards.
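For what it’s worth, here is roughly how I would compute the credible intervals, assuming a uniform prior over H. The exact endpoints depend on the prior and on the interval convention (I take the equal-tailed one here), so treat the output as approximate:

```python
# Sketch of equal-tailed credible intervals for H under a uniform prior.
# Other priors or conventions (e.g., highest posterior density) shift the endpoints.
import numpy as np
from scipy.stats import hypergeom

M, N, observed_red = 1000, 20, 5
H_values = np.arange(M + 1)
likelihood = hypergeom.pmf(observed_red, M, H_values, N)
posterior = likelihood / likelihood.sum()   # uniform prior cancels out
cdf = np.cumsum(posterior)

def credible_interval(level):
    tail = (1 - level) / 2
    lower = H_values[np.searchsorted(cdf, tail)]
    upper = H_values[np.searchsorted(cdf, 1 - tail)]
    return lower, upper

print(credible_interval(0.95))      # roughly the 95% interval quoted above
print(credible_interval(0.99999))   # the five-nines version
```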
The problem gets weirder when I change the size of the deck and make the problem feel more like machine learning. What if instead of 1,000 cards, we had only 21? We draw 20 cards and see 5 red cards. What is the color of the last card? At this point, a confidence interval doesn’t make much sense. It’s either red or black. I suppose I could ask, what is the “probability” that the card is red? But what does that mean?
If I am Bayesian and am willing to state a uniform prior, I can answer this question. Using my hypergeometric likelihood and Bayes Rule, I’ll compute the probability the card is red to be 3/11.[1] This is a bit higher than one quarter because the estimated probability is shrinking towards the prior. But 25% and 27% are the same for most people.
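Spelling out that calculation (my arithmetic, using that uniform prior): only H = 5 or H = 6 red cards in the 21-card deck are consistent with seeing five red cards in twenty draws, so Bayes Rule gives

P[ last card red | draw ] = P[ H = 6 | draw ]
  = P[ draw | H = 6 ] / ( P[ draw | H = 5 ] + P[ draw | H = 6 ] )
  = (6/21) / (16/21 + 6/21)
  = 6/22 = 3/11.

The two likelihoods here are the same numbers (0.29 and 0.76, rounded) quoted in the next paragraph.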
Frequentists can’t answer my question. They are content to compute the likelihoods and let you do with them as you please. If the shuffling was random, and there were six red cards in the deck, the probability I would have seen five of them is 0.29. If there were five red cards, the probability I would have seen all five is 0.76. These numbers don’t add up to one, but they are telling me something.
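Those two numbers are easy to reproduce with the same scipy parameterization as before (again, my sketch):

```python
# Likelihoods for the 21-card deck: 20 cards drawn, 5 red observed.
from scipy.stats import hypergeom

print(hypergeom.pmf(5, 21, 6, 20))  # ~0.286: five of the six red cards drawn
print(hypergeom.pmf(5, 21, 5, 20))  # ~0.762: all five red cards drawn
```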
Hmm. I’m not sure if you feel happy with these answers so far. And I’ve got a worse question, harkening back to yesterday’s discussion of random sampling. Why do we believe the likelihoods? My ruse describes the situation in a way that aligns with textbook motivations for random sampling. But if I were gambling or investing, I’d want more evidence that the cards were well shuffled, requiring me to inspect the shuffling machine several times and convince myself the statistics were reasonable. I’d want access to the cards themselves, which might shape my “priors” if you will. Likelihoods require a lot of superstition. But without such access, and with only the information given, why not take Isaac’s solution?
[1] Why is the probability of 6 in Bayesian land equal to 6/22? I’m sure there’s a cute way to prove this using combinatorics, but I can’t see it at this hour in the morning. Does anyone care to offer a clever proof?
Statistics gives us a framework for working with uncertainty in reality. The origin of the Gaussian distribution is that Gauss wanted to measure the error in observations of celestial objects. In that context, we already have a way to observe reality: celestial motion can be verified. Unfortunately, many other phenomena where a probability distribution is assumed cannot be repeated or verified. The truth might be that statisticians cannot solve that problem. Introducing differential calculus, asymptotic analysis, or measure theory cannot solve this fundamental problem. It can even make the problem worse by introducing more assumptions. We cannot pretend that we know what we can't know by using sigma-algebras...
My point is just that we know the standard deviation will decrease like 1/sqrt(n), so we can use this to help guide how we allocate samples going forward. If the first and third decks' empirical red proportions are separated by multiple standard deviations at some point, then we could stop sampling cards from deck 1 and focus just on decks 2 and 3. So this is some justification for studying the standard deviation or confidence intervals of estimates.
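Reading this as a successive-elimination style rule, here is a minimal sketch. The three decks, their red proportions, the warm-up period, and the three-standard-error threshold are all invented for illustration; only the 1/sqrt(n) shrinkage and the "drop decks that fall several standard deviations behind the best" rule come from the comment.

```python
# Sketch of successive elimination: keep drawing from decks whose empirical red
# proportion is within a few standard errors of the current best, drop the rest.
# Standard errors shrink like 1/sqrt(n) as each deck accumulates draws.
import math
import random

def run_elimination(decks, rounds=500, width=3.0, warmup=30):
    counts = {i: 0 for i in range(len(decks))}   # red cards seen per deck
    draws = {i: 0 for i in range(len(decks))}    # cards drawn per deck
    active = set(counts)
    for _ in range(rounds):
        for i in active:
            draws[i] += 1
            counts[i] += decks[i]()              # 1 if the drawn card is red
        p = {i: counts[i] / draws[i] for i in active}
        se = {i: math.sqrt(p[i] * (1 - p[i]) / draws[i]) for i in active}
        best = max(p.values())
        if min(draws[i] for i in active) >= warmup:
            # Drop any deck more than `width` standard errors behind the best.
            active = {i for i in active if p[i] + width * se[i] >= best}
    return active

# Hypothetical decks with different red proportions, modeled as biased coin flips.
decks = [lambda q=q: int(random.random() < q) for q in (0.25, 0.50, 0.55)]
print(run_elimination(decks))   # deck 0 (q = 0.25) is typically eliminated first
```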