My point is just that we know the standard deviation will decrease like 1/sqrt(n), so we can use this to guide how we allocate samples going forward. If the first and third decks' empirical red proportions are separated by multiple standard deviations at some point, then we could stop sampling cards from deck 1 and focus just on decks 2 and 3. So, this is some justification for studying the standard deviation or confidence intervals of estimates.
Yes, the notion that we *can* randomly sample does a lot of work there, but this is a good argument for intervals.
For your question in the appendix see Laplace's rule of succession:
https://en.m.wikipedia.org/wiki/Rule_of_succession
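For readers who don't want to follow the link, here is a minimal sketch of the rule. It assumes a uniform prior on the unknown proportion and independent draws (for example, a very large deck). The function name and the example counts of 5 reds in 20 draws are illustrative choices, not taken from the post's appendix.

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule: P(next draw is a success) = (s + 1) / (n + 2),
    assuming a uniform prior on the success probability."""
    return Fraction(successes + 1, trials + 2)

# Illustrative numbers only: 5 reds seen in 20 draws.
print(rule_of_succession(5, 20))
```

With no data at all (0 successes in 0 trials) the rule gives 1/2, which is the uniform prior's mean.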
I agree with the post. Things get more interesting when you have multiple hypotheses. Suppose you have three very large decks of cards. You want to determine which deck has the highest proportion of reds. You draw 100 cards from each deck and see 10, 20, 50 reds, respectively. Now you are allowed to draw more cards from each deck, but you have a budget of 300 total. How should you allocate your budget?
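As a rough sketch of the standard-deviation reasoning applied to these counts: the snippet below estimates each deck's standard error with the usual sqrt(p(1-p)/n) formula and measures how many standard errors separate two decks. It assumes independent draws and a normal approximation, and the helper names are made up for illustration. It does not by itself answer how to split the 300 extra draws.

```python
import math

# Observed reds out of 100 draws per deck (from the question above).
reds = [10, 20, 50]
n = 100

def std_error(p, n):
    """Estimated standard error of a sample proportion; shrinks like 1/sqrt(n)."""
    return math.sqrt(p * (1 - p) / n)

props = [r / n for r in reds]
errors = [std_error(p, n) for p in props]

def separation(i, j):
    """Gap between two decks' proportions, in units of the gap's standard error."""
    gap = abs(props[i] - props[j])
    return gap / math.hypot(errors[i], errors[j])

for name, (i, j) in {"deck 1 vs 3": (0, 2), "deck 2 vs 3": (1, 2)}.items():
    print(f"{name}: {separation(i, j):.2f} standard errors apart")
```

Under this approximation, decks 1 and 3 are already several standard errors apart after the first 100 draws, which is the kind of signal the stopping idea relies on.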
I still need more information before I can answer your question, no?
My first guesstimate, about what could be a Trick Question regarding allocating the budget of 300 new card draws, would be that the ULTIMATE sizes of the three new samples from the three "very large card universes" should be inversely proportional to the 10/20/50 red counts that we already have in hand. How about 120 extra for the 10 pile to give us 130 total cards, 110 for the 20 pile to give 130 total, and 70 for the third pile to give 120 total?
I'm no card-carrying Bayesian, but I must say that the *brilliant* 2021 text of Aubrey Clayton, "Bernoulli's Fallacy -- Statistical Illogic and the Crisis of Modern Science", should be on the book shelves of all persons reading this interesting post. I would argue that it's one-half of maybe THE most important reading assignment every serious thinker should tackle (the other half being Deb Mayo's 1996 text, "Error and the Growth of Experimental Knowledge").
Maybe I should reread, but I found Clayton's fixation on the Frequentist vs Bayesian arguments caused him to not see that everyone is wrong about probability. That, to me, is a harder but more necessary case to articulate. Mathematical probability is rigorous, but there isn't a "right way" to apply it to reality in all cases. I'll keep working on fleshing out this view here.
But I haven't read that particular book by Mayo. Added it to my list.
Deborah Mayo is a definite genius and there are two books from her subsequent to the one mentioned. They are each exquisitely written in my opinion. She has a busy website that is a good "meeting place" for folks pumped up about the niche (?) field of Philosophy of Science. I agree that Professor Clayton seemed a wee bit too exercised in places about F vs. B wrangling that is now, what, at least 100 years old? However, I am not actually credentialed to make a truly authoritative evaluation (PhD organic chemistry and then MD and Board Certified General Surgery). As my sister told me, "Oh, you are *just* a general surgeon".
Your sister is mean! I'd also argue that these probabilistic arguments are too important to defer to statisticians. Most of them certainly don't know more metaphysics than a fifth grader.
And I agree with your assessment of Mayo. Her blog is indeed full of great discussions.
Why do confidence intervals make no sense in the case of 21 cards? You can give a confidence interval for the total number of red cards. For example, this basic one works: let k be the number of red cards in the first 20. If k < 10, then the "interval" is {k}. If k = 10, then {10, 11}. If k > 10, then {k + 1}. The probability that the number of red cards is inside this interval is minimized when there are 10 or 11 red cards in total, and you have a 10/21 probability of failure. For other cases it is much better: for example, when there are 5 red cards in total, the probability of predicting the correct number is 16/21, and when there are 6 red cards in total, it is 15/21.
One interesting remark: if you want your confidence interval to consist of only a single number, i.e. you want to make a prediction, and if you also want a coverage probability of at least 50% for every possible value of the parameter, then no matter what comes up you need to flip a fair coin and say one of the two consistent outcomes. There is no better one than this.
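The coverage numbers above can be checked exactly with a short sketch. It assumes a 21-card deck in which 20 cards are revealed and the hidden card is equally likely to be any of the 21; the function names are hypothetical. Each rule is written as a map from a candidate total to the probability that the rule reports it.

```python
from fractions import Fraction

N = 21  # cards in the deck; 20 are revealed, 1 stays hidden (assumed setup)

def interval_rule(k):
    """The set-valued rule described above, given k reds among the 20 revealed."""
    if k < 10:
        return {k: Fraction(1)}
    if k == 10:
        return {10: Fraction(1), 11: Fraction(1)}
    return {k + 1: Fraction(1)}

def coin_flip_rule(k):
    """Single-number prediction: report k or k + 1 with probability 1/2 each."""
    return {k: Fraction(1, 2), k + 1: Fraction(1, 2)}

def coverage(total_red, rule):
    """Probability that the rule's output contains the true total."""
    p_hidden_red = Fraction(total_red, N)
    # Hidden card red -> k = total_red - 1; hidden card black -> k = total_red.
    outcomes = [(total_red - 1, p_hidden_red), (total_red, 1 - p_hidden_red)]
    return sum(p * rule(k).get(total_red, Fraction(0))
               for k, p in outcomes if p > 0)

for r in (5, 6, 10, 11):
    print(r, coverage(r, interval_rule))
print("worst case:", min(coverage(r, interval_rule) for r in range(N + 1)))
print("coin flip:", {coverage(r, coin_flip_rule) for r in range(N + 1)})
```

Under these assumptions the interval rule covers with probability 16/21 and 15/21 for 5 and 6 reds, and its worst case is 11/21 at 10 or 11 reds, while the coin-flip prediction covers with probability exactly 1/2 for every possible total.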
Another fun post! "How many red cards do you think were in total in the deck?"
And then, "Beyond these facts, _I don’t know_ how the deck was assembled." I think we can question this "IDK". You say "we can all more or less agree this is a reasonable model". So, it's a thought experiment where we try to check what our reasonable intuitions are. My intuition is that there are two possibilities. Either the deck is approximately 50%-50% (e.g., we found a big pile of cards at the cottage and my cousin "expertly shuffled it") or it's not (e.g., my statistician friend is having fun with me; last year my cousin removed half of all cards of one color to make children's arts and crafts). I guess my reasoning is Bayesian. Personally, I find the first possibility quite plausible. I'm tempted to at least consider the possibility that at first the pile was approximately 500-500. If that's true, I know that 5-15 is unlikely. Without calculations I wouldn't know how unlikely, but I'd know it's somewhat unlikely (apparently ~4% for <6 of either color). In real life, I think it's very reasonable to put a big prior on 500-500. So, either around 500 or around 250. I think of this "thought experiment" linguistically and socially before thinking about the maths. In other words, are the cards drawn from a hypergeometric distribution, or from a normal pile of 30 old decks at the cottage? Now that I think of it, having 1000 cards at the cottage is unlikely; maybe at the community center where they play bridge.
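The "~4%" figure can be sanity-checked with a small sketch. It assumes 20 cards are drawn without replacement from a deck of exactly 500 red and 500 black, and reads "<6 of either color" as "at most 5 of one color". The function name is made up for illustration, and a binomial version (a fair 50/50 chance per draw) is included for comparison.

```python
from math import comb

def prob_lopsided(deck_red, deck_black, draws, cutoff):
    """P(at most `cutoff` cards of either color), drawing without replacement."""
    total = comb(deck_red + deck_black, draws)

    def pmf(k):  # hypergeometric probability of exactly k reds
        return comb(deck_red, k) * comb(deck_black, draws - k) / total

    return sum(pmf(k) for k in range(draws + 1)
               if k <= cutoff or draws - k <= cutoff)

# Assumed setup: 500-500 deck, 20 draws, at most 5 of one color.
p_hyper = prob_lopsided(500, 500, 20, 5)

# Same question with independent 50/50 draws (binomial approximation).
p_binom = sum(comb(20, k) for k in range(21) if k <= 5 or k >= 15) / 2**20

print(f"hypergeometric: {p_hyper:.4f}, binomial: {p_binom:.4f}")
```

The binomial version works out to 43400/2^20, or about 4.1%, and the without-replacement figure comes out slightly lower, so "apparently ~4%" holds under these assumptions.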