Discussion about this post

rif a saurous

This is a meaty post that might benefit from being expanded into multiple posts.

In some sense, I think there is a conflation between, on the one hand, fundamental epistemological problems that come from wanting things we can't have, and, on the other hand, "bad statistics" problems that are (somewhat) ameliorated by (for example) a more Bayesian approach?

I think the basic epistemological nightmare is that even in the best case, detecting small differences is hard and can only be done with some combination of enormous sample sizes and substantial probabilities of failure. For instance, the power calculation you've done is (abstractly) about detecting the difference between a coin that comes up heads one percent of the time and a coin that comes up heads half a percent of the time. It's not surprising you'd need thousands of samples to do that with a moderate chance of success!
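A quick way to see the scale involved is a standard two-proportion power calculation, sketched below in Python (assuming statsmodels is available; the 5% alpha and 80% power targets are illustrative defaults, not from the post):

```python
# Sample size needed to distinguish a coin with P(heads) = 1% from one with
# P(heads) = 0.5%, using a standard two-proportion power calculation.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

effect = proportion_effectsize(0.01, 0.005)   # Cohen's h for the two proportions
n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"samples needed per group: {n_per_group:,.0f}")  # on the order of a few thousand
```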

If you're an HFT hedge fund and you get to take gazillions of identical-ish bets, this is fine --- you're delighted with high-variance, positive-expected-value bets. But if you're trying to approve a vaccine or a drug or intervention, hoo boy. If the overall population effect is small (either because the intervention only does a little, or because it only does it for a small fraction of the population), you're basically out of luck, even before you bring in all the problems related to how the real world doesn't meet your abstract assumptions.
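To make the repeated-bets point concrete, here is a small illustrative calculation (the per-bet edge and noise numbers are made up, not from the post): with enough independent positive-EV bets, the chance of coming out behind overall collapses, which is exactly the luxury a one-shot approval decision doesn't have.

```python
# Probability of a net loss after n independent bets, each with a tiny
# positive expected value ('edge') and much larger noise ('sd'), using the
# normal approximation for the sum of the bets.
import numpy as np
from scipy.stats import norm

edge, sd = 0.001, 1.0   # illustrative numbers
for n in (1, 10_000, 1_000_000, 100_000_000):
    p_loss = norm.cdf(0, loc=n * edge, scale=sd * np.sqrt(n))
    print(f"{n:>11,} bets: P(net loss) ~ {p_loss:.3f}")
```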

But then on top of that, we add other dumb things, and those are unforced errors. For most non-physics things, "Are X and Y different?" is just the wrong question. The right question is "Are X and Y different by some amount T that matters substantively to a downstream decision maker?" I think Gelman gets this bit right when he says (roughly) "In social science, the null hypothesis is always false." And this connects back to decisions --- for social science the "decision" is often whether the paper gets published or not, so that messes with the incentives.
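One way to turn that "different by an amount T that matters" question into an actual test is to move the null away from zero and test against the threshold itself (a minimum-effect test). A rough sketch, where the simulated data, the threshold T, and the normal approximation are all illustrative assumptions rather than anything from the post:

```python
# Minimum-effect test sketch: instead of H0 "the groups are identical",
# test H0 "the difference is at most T", where T is the smallest difference
# a downstream decision maker would actually care about.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)    # group X (simulated)
y = rng.normal(0.3, 1.0, 500)    # group Y (simulated, true difference 0.3)
T = 0.2                          # decision-relevant threshold (illustrative)

diff = y.mean() - x.mean()
se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))

z = (diff - T) / se              # one-sided test of H0: true difference <= T
p = stats.norm.sf(z)
print(f"estimated difference {diff:.3f}, p-value that it exceeds T={T}: {p:.3f}")
```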

TL;DR: For social science questions I'm basically only interested in effects that are big enough to show up obviously in the plots. If you need statistics to tease it out, I'm not interested?

J Lee MD PhD

I am looking forward to your next post in this series. I spent my academic life in one of the major, prestigious surgery departments in the USA. It was very RARE to encounter anybody who understood the sensible use, and the actual “nuts and bolts”, of hypothesis testing. In particular, I have always found, and still find, it very difficult to tell whether the use of hypothesis tests *with purely observational data* is “kosher” or not. There must be multiple concealed assumptions involved? Epidemiology (my “amateur” hobby) is filled with four-zillion uses of these tests to detect whether some given association of two variables was or was not “significant” and then to “calibrate the strength of the association”. What in hell does that all mean? I think it means that folks doing this must be handling their data AS IF they were the fruits of a randomized trial (?!).
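One of those concealed assumptions can be made concrete with a toy simulation (everything below is illustrative): generate observational data in which the exposure and the outcome have no effect on each other at all but share a common cause, and watch a standard test declare the association "significant" anyway. The test answers a question about association; reading it as evidence about effect is what quietly assumes randomized-trial-like conditions.

```python
# Toy confounding example: X and Y are both driven by an unmeasured Z,
# with no direct X -> Y effect, yet the chi-squared test on the 2x2 table
# reports a strongly 'significant' association.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)                    # unmeasured common cause
x = (z + rng.normal(size=n)) > 0          # "exposure", partly driven by z
y = (z + rng.normal(size=n)) > 0          # "outcome", also partly driven by z

table = np.array([[np.sum(x & y),  np.sum(x & ~y)],
                  [np.sum(~x & y), np.sum(~x & ~y)]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")  # tiny p-value, zero causal effect
```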
