Feb 5

One idea for “what else can we do?” is here: https://doi.org/10.1016/j.tics.2022.05.008

In cognitive science, we can treat each participant as a replication of an N=1 study, then formally quantify how this generalises to the population by estimating the population-level experimental replication probability (the prevalence): https://elifesciences.org/articles/62461

I think this approach could also be useful in other areas.
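
A minimal sketch of the prevalence idea, assuming a simple binomial model (my reading of the linked eLife article, not its reference code): each participant is tested separately with a false-positive rate `alpha` and a sensitivity `beta`, so the number of participants with a significant effect, k out of n, is Binomial(n, θ) with θ = (1 − γ)·alpha + γ·beta, where γ is the population prevalence. The function names and defaults below are hypothetical.

```python
def prevalence_map(k: int, n: int, alpha: float = 0.05, beta: float = 1.0) -> float:
    """MAP estimate of population prevalence gamma under a uniform prior on gamma.

    Assumed model: k ~ Binomial(n, theta), theta = (1 - gamma) * alpha + gamma * beta.
    """
    gamma = (k / n - alpha) / (beta - alpha)
    return min(max(gamma, 0.0), 1.0)  # clip to the valid range [0, 1]


def prevalence_lower_bound(k: int, n: int, alpha: float = 0.05, beta: float = 1.0,
                           prob: float = 0.95, grid: int = 10_000) -> float:
    """Lower bound g with P(gamma >= g | k, n) ~= prob, via a grid over gamma."""
    gammas = [i / grid for i in range(grid + 1)]
    # Unnormalised posterior: uniform prior times binomial likelihood
    # (the binomial coefficient does not depend on gamma, so it is dropped).
    weights = []
    for g in gammas:
        theta = (1 - g) * alpha + g * beta
        weights.append(theta ** k * (1 - theta) ** (n - k))
    total = sum(weights)
    cumulative = 0.0
    for g, w in zip(gammas, weights):
        cumulative += w
        if cumulative >= (1 - prob) * total:
            return g
    return 1.0


# Example: 9 of 12 participants individually show the effect at alpha = 0.05.
print(prevalence_map(9, 12), prevalence_lower_bound(9, 12))
```

The point of the design is that the unit of replication is the participant: a strong, reliable within-participant effect in most of a small sample can support a confident statement about how common the effect is in the population.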

Jan 12

I don't have a very sophisticated understanding of statistics, so please correct me if I'm wrong, but I think your core point is that large datasets might be useful for finding small effects, while we've unnecessarily lost focus on small studies of large effects. Is that right? But don't small studies often show small effect sizes? Isn't the deeper issue that we've picked all the low-hanging fruit in science?
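
The trade-off in this question can be made concrete with a standard power calculation. A minimal sketch, assuming a two-group comparison analysed with a two-sided test under the normal approximation, at alpha = 0.05 and 80% power; the function name and defaults are illustrative, not from the post:

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants per group needed to detect standardized effect size d.

    Normal-approximation formula for a two-sided, two-sample test:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.8, 0.5, 0.2):  # Cohen's conventional "large", "medium", "small" effects
    print(d, n_per_group(d))
```

Under these assumptions, a large effect (d = 0.8) needs about 25 participants per group, while a small one (d = 0.2) needs about 393. Because n scales as 1/d², shrinking the effect by a factor of 4 multiplies the required sample by roughly 16.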

You'll enjoy this pithy little chestnut from Charles Geyer:

"The story about n going to infinity is even less plausible in spatial statistics and statistical genetics where every component of the data may be correlated with every other component. Suppose we have data on school districts of Minnesota. How does Minnesota go to infinity? By invasion of surrounding states and provinces of Canada, not to mention Lake Superior, and eventually by rocket ships to outer space? How silly does the n goes to infinity story have to be before it provokes laughter instead of reverence?"

Read the whole thing, it's worth it: https://www.stat.umn.edu/geyer/lecam/simple.pdf

Recent subscriber, but I like the kick you’re on here. The concepts in this post remind me of the database torture and observational-study mayhem of the COVID publishing era.

I’m not sure a return to small data is possible or necessary, but if it is, how do you imagine it happening? It’s much easier to curate large observational datasets than to design and execute a study. What incentives could drive the latter, even though it is much more difficult?