> “A ‘valid’ preregistration plan necessitates knowing the outcome of all aspects of an experiment before conducting it. Preregistration makes it impossible to adapt to the actuality of experimental conditions.”
This is definitely a large drawback of preregistration, but isn’t it just a constraint introduced by frequentist assumptions (of which I recognize you’re generally skeptical)? It would be nice to learn as you go from the data and adapt your methods accordingly, but doing so renders the resulting hypothesis tests biased to the point of uselessness (a toy illustration of that bias follows below). Given that this kind of adaptation seems to be what researchers are currently doing anyway, a mechanism that weakly enforces concordance with a priori design and hypothesizing seems reasonable.
How else should we get around this?
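To make the "adapting to the data biases the test" point concrete, here is a minimal optional-stopping sketch of my own (not from the original post), assuming a one-sample t-test on normal data with numpy and scipy: the null is true throughout, yet peeking after every batch and stopping at the first p < 0.05 pushes the false-positive rate well above the nominal 5%.

```python
# Toy optional-stopping simulation (my own sketch, hypothetical parameters).
# Peek at a t-test after every batch of data and stop as soon as p < 0.05.
# The null hypothesis (mean = 0) is true, yet "significance" is declared
# far more often than the nominal 5% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def peeking_rejects(n_batches=20, batch_size=10, alpha=0.05):
    data = []
    for _ in range(n_batches):
        data.extend(rng.normal(0.0, 1.0, batch_size))  # null is true: mean 0
        _, p = stats.ttest_1samp(data, 0.0)
        if p < alpha:
            return True  # stop early and report a "significant" result
    return False

n_sims = 2000
rate = np.mean([peeking_rejects() for _ in range(n_sims)])
print(f"false-positive rate with optional stopping: {rate:.3f}")  # well above 0.05
```

Committing to the sample size and analysis before looking at the data is exactly what rules this bias out, and it is also the source of the rigidity being complained about below.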
Sure, I have written about hypothesis tests before. I think they are useless and biased for inference, but a powerful tool for rulemaking and regulation:
https://arxiv.org/abs/2501.03457
With regard to papers, NHST is just there as part of the rule book to get published. I think preregistration is a poor rule: it enforces excessive rigidity, requires a sort of mystical foresight about how experiments will play out, and, since no one wants to run an experiment that doesn't pass the test, it creates perverse incentives in design.
I personally think that the other open-science innovations, such as software and code sharing, are more valuable, as communities can then engage in a deeper inferential digestion of the work.
Thanks for sharing this—insightful read! I would be curious to hear your positions on the recent influx of AI scientist systems like Sakana, Intology, and AutoScience working on near end-to-end automation of scientific discovery and paper writing for AI venues. What validity concerns do you have about these systems? Are there parts of the scientific research, peer review, and dissemination process where these systems might actually enhance internal, external, or construct validity evaluations?
I mostly ignore that body of work because every time I look, it ends up being snake oil.
But maybe you have a thought about how these systems could be used productively? I'm just saying that in all of the examples I've seen (on Twitter), it's been a lot of easily debunkable hot air.
I am surprised that incremental hill-climbing experimental methodology is as successful and productive as it is in NLP and ML. But even though Sutton's Bitter Lesson seems a fairly accurate description of the field, it doesn't explain why Deep Learning succeeded where earlier approaches failed.
As far as specifying a recipe for good science goes, I think the recent XKCD is pretty good: https://m.xkcd.com/3101/ I like the mouse-over text: "If you think curiosity without rigor is bad, you should see rigor without curiosity."