Thanks for the article Ben! It served as a refreshing reminder of the often-unaddressed limitations of significance scores. Maybe not directly related, but do you think there will be a push in the next few years towards causal structural models becoming more widespread, and maybe even the norm, in the political/social sciences? I'm not following the field closely, but it seems causality is going through a revival, and maybe its influence on the next generation of scientific fields will be what significance scores' influence has been for the last century?
Structural models are also very common in social science and have been for some time. And yet they haven't been able to shake loose the grip of the tabular asterisk!
That's really interesting to me. The abuse of p < 0.05 has persisted, but it seems structural models have been constantly reinventing themselves over the decades.
From my perspective, structural equations have come a long way from something like Bentler-Weeks models in the 1980s, and it seems the field has acknowledged limitations and come up with techniques to address them in each decade since. Even though both maintain popularity, one seems to be constantly evolving, whereas, like you say, we can't seem to shake loose the p < 0.05 obsession.
If quant metascientists want to keep on quanting, they could at least acquaint themselves with more sophisticated urn models that have memory. For example, I could have a setup where the urn has balls of two colors and a rule that says "if the ball you draw at time t is red, add 6 red balls; if it is blue, add 3 blue balls and 1 red ball." This introduces the notion of history; if one wants, one can even control the outcomes by having an adaptive rule for changing the proportions of different colors as a function of previous history, etc.
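For the curious, here is a minimal sketch in Python of the reinforcement rule described above (drawing red adds 6 red balls; drawing blue adds 3 blue and 1 red). The function name, parameters, and initial counts are illustrative, not from any particular paper:

```python
import random

def urn_with_memory(n_draws, red=1, blue=1, seed=0):
    """Simulate an urn whose composition depends on its own history.
    Rule (from the comment above): draw red -> add 6 red balls;
    draw blue -> add 3 blue balls and 1 red ball."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_draws):
        # Draw a ball with probability proportional to the current counts.
        if rng.random() < red / (red + blue):
            history.append("red")
            red += 6
        else:
            history.append("blue")
            blue += 3
            red += 1
    return history, (red, blue)

history, counts = urn_with_memory(1000)
print(history[:10], counts)
```

An adaptive version would simply replace the fixed increments with a function of `history`, which is what makes the outcomes controllable.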
Funnily enough, that's the kind of urn model we're currently developing: a stochastic model with history. I still wouldn't use it the way it's used in these kinds of papers, but it's great for exploring possibilities (and for contrasting with such simplistic versions).
(Also, we don't limit ourselves to the NHST framework, since one doesn't need to assume science must be built on a collection of p-values.)
Agree with all that.
First of all, I love Berna and the critical metascience crowd, so I'm very happy to see this post giving her work more visibility.
P-value replication etc. doesn't even figure in my top 20 problems. Yet life scientists keep pattern-matching me to such concerns, when rigor means something very different to many of us who engage in quantitative epistemology. (Incidentally, I only recently discovered that Lakatos borrowed the term "adhockery" from Meehl.)
I also just saw that "AI does p-hacking" research paper going around. I didn't realize it was maybe the one that also elicited your post! It is a nice demonstration that scientific/statistical sycophancy is possible, but I regularly elicit sophisticated forms of scientific sycophancy. And again I wonder when metascientists are going to worry about causal validity and the lack of identifiability in research designs instead of obsessing over p-hacking again. What about the prediction-error hacking, external-validity hacking, and outcome hacking that goes on without being quantified?
https://x.com/ahall_research/status/2024544040784720365
All fair enough, but if you were to do this research, what would your method be? Because you've said before that there's too much quantitative science (too much because it's bad, I'm assuming). So if you had to prove that instead of just saying it, what would you do?
Which research? Political Science? Or Metascience?
Sorry, I wasn't very articulate. I mean this paper you wrote about, which I guess is metascience of political science. I was just wondering how you'd do this research. Or, to put it differently: if you were a peer reviewer for this paper and had written most of this post in your review, you'd give them some suggestions on what they could do to improve, right? What would those be?
"In a week of incredibly annoying and inaccurate AI discourse." Every week since 2022.