I would be interested to see an experiment of the following form:
Pick a bunch of things from the pot of crud correlations.
Give them to theoreticians. Tell the theoreticians that whatever things they got are strongly correlated (even if they are actually inversely correlated).
Have the theoreticians come up with an explanation for whatever things they were given.
Have other people rate the explanations the theoreticians come up with.
Are the explanations for things with actually positive correlations rated as being better than the explanations for things with secretly negative correlations?
All good points: This situation calls for a more global analysis that looks simultaneously at all of these correlations to find the causal factors. Has anyone attempted that?
Regarding the truth or falsity of H0: Perhaps it would be better if H0 were called a "reference hypothesis"? Statistics is fundamentally about whether data has the power to resolve differences among multiple reference hypotheses. It seems to me that asking for any hypothesis to be "true" is fundamentally misguided, and this is one of the ontological flaws of the Bayesian approach.
Meehl's issue (and mine) is that the reference hypothesis should be that the correlation between the treatment and outcome follows a crud distribution. I'm going to develop this more this week, so stay tuned.
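To make the idea concrete, here is a minimal sketch of what such a test against a crud reference might look like. This is my construction, not Meehl's or the post's: the normal shape of the crud distribution, the crud factor c = 0.1, and the observed correlation are all assumptions for illustration.

```python
import math

# Sketch: use a crud distribution, rather than the point null r = 0,
# as the reference hypothesis. Assume (hypothetically) that crud
# correlations follow N(0, sigma^2), with sigma set so that E|r| = c.
c = 0.1                             # hypothetical crud factor
sigma = math.sqrt(math.pi / 2) * c  # then E|r| = c under N(0, sigma^2)
r_obs = 0.3                         # hypothetical observed correlation

# Two-sided tail probability of r_obs under the crud reference
# distribution. (This ignores sampling noise in r_obs itself.)
p_crud = math.erfc(r_obs / (sigma * math.sqrt(2)))
print(round(p_crud, 3))
```

Against a point null, r = 0.3 with any decent sample size looks wildly significant; against a crud reference with c = 0.1 it is merely somewhat unusual.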
One pedantic note on the RCT example in the footnote: I suspect the null hypothesis is false even in this example. Those bottles are probably snowflakes if you look closely enough. First, there could be systematic biases, e.g., they're from different lots produced under slightly different conditions, they were transported and stored at different temperatures, etc. But also, pill-to-pill variation means that, no matter what, the two bottles are slightly different populations. If the null hypothesis is "the expected treatment effect of a pill drawn from bottle 1 is identical to the expected treatment effect of a pill drawn from bottle 2", where expectations are w.r.t. the (hypothetically infinite) population of patients and the (finite) population of pills in each bottle, then I think it's almost certainly false. (You can't prove it, though, since you can't test the same pill twice.)
Unrelated: Did Meehl really not think to use the average *magnitude* of the Pearson r? Who cares about signs? "Adolescent delinquency negatively correlated with sunscreen use" is a perfectly good headline! (I guess preregistering the expected sign of the effect would at least cut the chance of a significant result in half.)
Point taken! You're probably right, and it's probably way worse with regards to actual treatments. Though I imagine the Pearson r is still much lower than 0.1. Someone should study that! I have other gripes with RCTs that I'll get to, eventually.
Second, Meehl is never 100% precise about what he means with regards to crud. I can't find him writing out a mathematical definition anywhere. But you are probably right that he means average absolute value! I can use that definition if you want.
This week, I was just going to assume the correlation coefficients were normally distributed with some mean and variance. If we assume they have mean zero and average absolute value c, then their standard deviation would be sqrt(pi/2)*c. I don't think it is going to matter much either way.
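As a quick sanity check on that sqrt(pi/2) factor, a throwaway simulation (c = 0.1 is an arbitrary choice): the mean of a half-normal is sigma*sqrt(2/pi), so setting sigma = sqrt(pi/2)*c recovers an average absolute value of c.

```python
import numpy as np

# If r ~ N(0, sigma^2) with sigma = sqrt(pi/2) * c, then E|r| = c,
# since the half-normal mean is sigma * sqrt(2/pi).
rng = np.random.default_rng(0)
c = 0.1
sigma = np.sqrt(np.pi / 2) * c
r = rng.normal(0.0, sigma, size=1_000_000)
print(np.abs(r).mean())  # should land very close to 0.1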
I propose the following: crud factor = mean(|r|) and crud distribution is whatever the ambient distribution of r is. What do you think?
Sounds good to me! I'd also be perfectly happy with defining the crud factor to be sqrt(mean(r^2)), or even mean(r^2), whatever's convenient that's invariant to sign flips.
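For concreteness, the candidate crud-factor definitions side by side (the correlation values here are made up):

```python
import numpy as np

r = np.array([0.12, -0.08, 0.25, -0.30, 0.05, -0.17])  # made-up crud r's

crud_mean_abs = np.abs(r).mean()      # mean(|r|)
crud_rms = np.sqrt((r ** 2).mean())   # sqrt(mean(r^2))
crud_mean_sq = (r ** 2).mean()        # mean(r^2)

# All three are invariant to flipping the sign of any correlation:
assert np.isclose(crud_mean_abs, np.abs(-r).mean())
print(crud_mean_abs, crud_rms, crud_mean_sq)
```

By Jensen's inequality, mean(|r|) <= sqrt(mean(r^2)), so the RMS version is always at least as large as the mean-absolute-value version.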
I look forward to it!