On the question of how large a correlation you get when prevalence is low, you might be interested in concepts like "switch relative risk": https://arxiv.org/abs/2106.06316v1
Will take a look. Thank you!
If you allow arbitrary nonlinear transformations of the covariates, then it seems like the crud factor is nicely captured by the maximal correlation, see e.g. https://www.jstor.org/stable/2242042. This, of course, does not resolve any of the epistemological or methodological issues.
I don't think this is the crud factor because it's not random. But it's definitely related and interesting. I wonder if there's an analog that somehow lets you compute average pairwise maximal correlation between pairs of variables. I will read the original Friedman and Breiman paper and think about it.
You could randomize over the choice of transformations, which could include random picks from the buckets of theories and covariates.
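For anyone who wants to play with the "average pairwise maximal correlation" idea, here is a rough sketch, not the method from the linked paper. It assumes each variable is first cut into quantile bins; for discrete variables the maximal correlation equals the second-largest singular value of the matrix P(x, y) / sqrt(p(x) p(y)). The function names and the `bins` parameter are made up for illustration, and the plug-in estimate is biased upward when the sample is small relative to the number of bins.

```python
from itertools import combinations

import numpy as np


def maximal_correlation(x, y, bins=5):
    """Plug-in maximal correlation of two samples after quantile binning.

    Assumption: binning stands in for "arbitrary nonlinear transformations";
    only transformations that are constant within a bin are considered.
    """
    def to_bins(v):
        # Interior quantile edges give `bins` roughly equal-mass bins.
        edges = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
        return np.searchsorted(edges, v, side="right")

    bx, by = to_bins(np.asarray(x)), to_bins(np.asarray(y))

    # Empirical joint distribution over the bin grid.
    joint = np.zeros((bins, bins))
    np.add.at(joint, (bx, by), 1.0)
    joint /= joint.sum()

    px, py = joint.sum(axis=1), joint.sum(axis=0)
    keep_x, keep_y = px > 0, py > 0  # drop empty bins to avoid dividing by zero
    q = joint[np.ix_(keep_x, keep_y)] / np.sqrt(np.outer(px[keep_x], py[keep_y]))

    # The largest singular value is always 1 (constant functions);
    # the second largest is the maximal correlation.
    s = np.linalg.svd(q, compute_uv=False)
    return float(s[1]) if s.size > 1 else 0.0


def average_pairwise_maximal_correlation(data, bins=5):
    """Mean of the binned maximal correlation over all column pairs of `data`."""
    n_vars = data.shape[1]
    values = [
        maximal_correlation(data[:, i], data[:, j], bins)
        for i, j in combinations(range(n_vars), 2)
    ]
    return float(np.mean(values))
```

As a sanity check, `y = x**2` with symmetric `x` has Pearson correlation near zero but a binned maximal correlation near one, while two independent columns should come out close to zero in a large sample. Randomizing over transformations, as suggested above, would be a different estimator; this sketch takes the supremum within the binned class instead.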