5 Comments
Dean Eckles

On the point of how large a correlation you can get when prevalence is low, you might be interested in concepts like "switch relative risk": https://arxiv.org/abs/2106.06316v1

Ben Recht

Will take a look. Thank you!

Maxim Raginsky

If you allow arbitrary nonlinear transformations of the covariates, then it seems like the crud factor is nicely captured by the maximal correlation, see e.g. https://www.jstor.org/stable/2242042. This, of course, does not resolve any of the epistemological or methodological issues.
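
For discrete variables, here is a minimal sketch of what maximal correlation measures. It relies on the standard fact that the maximal correlation of two finite-valued variables is the second-largest singular value of the matrix P(x, y) / sqrt(P(x) P(y)). The function name `maximal_correlation` and the toy joint tables are illustrative assumptions, not anything taken from the linked paper.

```python
import numpy as np


def maximal_correlation(joint):
    """Maximal correlation of two discrete variables.

    `joint` is a 2-D table of joint probabilities (or counts); rows index
    values of X and columns index values of Y. Hypothetical helper written
    for illustration only.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalize counts to probabilities
    px = joint.sum(axis=1)               # marginal of X
    py = joint.sum(axis=0)               # marginal of Y

    # Drop values that never occur so we do not divide by zero.
    keep_x, keep_y = px > 0, py > 0
    joint = joint[np.ix_(keep_x, keep_y)]
    px, py = px[keep_x], py[keep_y]

    q = joint / np.sqrt(np.outer(px, py))
    s = np.linalg.svd(q, compute_uv=False)  # sorted in descending order

    # The top singular value is always 1 (constant functions); the next one
    # is the best correlation achievable by non-constant transformations.
    return float(s[1]) if s.size > 1 else 0.0


# X uniform on {-1, 0, 1} and Y = X**2: the Pearson correlation is zero,
# but Y is a deterministic function of X.
dependent = np.array([
    [0.0, 1 / 3],   # x = -1 -> y = 1
    [1 / 3, 0.0],   # x =  0 -> y = 0
    [0.0, 1 / 3],   # x =  1 -> y = 1
])

# Independent variables: the joint table is the outer product of marginals.
independent = np.outer([0.2, 0.3, 0.5], [0.4, 0.6])

rho_dependent = maximal_correlation(dependent)
rho_independent = maximal_correlation(independent)
```

In the first table the linear correlation between X and Y vanishes, yet the maximal correlation is essentially 1 because the transformation f(X) = X**2 recovers Y exactly. In the second table it is essentially 0, as it should be for independent variables.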

Ben Recht

I don't think this is the crud factor because it's not random. But it's definitely related and interesting. I wonder if there's an analog that somehow lets you compute the average maximal correlation over pairs of variables. I will read the original Friedman and Breiman paper and think about it.

Maxim Raginsky

You could randomize over the choice of transformations, which could include random picks from the buckets of theories and covariates.