5 Comments
Dean Eckles

On the point of how large a correlation you can get when prevalence is low, you might be interested in concepts like "switch relative risk": https://arxiv.org/abs/2106.06316v1

Ben Recht

Will take a look. Thank you!

Maxim Raginsky

If you allow arbitrary nonlinear transformations of the covariates, then it seems like the crud factor is nicely captured by the maximal correlation, see e.g. https://www.jstor.org/stable/2242042. This, of course, does not resolve any of the epistemological or methodological issues.
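For readers who want something concrete, here is a minimal sketch of maximal correlation for two discrete variables, assuming the standard Hirschfeld-Gebelein-Rényi definition (the supremum of corr(f(X), g(Y)) over all transformations f and g). In the finite case it equals the second-largest singular value of the joint probability table normalized by the square roots of the marginals. The function name `maximal_correlation` and the toy table are illustrative assumptions, not taken from the linked paper.

```python
import numpy as np

def maximal_correlation(joint):
    """Maximal correlation of two discrete variables (illustrative sketch).

    joint[i, j] = P(X = i, Y = j); entries are nonnegative and sum to 1.
    """
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal distribution of X
    py = joint.sum(axis=0)  # marginal distribution of Y

    # Drop values that occur with probability zero to avoid dividing by zero.
    keep_x, keep_y = px > 0, py > 0
    joint = joint[np.ix_(keep_x, keep_y)]
    px, py = px[keep_x], py[keep_y]

    # Normalized table Q[i, j] = P(i, j) / sqrt(P(i) * P(j)).
    q = joint / np.sqrt(np.outer(px, py))
    singular_values = np.linalg.svd(q, compute_uv=False)

    # The top singular value is always 1 (constant functions);
    # the second one is the maximal correlation.
    return float(singular_values[1]) if singular_values.size > 1 else 0.0

# Toy example: X uniform on {-1, 0, 1} and Y = X**2.
# Rows index X in (-1, 0, 1); columns index Y in (0, 1).
toy_joint = np.array([[0.0, 1 / 3],
                      [1 / 3, 0.0],
                      [0.0, 1 / 3]])
rho_max = maximal_correlation(toy_joint)
```

In this toy case the ordinary Pearson correlation between X and Y is zero, yet the maximal correlation is one, because Y is a deterministic function of X. That is the sense in which allowing nonlinear transformations can surface dependence that a linear correlation misses.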

Ben Recht

I don't think this is the crud factor because it's not random. But it's definitely related and interesting. I wonder if there's an analog that somehow lets you compute the average maximal correlation over pairs of variables. I will read the original Friedman and Breiman paper and think about it.

Maxim Raginsky

You could randomize over the choice of transformations, which could include random picks from the buckets of theories and covariates.
