Really like this topic because it makes a lot of engineering students uncomfortable.
"How you balance the tradeoff can’t be determined by math." That’s the most interesting part of building models, because it comes down to how they’ll actually be used. Take cancer detection: it’s rare that an AI tool is the sole arbiter of a diagnosis. Clinicians use it to guide follow-up steps.
What they really care about is, "In deployment, when the model says someone has cancer, do they actually have cancer?" That’s the positive predictive value. If PPV is low, they get alert fatigue fast and stop paying attention. And PPV isn’t just about the model, it also depends on prevalence. For rare conditions, even a model with good TPR and FPR can have such a low PPV that nine out of ten alerts are false.
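To make the prevalence point concrete, here's a quick back-of-the-envelope sketch via Bayes' rule. The numbers are made up for illustration, not from any real screening model, but they show how a detector with seemingly good TPR and FPR still produces mostly false alerts when the condition is rare:

```python
# PPV from TPR, FPR, and prevalence via Bayes' rule.
# All numbers below are illustrative assumptions, not real clinical figures.
def ppv(tpr, fpr, prevalence):
    true_pos = tpr * prevalence          # P(alert, has condition)
    false_pos = fpr * (1 - prevalence)   # P(alert, no condition)
    return true_pos / (true_pos + false_pos)

# A "good-looking" detector (90% TPR, 5% FPR) on a rare condition (0.5% prevalence):
print(ppv(tpr=0.90, fpr=0.05, prevalence=0.005))  # ~0.083, so >9 out of 10 alerts are false
```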
Nostalgia: We (older-generation EE folks) learned this topic from Van Trees' textbook. His wiki page says that "Van Trees was initially on loan to the government by MIT; he then ended up staying for a number of projects." https://en.wikipedia.org/wiki/Harry_L._Van_Trees . Shame on MIT for loaning out a professor for money; which year was that? I think that explains why he rarely published after his famous volumes on the topic. I have some friends who have relied on Chapter 2 of Vol. 1 for their entire professional careers (Vol. 1: https://www.amazon.com/Detection-Estimation-Modulation-Theory-Part/dp/0471095176/ )
From this post, it's clear that Neyman-Pearson tests are relevant in machine learning, perhaps as part of your "regulatory" view of the purpose of statistics? The topic also comes up in the following post and in the exchange in its comments: https://errorstatistics.com/2024/10/22/response-to-ben-rechts-post-what-is-statistics-purpose-on-my-neyman-seminar/