Discussion about this post

Cagatay Candan:

The graph looks intuitive to engineers like me, since we are used to facing trade-offs in almost every physical problem. For me, the bias-variance trade-off appears in my mind as a parameter estimation problem: given finite data, it is "more difficult" to reliably estimate many parameters than just a few. Hence, advice 1: keep the number of parameters bounded. With very few parameters we can estimate them reliably, but the model will be crude and not useful. Hence, advice 2: have a lower bound on the number of parameters as well. The literature on model order selection, AIC (Akaike's information criterion) and BIC (the Bayesian information criterion), tries to make this selection systematically for us. These folks received awards for their work, and it was our religion in engineering.
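The AIC/BIC-style model order selection described above can be sketched in a few lines. This is a minimal illustration, not anyone's published procedure: it assumes Gaussian errors (so AIC and BIC reduce to the common residual-sum-of-squares forms) and uses synthetic cubic data purely as an example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a cubic signal plus noise (illustrative assumption).
n = 60
x = np.linspace(-1.0, 1.0, n)
y = 1.0 - 2.0 * x + 1.5 * x**3 + rng.normal(scale=0.15, size=n)

def aic_bic(y, y_hat, k, n):
    """Gaussian-error AIC/BIC from the residual sum of squares.

    AIC = n*ln(RSS/n) + 2k,  BIC = n*ln(RSS/n) + k*ln(n),
    where k is the number of estimated parameters.
    """
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k, n * np.log(rss / n) + k * np.log(n)

# Fit polynomials of increasing degree and score each model order.
scores = {}
for degree in range(10):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    k = degree + 1  # one parameter per polynomial coefficient
    scores[degree] = aic_bic(y, y_hat, k, n)

best_bic = min(scores, key=lambda d: scores[d][1])
print("degree chosen by BIC:", best_bic)
```

The `k*ln(n)` penalty in BIC grows with the sample size, so it prefers fewer parameters than AIC as data accumulates; both criteria embody exactly the bounded-parameter advice above.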

I believe our mental shortcoming stems from treating the small-data regime as the only possible regime of operation. The main issue, especially in physical problems, has almost always been making the best use of very limited data. The abundance of training data has led to interpolation-type results (with or without noise) and led to this new school, or church!

Damek Davis:

The easiest way to see the problem with this figure is being forced to teach it. Students (in my class, at least) would immediately see it was BS and ask the obvious questions about model complexity that you listed here. I was always left feeling deeply unsatisfied teaching this lecture, and the many lectures after it where we reasoned about whether the bias or variance would increase or decrease if we adjusted some hyperparameter λ.

I wasn't using ESL, by the way, but ISL, since it was an undergrad course. I inherited the course and taught it for several years. Luckily I will not teach that course again :p
