I once searched troves of stats, ML, and phil-of-sci literature for a general definition of overfitting. There is none. The best that can be said is that it is a relation between a fitting method, a model, and data: if someone doesn't like the relation, they call it overfitting. ML people then decided to confuse me even more by renaming perfect fit (interpolation) to "benign overfitting."
Charles Isbell had a very concrete definition of overfitting when I took ML from him in 2010: "when the training error continues to decrease, but the testing error increases." So, interestingly, it's also defined in terms of a holdout set.
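Concretely, here's a minimal sketch of that criterion (assuming a synthetic 1-D polynomial-regression setup of my own, not anything from the course): as model capacity grows, training error keeps falling while holdout error eventually turns back up.

```python
# Sketch of the hold-out definition of overfitting: train MSE falls
# monotonically with capacity, while test MSE bottoms out and rises.
# Synthetic data and degree choices are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(0, 0.2, 40)   # noisy ground truth

x_tr, y_tr = x[:30], y[:30]                  # training split
x_te, y_te = x[30:], y[30:]                  # hold-out split

for degree in [1, 3, 5, 9, 15]:
    coefs = np.polyfit(x_tr, y_tr, degree)   # fit polynomial of this capacity
    tr_err = np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2)
    te_err = np.mean((np.polyval(coefs, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train MSE {tr_err:.4f}, test MSE {te_err:.4f}")
```

The crossover point, where the train column is still dropping but the test column starts climbing, is exactly what the definition keys on.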
He taught using Tom Mitchell's book, but I'm not sure the definition is found there.
I've continued to use this as the definition in intro ML contexts when I'm teaching, to avoid the hand-waving issue. However, I've never loved it, and it really doesn't make sense in the more recent context of phenomena like double descent in neural nets.
100% agree with you. I'll write more on this topic soon.
Looking forward to reading an installment on why overfitting doesn't exist