8 Comments
Davis Yoshida

I don't think contradictory labels are a problem for the interpolation framing. I always viewed it as meaning perfectly fitting the training data (maybe it needs a different word). If you have three identical inputs and one label disagrees, just output 2/3. In fact, our models can only interpolate in probability space when labels are stochastic.
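A rough toy sketch of what I mean, not anything from the post: one input repeated three times with labels 1, 1, 0, where the cross-entropy-minimizing prediction for that input is the empirical frequency 2/3.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical toy data: the same input appears three times, and one label disagrees.
labels = np.array([1.0, 1.0, 0.0])

def cross_entropy(p):
    """Average cross-entropy of predicting probability p for every copy of this input."""
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

# "Interpolating in probability space": the minimizer is the empirical label frequency.
best = minimize_scalar(cross_entropy, bounds=(0.0, 1.0), method="bounded")
print(best.x)  # ~0.667, i.e. 2/3
```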

Ben Recht

The problem is that how you define the "interpolated" label is now a modeling decision. There's nothing wrong with that per se, but it's yet another hyperparameter to tune on the test set, right?

Davis Yoshida

Oh, I was just talking about using it for reasoning. I don't see the connection of the definition to actual practice. Are people stopping training after hitting interpolation?

GB

If bias/variance are irrelevant concepts, should we also consider the approximation/estimation decomposition irrelevant, along with anything that builds on it?
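For concreteness, the decomposition I mean, in its standard textbook form (not anything specific to the post): with R* the Bayes risk and F the model class,

```latex
R(\hat{f}) - R^*
  \;=\; \underbrace{\Big( R(\hat{f}) - \inf_{f \in \mathcal{F}} R(f) \Big)}_{\text{estimation error}}
  \;+\; \underbrace{\Big( \inf_{f \in \mathcal{F}} R(f) - R^* \Big)}_{\text{approximation error}}
```

The estimation term is what uniform convergence bounds try to control; the approximation term is the capacity question.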

Ben Recht

In my mind, these are doing the same things. I'm not sure they get us very far. Do you disagree?

GB

Don't think I disagree, no. They are different things, but the saving graces of the approximation/estimation decomposition are that (1) estimation error is used in uniform convergence arguments, and (2) approximation error is the "ultimate" measure of capacity. But then you argued well against such bounds in another blog post.

The variance, however, must be *somewhat* related to algorithmic stability arguments? I'm not terribly familiar with that literature.

I think it's interesting that the bias/variance decomposition is a decomposition of the *expected* risk, i.e., the mean of the distribution of possible risks. It might be more useful if it were a decomposition of a different distributional statistic, like the mode or the minimum. I'm not aware of anything on this.
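The version I have in mind, spelled out so it's clear where the expectation sits (standard textbook form, not anything specific to the post): for squared loss with y = f(x) + noise of mean zero and variance σ², and a predictor f̂ fit on a training set D,

```latex
\mathbb{E}_{D,\varepsilon}\!\left[ \big( y - \hat{f}(x; D) \big)^2 \right]
  \;=\; \sigma^2
  \;+\; \big( f(x) - \mathbb{E}_D[\hat{f}(x; D)] \big)^2
  \;+\; \operatorname{Var}_D\!\big[ \hat{f}(x; D) \big]
```

Noise, squared bias, and variance, and all three are statements about the mean of the risk over draws of D, which is the point above.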

Dylan Gorman

But I remember auditing your undergrad ML class and thinking your explanation of the bias-variance trade-off was beautiful :(

Ben Recht

Yikes, Dylan, I don't even remember what I said! Do you recall?

I remember teaching the bias-variance trade-off in undergrad ML and thinking to myself, "Wait, this is nonsense. I am lying to the students." That class in particular was the beginning of my journey to abandon ML theory...
