I don't think contradictory labels are a problem for the interpolation framing. I always viewed interpolation as meaning perfectly fitting the training data (maybe it needs a different word). If you have three identical inputs and one label disagrees, just output 2/3. In fact, our models can only interpolate in probability space when labels are stochastic.
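To make the "output 2/3" point concrete, here's a minimal numpy sketch (toy numbers, not from the thread): on a duplicated input with conflicting labels, the constant probability prediction that minimizes either squared loss or log loss is the empirical label frequency.

```python
import numpy as np

# Toy example: three copies of one input, labeled 1, 1, 0.
labels = np.array([1.0, 1.0, 0.0])

# Scan constant probability predictions p for this input and compute
# the average squared loss and log loss over the three duplicates.
grid = np.linspace(0.001, 0.999, 999)
sq_loss = np.array([np.mean((labels - p) ** 2) for p in grid])
log_loss = np.array([
    -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)) for p in grid
])

# Both losses are minimized at the empirical frequency of the positive label.
print(grid[np.argmin(sq_loss)])   # ~0.667
print(grid[np.argmin(log_loss)])  # ~0.667
```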
the problem is that how you define the "interpolated" label is now a modeling decision. there's nothing wrong with that per se, but it's yet another hyperparameter to tune on the test set, right?
Oh, I was just talking about using it for reasoning. I don't see the connection of the definition to actual practice; are people stopping training after hitting interpolation?
If bias/variance are irrelevant concepts, should we also consider the approximation/estimation decomposition irrelevant, along with anything that builds on it?
In my mind, these decompositions are doing the same thing. I'm not sure they get us very far. Do you disagree?
Don’t think I disagree, no. They are different things, but the saving graces of the approximation/estimation (AE) decomposition are that (1) estimation error is used in uniform convergence arguments, and (2) approximation error is the “ultimate” measure of capacity. But then you argued well against such bounds in another blog post.
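For anyone following along, the decomposition being discussed, in standard notation (the symbols here are mine, not from the thread): with model class $\mathcal{F}$, learned predictor $\hat f$, and Bayes risk $R^*$,

```latex
\underbrace{R(\hat f) - R^*}_{\text{excess risk}}
  = \underbrace{R(\hat f) - \inf_{f \in \mathcal{F}} R(f)}_{\text{estimation error}}
  + \underbrace{\inf_{f \in \mathcal{F}} R(f) - R^*}_{\text{approximation error}}
```

Point (1) is that uniform convergence arguments bound the estimation term; point (2) is that the approximation term shrinks as $\mathcal{F}$ grows, which is why it reads as a measure of capacity.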
The variance, however, must be *somewhat* related to algorithmic stability arguments? I'm not terribly familiar with that literature.
I think it’s interesting that the bias/variance decomposition is a decomposition of the *expected* risk, i.e. the mean of the distribution of possible risks. It might be more useful if it were a decomposition of a different distributional statistic, like the mode or the minimum. I'm not aware of anything on this.
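Spelling that out for squared loss (the textbook identity; $S$ is the training set and $\bar f(x) = \mathbb{E}_S[f_S(x)]$ is the average predictor):

```latex
\mathbb{E}_S\big[R(f_S)\big]
  = \underbrace{\mathbb{E}_{x,y}\big[(y - \mathbb{E}[y \mid x])^2\big]}_{\text{noise}}
  + \underbrace{\mathbb{E}_x\big[(\bar f(x) - \mathbb{E}[y \mid x])^2\big]}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_x\,\mathbb{E}_S\big[(f_S(x) - \bar f(x))^2\big]}_{\text{variance}}
```

The outer $\mathbb{E}_S$ is exactly the mean of the distribution of risks across training sets; the identity constrains nothing about that distribution's mode or minimum.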
but i remember auditing your undergrad ML class and thinking your explanation of the bias-variance trade-off was beautiful :(
Yikes, Dylan, I don't even remember what I said! Do you recall?
I remember teaching the bias-variance trade-off in undergrad ML and thinking to myself, "Wait, this is nonsense. I am lying to the students." That class in particular was the beginning of my journey to abandon ML theory...