I don't think contradictory labels are a problem for the interpolation framing. I always viewed interpolation as meaning perfectly fitting the training data (maybe it needs a different word). If you have three identical inputs and one label disagrees, just output 2/3. In fact, our models can only interpolate in probability space when labels are stochastic.
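To make the "output 2/3" point concrete, here's a minimal numpy sketch (toy numbers, not from the thread): on a duplicated input with conflicting labels, the constant probability prediction that minimizes either squared loss or log loss is the empirical label frequency.

```python
import numpy as np

# Toy example: three copies of one input, labeled 1, 1, 0.
labels = np.array([1.0, 1.0, 0.0])

# Scan constant probability predictions p for this input and compute
# the average squared loss and log loss over the three duplicates.
grid = np.linspace(0.001, 0.999, 999)
sq_loss = np.array([np.mean((labels - p) ** 2) for p in grid])
log_loss = np.array([
    -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)) for p in grid
])

# Both losses are minimized at the empirical frequency of the positive label.
print(grid[np.argmin(sq_loss)])   # ~0.667
print(grid[np.argmin(log_loss)])  # ~0.667
```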
the problem is that how you define the "interpolated" label is now a modeling decision. there's nothing wrong with that per se, but it's yet another hyperparameter to tune on the test set, right?
Oh, I was just talking about using it for reasoning. I don't see the connection of the definition to actual practice; are people stopping training after hitting interpolation?
If bias/variance are irrelevant concepts, should we also consider the approximation/estimation decomposition irrelevant, along with anything that builds on it?
In my mind, these decompositions are doing the same thing. I'm not sure they get us very far. Do you disagree?
Don’t think I disagree, no. They are different things, but the saving graces of the approximation/estimation (AE) decomposition are that (1) estimation error is used in uniform convergence arguments, and (2) approximation error is the “ultimate” measure of capacity. But then you argued well against such bounds in another blog post.
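For anyone following along, the decomposition being discussed, in standard notation (the symbols here are mine, not from the thread): with model class $\mathcal{F}$, learned predictor $\hat f$, and Bayes risk $R^*$,

```latex
\underbrace{R(\hat f) - R^*}_{\text{excess risk}}
  = \underbrace{R(\hat f) - \inf_{f \in \mathcal{F}} R(f)}_{\text{estimation error}}
  + \underbrace{\inf_{f \in \mathcal{F}} R(f) - R^*}_{\text{approximation error}}
```

Point (1) is that uniform convergence arguments bound the estimation term; point (2) is that the approximation term shrinks as $\mathcal{F}$ grows, which is why it reads as a measure of capacity.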
The variance, however, must be *somewhat* related to algorithmic stability arguments? I'm not terribly familiar with that literature.
I think it’s interesting that the bias/variance decomposition is a decomposition of the *expected* risk, i.e. the mean of the distribution of possible risks. It might be more useful if it were a decomposition of a different distributional statistic, like the mode or the minimum. I'm not aware of anything on this.
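Spelling that out for squared loss (the textbook identity; $S$ is the training set and $\bar f(x) = \mathbb{E}_S[f_S(x)]$ is the average predictor):

```latex
\mathbb{E}_S\big[R(f_S)\big]
  = \underbrace{\mathbb{E}_{x,y}\big[(y - \mathbb{E}[y \mid x])^2\big]}_{\text{noise}}
  + \underbrace{\mathbb{E}_x\big[(\bar f(x) - \mathbb{E}[y \mid x])^2\big]}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_x\,\mathbb{E}_S\big[(f_S(x) - \bar f(x))^2\big]}_{\text{variance}}
```

The outer $\mathbb{E}_S$ is exactly the mean of the distribution of risks across training sets; the identity constrains nothing about that distribution's mode or minimum.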
but i remember auditing your undergrad ML class and thinking your explanation of the bias-variance trade-off was beautiful :(
Yikes, Dylan, I don't even remember what I said! Do you recall?
I remember teaching the bias-variance trade-off in undergrad ML and thinking to myself, "Wait, this is nonsense. I am lying to the students." That class in particular was the beginning of my journey to abandon ML theory...