On the monotonicity: could it be explained by importance sampling? Test error is an aggregate metric, and maybe we are looking at an importance-sampling-weighted average to account for the distribution difference.
I don't quite see how to push that argument through. Could you say more?
You are right - I typed without much thought and so couldn't formalize it! The importance-sampling-weighted average can be non-linear in the original test score. Perhaps the linear fit hides a trend you can see in the graph, where it seems that the more accurate models improve test-set performance beyond what the linear extrapolation would indicate. https://xkcd.com/2048/
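A minimal sketch of that point (all numbers synthetic and hypothetical, not from the actual experiments): an importance-weighted average mixes the model's per-class error rates differently than the plain average does, so across models with different per-class breakdowns it need not be a linear function of the original aggregate test error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a binary task where the original and new test
# distributions differ only in class balance.
n = 100_000
p_old, p_new = 0.5, 0.8            # P(y = 1) under each distribution
y = rng.random(n) < p_old          # labels drawn from the *old* distribution

# A model whose error rate depends on the class: it errs more on y = 1.
err_rate = np.where(y, 0.3, 0.1)
wrong = rng.random(n) < err_rate

# Importance weights reweight old-distribution samples to mimic the new one.
w = np.where(y, p_new / p_old, (1 - p_new) / (1 - p_old))

plain_error = wrong.mean()                        # aggregate error, old distribution
reweighted_error = np.average(wrong, weights=w)   # estimated error, new distribution
```

Here the reweighted error lands near 0.8·0.3 + 0.2·0.1 = 0.26 rather than the plain 0.5·0.3 + 0.5·0.1 = 0.20; two models with the same plain error but different per-class rates would reweight to different values, which is why linearity across many models is not automatic.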
I'm a little surprised, but not overly, at the monotonicity. It makes sense that the only way to do well on every test set is to learn the precise concept - and by that I mean not only the task, but the selection criteria and so on. What's harder for me to fathom is how narrowly *linear* the relationship is: for every X errors on the real test set there will be mX errors on some new set. Why would it be so precise?
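For concreteness, that linearity claim is a one-parameter fit across a family of models. A toy sketch with made-up (old, new) error pairs, purely to show the shape of the fit, not the real data:

```python
import numpy as np

# Hypothetical (old-test-error, new-test-error) pairs for a family of
# models -- synthetic numbers roughly following new ~ 2 * old.
old_err = np.array([0.05, 0.08, 0.12, 0.20, 0.30])
new_err = np.array([0.11, 0.17, 0.26, 0.42, 0.61])

# Least-squares linear fit new_err ~ m * old_err + b. The surprise in
# the thread is how tight this fit is across real models, not that some
# fit exists for a handful of points.
m, b = np.polyfit(old_err, new_err, 1)
```

With these numbers the slope m comes out near 2, i.e. roughly two new-set errors per old-set error, which is the "mX errors" pattern described above.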
Gael Varoquaux once explained it to me this way: suppose your dataset is undersampled, and say it only spans some affine subspace, or some sub-manifold. Do you conclude that your model only describes that affine space or that manifold? No. Why? Because if there were a meaningful concept that only applied to that subspace, and that differed from the ambient concept, then in order to discover it you would have had to sample from a zero-measure subset. That doesn't happen. The only thing that has any chance of happening is missing small modes.
I have zero idea why the correlation is so strong. In some sense it should be strong because the tasks are so strongly correlated, but I have no good explanation for that scatter-plot pattern. We've seen this trend in hundreds of datasets now. There's something there...
"And I'll never see you again if I can help it" -the new population to the old population
> These sorts of shifts of contexts and populations are the major challenge for predictive engineering. I’m not sure what anyone can hope to do except constantly update the models so they are as current as possible and hope for the best.
Is there any hope for theory in predicting population changes and uncertainty in real systems, or do you see this fundamentally as an engineering problem?
1) What's wrong with engineering? I like engineering!
2) Predicting that the future will be unlike the past requires some sort of information not present in the "data." I have nothing against modeling and forecasting - I don't see how you can proceed without them, or without acknowledging that these problems are humbling and hard.