6 Comments
rif a saurous

I think your err_ext decomposition has some sign errors?

Ben Recht

LOL yes. I fixed it.

Joe Jordan

"It works in practice but not in theory."

But seriously though, I think the intuition from coin flips is misleading because 1d space still has a reasonable Euclidean metric, but in real-world use cases (or anything above 9d) you don't want the Euclidean metric but the Manhattan metric. It gets even harder (for me at least) to have intuition about how nonlinearities affect a space and the metrics defining how far apart things are. Basically I think it works because with enough parameters you can make points sufficiently, though probably not arbitrarily, far apart.
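
(A minimal illustrative sketch of the distance behavior described above, not from the post, assuming random points in the unit cube: as the dimension grows, the farthest pair of points is barely farther than the nearest pair, under either the Euclidean or the Manhattan metric.)

```python
# Illustrative sketch (not from the post): pairwise distances between random
# points in [0, 1]^d under the Euclidean (L2) and Manhattan (L1) metrics.
# As d grows, the max/min distance ratio shrinks toward 1, which is one way
# low-dimensional intuition breaks down.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

for d in [1, 10, 100, 1000]:
    x = rng.random((200, d))              # 200 random points in d dimensions
    l2 = pdist(x, metric="euclidean")     # all pairwise Euclidean distances
    l1 = pdist(x, metric="cityblock")     # all pairwise Manhattan distances
    print(f"d={d:4d}   L2 max/min = {l2.max() / l2.min():8.2f}   "
          f"L1 max/min = {l1.max() / l1.min():8.2f}")
```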

Shayan Kiyani

Interesting! However, I find the boosting attack and the example of a linear combination of n functions a little misleading. What actually happens in practice is that people don't get n random functions (with very bad training performance); they get n functions that interpolate (or nearly interpolate) the training data. If you mix and match those n functions, you never run into the disaster you described. The worst that can happen is that they also interpolate the test set (along with the training set). And that might not be such a bad thing after all, given the emerging knowledge about deep learning.
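
(For reference, here is a minimal sketch of the boosting attack being discussed, under my own assumptions: the test labels are random coin flips, the n "functions" are pure noise, and the only test-set access is querying each function's accuracy. Majority-voting the above-chance functions still pushes the apparent test accuracy well above chance, even though nothing was learned.)

```python
# Minimal sketch of the boosting attack in a coin-flip setup (my own
# illustration): random test labels, n random prediction vectors, and the
# only test-set access is querying each vector's accuracy on it.
import numpy as np

rng = np.random.default_rng(0)

k = 1000                                  # test set size
n = 2000                                  # number of random "classifiers"
y = rng.choice([-1, 1], size=k)           # random test labels

preds = rng.choice([-1, 1], size=(n, k))  # n random prediction vectors
accs = (preds == y).mean(axis=1)          # query the test accuracy of each

keep = preds[accs > 0.5]                  # keep the ones that beat chance
vote = np.sign(keep.sum(axis=0))          # majority vote of the kept set
vote[vote == 0] = 1                       # break ties arbitrarily

print(f"best single random classifier:             {accs.max():.3f}")
print(f"majority vote of above-chance classifiers: {(vote == y).mean():.3f}")
```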

Philipp

I think the main reason this isn't a problem is that even if we know all the test set labels, we usually don't use them directly. The prescription is that we may only use models that can be fitted on the training set alone, given some hyperparameters, and may not adapt them with rules specific to individual test samples or train on the test set directly.

RL methods have a pretty hard job overfitting to test set rewards. It would be much easier if they could change the model outputs directly, but usually they are only allowed to inform hyperparameters. Selecting hyperparameters in a way that makes individual samples come out positive or negative is pretty hard.

So there are some constraints in the way we set up our learning and evaluation methods that prevent this from being too big a problem.

[Comment deleted]
Ben Recht

The frontier math thing is a different issue: if the test examples are in the training set, that's a whole other mess. In the setup of this post, I don't even have to look at the training data at all.

Do you have other examples of test set leakage from RL attacks?
