Discussion about this post

Ben Recht:

On Twitter, Lily-Belle Sweet brought up a great question: how do we justify leave-one-out error, the holdout method, or cross-validation without an appeal to i.i.d. data?

Here’s my take on this. When we collect a training data set, it itself serves as a population. If we subsample from this training data, the i.i.d. assumption holds by construction, because we control the sampling mechanism. Hence, bootstrap methods are telling us something about internal validity: how the method performs when the superpopulation is our training set. To generalize beyond this to new data, we then have to convince ourselves that the training set is a representative sample of the data we’ll see.
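To make that resampling view concrete, here is a minimal sketch of repeated holdout estimation where the training set plays the role of the population. The dataset, model, and all parameter choices are illustrative assumptions, not anything specified in the discussion:

```python
# A minimal sketch of the "training set as population" view.
# Assumes numpy and scikit-learn; the synthetic data and the
# logistic regression model are purely illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Treat this training set as the superpopulation.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
n = len(y)

holdout_errors = []
for _ in range(200):
    # Because we control the sampling, each split is an i.i.d.
    # draw from the empirical distribution of the training set.
    idx = rng.permutation(n)
    train, test = idx[: n // 2], idx[n // 2 :]
    model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    holdout_errors.append(np.mean(model.predict(X[test]) != y[test]))

# The spread of these errors speaks to internal validity: how the
# method behaves when the training set itself is the population.
print(f"mean holdout error: {np.mean(holdout_errors):.3f}")
print(f"std across resamples: {np.std(holdout_errors):.3f}")
```

Whether these numbers say anything about new data then rests on the separate claim that the training set is representative of what we’ll see next.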

Thoughts?

Misha Belkin:

"If our theory gives bad advice for practice, we have to come clean and admit our theory has failed."

I agree with this, of course. It is important, though, to take the next step and state that we need new theory that is consistent with the empirical evidence.

Btw, I also find uniform-type bounds quite beautiful. That is a big part of their appeal, I suppose.
