
Enjoying this series. I'm not sure I want to step into the fray on generalization because I don't think I know enough. Hasn't there been some theory that deals with the question of "match your beliefs about what a good prediction function should look like"? There's some work showing that the structure of networks (like convolutions) suits image-like data. That seems like a good direction if you can make some assumptions about the right prediction function... although that question seems fairly unanswerable for a lot of real-world data.

What I'm also uncertain of is whether current theory tells us enough about how the structure of the input data affects the problem. To take your example of images, pictures of things in the real world have a lot of nonlinear structure that makes them inherently lower dimensional than a generic vector with one dimension per pixel. Can current theory explain the success of ML in terms of this? I'm thinking of the "adaptivity to reduced support" result for linear subspace dependence from https://francisbach.com/quest-for-adaptivity/, which says even kernel methods should be okay. Does this help "rescue" generalization theory a little bit?
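To make the question concrete, here's the kind of toy experiment I have in mind (a rough sketch in Python with numpy and scikit-learn, not from the post; the setup and all parameter choices are mine): put the inputs on a low-dimensional linear subspace of a high-dimensional ambient space, and check whether kernel ridge regression fit on the raw high-dimensional inputs does much worse than the same model fit directly on the latent coordinates.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
d_ambient, d_latent, n_train, n_test = 100, 3, 500, 500

# Near-isometric embedding: inputs occupy a 3-dim linear subspace of R^100.
A = rng.standard_normal((d_latent, d_ambient)) / np.sqrt(d_ambient)

def make_data(n):
    z = rng.standard_normal((n, d_latent))        # latent coordinates
    x = z @ A                                     # ambient-space inputs
    y = np.sin(z[:, 0]) + 0.5 * z[:, 1] ** 2      # target depends only on z
    return x, y, z

x_tr, y_tr, z_tr = make_data(n_train)
x_te, y_te, z_te = make_data(n_test)

# Fit the same Gaussian-kernel ridge regression on the 100-dim inputs and on
# the 3-dim latent coordinates. Since the embedding roughly preserves
# pairwise distances, one fixed bandwidth is a fair comparison for both.
for name, (tr, te) in [("ambient d=100", (x_tr, x_te)),
                       ("latent  d=3  ", (z_tr, z_te))]:
    model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.5)
    model.fit(tr, y_tr)
    mse = np.mean((model.predict(te) - y_te) ** 2)
    print(f"{name}: test MSE = {mse:.4f}")
```

If the adaptivity story applies, the two test errors should be comparable, i.e. the kernel method shouldn't pay much for the 97 redundant ambient dimensions; whether anything like this survives for the nonlinear structure in real images is exactly what I'm asking about.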
