Discussion about this post

Davis Yoshida

> The training set gives us zero insights about the part of w outside of the span of the data.

I'm confused about why this specifically is the case. Without further assumptions, the training set doesn't give us information about inputs which aren't included in the training set. Obviously we need to make some inductive assumptions. Which assumption leads to singling out linear combinations of the training data as special?
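A minimal sketch of the usual reasoning here (minimum-norm least squares, not necessarily the post's exact argument): any weight vector can be split into a component in the span of the training inputs and an orthogonal remainder, and the remainder has no effect on the training predictions, so the data cannot distinguish between solutions that differ only outside the span. The names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined setting: fewer examples than dimensions.
n, d = 5, 20
X = rng.normal(size=(n, d))        # training inputs; rows span a 5-dim subspace of R^20
w_true = rng.normal(size=d)        # some weight vector we'd like to recover
y = X @ w_true                     # training labels

# Split w_true into its component in span(rows of X) and the orthogonal remainder.
# Projection onto the row span: P = X^T (X X^T)^{-1} X.
P = X.T @ np.linalg.solve(X @ X.T, X)
w_span = P @ w_true
w_perp = w_true - w_span

print(np.allclose(X @ w_span, y))  # True: same predictions on every training point
print(np.allclose(X @ w_perp, 0))  # True: the orthogonal part contributes nothing

# The minimum-norm least-squares solution returned by lstsq is exactly the in-span part,
# so that inductive choice is what singles out linear combinations of the data.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_hat, w_span))  # True
```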

Lalitha Sankar

Super enjoying these writings even as I teach these concepts in class (and don't have much time to respond). I'm still hoping to respond someday to your first post on Shannon, LLMs, and the lack of need for probability in ML (I disagree, but that's for another day).

"You are nothing beyond your data" -- so, yeah, kernels clarify that any prediction is a linear combination of the high-dimensional version (kernelized) of the training data. That said, there is a phase transition computationally with these methods. In my lab, some of my students focused on data science for power systems use kernels methods (some modicum of "explainability") but are kernels even a thing these days given your previous blog on all things NNs? And where does NTK fit in all this? :)

