Sounds like it will be an interesting semester! On the action side, some of the things you mention apply in totally deterministic environments (e.g. LQR looks the same even in the absence of process noise). And it's often possible to replace "assume Gaussian noise" with "minimize an appropriate least squares objective" (e.g., for state estimation: https://slides.com/sarahdean-2/08-state-estimation-ml-in-feedback-sys?token=565bwizg#/12/0/3) -- of course, without a Gaussian model, there's no "deeper" motivation for why least squares is the "correct" objective to have.
On the static prediction side, I find the framing of Michael Kim's "Outcome Indistinguishability" helpful for thinking about where uncertainty comes from. I also like the philosophy paper that inspired the work: https://link.springer.com/article/10.1007/s11229-015-0953-4. It provides a nice taxonomy of interpretations of probability. (I made some summary slides of it here: https://slides.com/sarahdean-2/aipp-probability-time-individual-risk?token=vx-PDQk9)
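The "least squares instead of Gaussian noise" point can be made concrete with a toy sketch. This is my own construction, not taken from the linked slides: it estimates a state trajectory for an assumed linear system by stacking dynamics residuals and measurement residuals into one tall least-squares problem. The noise here is deliberately uniform rather than Gaussian, and the estimator is unchanged.

```python
# State estimation as a single least-squares problem, with no Gaussian
# assumption anywhere: pick the trajectory x_0..x_T that minimizes the
# squared dynamics residuals (x_{t+1} - A x_t) plus the squared
# measurement residuals (y_t - C x_t).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed dynamics
C = np.array([[1.0, 0.0]])               # assumed observation map
T, n, m = 20, 2, 1

# Simulate a trajectory with (deliberately non-Gaussian) uniform noise.
x = np.zeros((T + 1, n))
y = np.zeros((T + 1, m))
for t in range(T + 1):
    y[t] = C @ x[t] + rng.uniform(-0.1, 0.1, m)
    if t < T:
        x[t + 1] = A @ x[t] + rng.uniform(-0.05, 0.05, n)

# Stack everything into one tall system H z = b, where z = [x_0; ...; x_T].
rows, rhs = [], []
for t in range(T):                       # dynamics residuals: x_{t+1} - A x_t
    block = np.zeros((n, (T + 1) * n))
    block[:, t * n:(t + 1) * n] = -A
    block[:, (t + 1) * n:(t + 2) * n] = np.eye(n)
    rows.append(block)
    rhs.append(np.zeros(n))
for t in range(T + 1):                   # measurement residuals: y_t - C x_t
    block = np.zeros((m, (T + 1) * n))
    block[:, t * n:(t + 1) * n] = C
    rows.append(block)
    rhs.append(y[t])

H = np.vstack(rows)
b = np.concatenate(rhs)
z, *_ = np.linalg.lstsq(H, b, rcond=None)
xhat = z.reshape(T + 1, n)
print(np.max(np.abs(xhat - x)))          # small worst-case estimation error
```

(With a Gaussian model this would be the MAP trajectory; without one, it's just a sensible objective, which is exactly the point above.)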
I'm hoping by mid-October I'll be able to generalize the "LQR is H2 minimization" story. Because I agree that most things linear-quadratic-gaussian (like least-squares) aren't fundamentally about Gaussians.
That Dawid article has been on my to-read list. I'll check out your slides before reading it. Thanks for sending those!
This is the (Willems) way: https://homes.esat.kuleuven.be/~sistawww/smc/jwillems/Articles/JournalArticles/2004.1.pdf
Yeah, and I think the H2 synthesis story from the 90s is basically the same.
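For reference, the identity behind the "LQR is H2 minimization" slogan, in its standard textbook form (a sketch of the usual statement, not a quote from anyone here). Take closed-loop dynamics $x_{t+1} = (A - BK)x_t + w_t$ with $u_t = -Kx_t$ and performance output $z_t = \begin{bmatrix} Q^{1/2} x_t \\ R^{1/2} u_t \end{bmatrix}$, and let $T_K$ denote the closed-loop map from $w$ to $z$. Then, for unit-covariance white noise $w_t$,

```latex
\[
\underbrace{\lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}\sum_{t=0}^{T-1}
  \bigl( x_t^\top Q x_t + u_t^\top R u_t \bigr)}_{\text{average LQR cost}}
\;=\; \|T_K\|_{H_2}^2
\;=\; \sum_{t=0}^{\infty} \operatorname{tr}\!\bigl( g_t^\top g_t \bigr),
\]
```

where $g_t$ is the impulse response of $T_K$. The right-hand side is a purely deterministic quantity (summed squared impulse responses), which is one way to see that nothing linear-quadratic is fundamentally "about" Gaussians.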
Are you planning to teach anything about LLMs, Ben?
Nothing that's not in Shannon's paper.
:)
So adorable.
I'm super excited about this course, Ben. Will you write and share the notes here?
Is there ever a time when natural randomness makes sense as an assumption in ML? E.g., in supervised learning, I think we invoke it mainly in place of assuming we have access to "a sufficiently dense cover" of the (metaphysical) space of samples. If you have a dense cover, you can prove generalization bounds based solely on continuity and remove randomness entirely. Is this the perspective you're taking?
My plan is to blog lecture summaries here and point to the relevant sections of Patterns, Predictions, and Actions for the details. And in cases where those details don't exist, I'll do my best to supplement with lecture notes here. But I'm going to try to keep the Substack equation-free.
I don't think there are any times when "natural randomness" makes sense in ML. I also think the idea of "super populations" more generally is almost always suspect.
But what you describe is one way to motivate supervised learning: sufficiently dense + diverse sampling suffices to interpolate. One model that generates sufficient density is i.i.d. sampling. If you *can* i.i.d. sample then it's a good idea. But that would be intentional probability. Does that make sense?
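The "dense cover suffices to interpolate" argument can be checked with a tiny deterministic example (my own construction, not from the thread): take an L-Lipschitz function, a grid of training inputs that forms an eps-cover of the domain, and 1-nearest-neighbor interpolation. The worst-case error is at most L*eps, with no probability anywhere.

```python
# If f is L-Lipschitz and the training inputs form an eps-cover, then
# 1-nearest-neighbor interpolation has worst-case error <= L * eps:
# |f(x) - f(x_nn)| <= L * |x - x_nn| <= L * eps.
import numpy as np

L = 3.0
f = lambda x: np.sin(L * x)           # L-Lipschitz on [0, 1]

train_x = np.linspace(0.0, 1.0, 101)  # deterministic eps-cover, eps = 0.005
train_y = f(train_x)
eps = 0.005

def predict(x):
    # 1-nearest-neighbor: copy the label of the closest training input.
    return train_y[np.argmin(np.abs(train_x - x[:, None]), axis=1)]

test_x = np.linspace(0.0, 1.0, 10007)  # dense deterministic test set
err = np.max(np.abs(predict(test_x) - f(test_x)))
print(err, L * eps)                    # err is bounded by L * eps
```

I.i.d. sampling is then just one (intentional) mechanism for producing such a cover with high probability, which matches the framing above.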
Yep, this makes perfect sense.
> I also think the idea of "super populations" more generally is almost always suspect.
I agree with this and it's why I've always been suspicious of the "nature gives you a random set of samples from the population distribution" description of learning.
Yes. I'd be fine with super-populations if they let us predict something useful about practice. But they don't! If anything, super-populations predict the opposite of empirical observation. This is why I'm so hung up on them.
Today's reflections really resonated with me. Also a fan of your book!
Many thanks!
> But if our axioms are all unverifiable, and if the guidance of the theory doesn’t reflect practice, then what is this theory for?
A cynic might wonder whether ML theory primarily serves the purpose of professionalizing the occupation of "machine learning researcher": https://en.wikipedia.org/wiki/Professionalization.