Sounds like it will be an interesting semester! On the action side, some of the things you mention apply in totally deterministic environments (e.g. LQR looks the same even in the absence of process noise). And it's often possible to replace "assume Gaussian noise" with "minimize an appropriate least squares objective" (e.g., for state estimation: https://slides.com/sarahdean-2/08-state-estimation-ml-in-feedback-sys?token=565bwizg#/12/0/3) -- of course, without a Gaussian model, there's no "deeper" motivation for why least squares is the "correct" objective to have.
On the static prediction side, I find the framing of Michael Kim's "Outcome Indistinguishability" helpful for thinking about where uncertainty comes from. I also like the philosophy paper that inspired the work: https://link.springer.com/article/10.1007/s11229-015-0953-4. It provides a nice taxonomy of interpretations of probability. (I made some summary slides of it here: https://slides.com/sarahdean-2/aipp-probability-time-individual-risk?token=vx-PDQk9)
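The "least squares instead of Gaussian noise" point can be made concrete with a toy sketch. This is my own construction, not taken from the linked slides: it estimates a state trajectory for an assumed linear system by stacking dynamics residuals and measurement residuals into one tall least-squares problem. The noise here is deliberately uniform rather than Gaussian, and the estimator is unchanged.

```python
# State estimation as a single least-squares problem, with no Gaussian
# assumption anywhere: pick the trajectory x_0..x_T that minimizes the
# squared dynamics residuals (x_{t+1} - A x_t) plus the squared
# measurement residuals (y_t - C x_t).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed dynamics
C = np.array([[1.0, 0.0]])               # assumed observation map
T, n, m = 20, 2, 1

# Simulate a trajectory with (deliberately non-Gaussian) uniform noise.
x = np.zeros((T + 1, n))
y = np.zeros((T + 1, m))
for t in range(T + 1):
    y[t] = C @ x[t] + rng.uniform(-0.1, 0.1, m)
    if t < T:
        x[t + 1] = A @ x[t] + rng.uniform(-0.05, 0.05, n)

# Stack everything into one tall system H z = b, where z = [x_0; ...; x_T].
rows, rhs = [], []
for t in range(T):                       # dynamics residuals: x_{t+1} - A x_t
    block = np.zeros((n, (T + 1) * n))
    block[:, t * n:(t + 1) * n] = -A
    block[:, (t + 1) * n:(t + 2) * n] = np.eye(n)
    rows.append(block)
    rhs.append(np.zeros(n))
for t in range(T + 1):                   # measurement residuals: y_t - C x_t
    block = np.zeros((m, (T + 1) * n))
    block[:, t * n:(t + 1) * n] = C
    rows.append(block)
    rhs.append(y[t])

H = np.vstack(rows)
b = np.concatenate(rhs)
z, *_ = np.linalg.lstsq(H, b, rcond=None)
xhat = z.reshape(T + 1, n)
print(np.max(np.abs(xhat - x)))          # small worst-case estimation error
```

(With a Gaussian model this would be the MAP trajectory; without one, it's just a sensible objective, which is exactly the point above.)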
I'm hoping by mid-October I'll be able to generalize the "LQR is H2 minimization" story. Because I agree that most things linear-quadratic-gaussian (like least-squares) aren't fundamentally about Gaussians.
That Dawid article has been on my to-read list. I'll check out your slides before reading it. Thanks for sending those!
This is the (Willems) way: https://homes.esat.kuleuven.be/~sistawww/smc/jwillems/Articles/JournalArticles/2004.1.pdf
Yeah, and I think the H2 synthesis story from the 90s is basically the same.
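For reference, the identity behind the "LQR is H2 minimization" slogan, in its standard textbook form (a sketch of the usual statement, not a quote from anyone here). Take closed-loop dynamics $x_{t+1} = (A - BK)x_t + w_t$ with $u_t = -Kx_t$ and performance output $z_t = \begin{bmatrix} Q^{1/2} x_t \\ R^{1/2} u_t \end{bmatrix}$, and let $T_K$ denote the closed-loop map from $w$ to $z$. Then, for unit-covariance white noise $w_t$,

```latex
\[
\underbrace{\lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}\sum_{t=0}^{T-1}
  \bigl( x_t^\top Q x_t + u_t^\top R u_t \bigr)}_{\text{average LQR cost}}
\;=\; \|T_K\|_{H_2}^2
\;=\; \sum_{t=0}^{\infty} \operatorname{tr}\!\bigl( g_t^\top g_t \bigr),
\]
```

where $g_t$ is the impulse response of $T_K$. The right-hand side is a purely deterministic quantity (summed squared impulse responses), which is one way to see that nothing linear-quadratic is fundamentally "about" Gaussians.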
Are you planning to teach anything about LLMs, Ben?
Nothing that's not in Shannon's paper.
:)
So adorable.
I'm super excited about this course, Ben. Will you write and share the notes here?
Is there ever a time when natural randomness makes sense as an assumption in ML? E.g., in supervised learning, I think we invoke it mainly in place of assuming we have access to "a sufficiently dense cover" of the (metaphysical) space of samples. If you have a dense cover, you can prove generalization bounds based solely on continuity and remove randomness entirely. Is this the perspective you're taking?
My plan is to blog lecture summaries here and point to the relevant sections of Patterns, Predictions, and Actions for the details. And in cases where those details don't exist, I'll do my best to supplement with lecture notes here. But I'm going to try to keep the Substack equation-free.
I don't think there are any times when "natural randomness" makes sense in ML. I also think the idea of "super populations" more generally is almost always suspect.
But what you describe is one way to motivate supervised learning: sufficiently dense + diverse sampling suffices to interpolate. One model that generates sufficient density is i.i.d. sampling. If you *can* i.i.d. sample then it's a good idea. But that would be intentional probability. Does that make sense?
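The "dense cover suffices to interpolate" argument can be checked with a tiny deterministic example (my own construction, not from the thread): take an L-Lipschitz function, a grid of training inputs that forms an eps-cover of the domain, and 1-nearest-neighbor interpolation. The worst-case error is at most L*eps, with no probability anywhere.

```python
# If f is L-Lipschitz and the training inputs form an eps-cover, then
# 1-nearest-neighbor interpolation has worst-case error <= L * eps:
# |f(x) - f(x_nn)| <= L * |x - x_nn| <= L * eps.
import numpy as np

L = 3.0
f = lambda x: np.sin(L * x)           # L-Lipschitz on [0, 1]

train_x = np.linspace(0.0, 1.0, 101)  # deterministic eps-cover, eps = 0.005
train_y = f(train_x)
eps = 0.005

def predict(x):
    # 1-nearest-neighbor: copy the label of the closest training input.
    return train_y[np.argmin(np.abs(train_x - x[:, None]), axis=1)]

test_x = np.linspace(0.0, 1.0, 10007)  # dense deterministic test set
err = np.max(np.abs(predict(test_x) - f(test_x)))
print(err, L * eps)                    # err is bounded by L * eps
```

I.i.d. sampling is then just one (intentional) mechanism for producing such a cover with high probability, which matches the framing above.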
Yep, this makes perfect sense.
> I also think the idea of "super populations" more generally is almost always suspect.
I agree with this and it's why I've always been suspicious of the "nature gives you a random set of samples from the population distribution" description of learning.
Yes. I'd be fine with super-populations if they let us predict something useful about practice. But they don't! If anything, super-populations predict the opposite of empirical observation. This is why I'm so hung up on them.
Today's reflections really resonated with me. Also a fan of your book!
Many thanks!
> But if our axioms are all unverifiable, and if the guidance of the theory doesn’t reflect practice, then what is this theory for?
A cynic might wonder whether ML theory primarily serves the purpose of professionalizing the occupation of "machine learning researcher": https://en.wikipedia.org/wiki/Professionalization.