12 Comments

Here's a spicy take: Sutton and Barto's book has completely ruined a generation of researchers. Among other things, they barely mention partially observed scenarios. I was shocked to discover that some of my colleagues who work on RL don't see a problem with using uncompressed histories of observations and actions when working with POMDPs, and the mention of belief state elicits blank stares.

author

I don't consider this spicy at all.

I'm more puzzled by why so many people want to latch their reputation to this notation, terminology, and worldview.

For me, I plan on looking for more rigor, but here are two reasons I use Sutton and Barto, off the top of my head: I happened to read it first, and I think they pose important and relatively easy unsolved problems. I think it would take me much longer to begin contributing if I started from, e.g., control theory, since it feels so much more developed.

edit: My comment makes me sound more critical of their book than I am. I'm not yet certain I'll prefer the tradeoff of more mathematical rigor. More important, I think Sutton and Barto might more strongly make the same counterargument I somewhat made here already: don't let the perfect be the enemy of the good (or the great).

Coming late to this party... but this discussion reminds me of something I heard from a professor in Political History a long time ago: "Marx was a philosopher, his followers imparted doctrine".

I'm not sure I understand. Are you saying I'm taking Sutton and Barto's ideas too far? I roughly only claimed they've posed some important questions. And that, while I think the original post makes many good arguments, I don't feel so strongly about those arguments myself. (Though I'm sure I'll change my tune when someone releases an "Agent Learning" textbook or something...)

Or are you only saying _other_ people have taken Sutton and Barto's ideas too far?

> Or are you only saying _other_ people have taken Sutton and Barto's ideas too far?

That

Nov 29, 2023 · Liked by Ben Recht

You're telling me that if I upload GPT-N to a robot's brain and start running PPO, it won't struggle to its feet moments later? And half an hour later it won't be running at 20 miles per hour?

author

For which value of N?

N >> 4

Nov 29, 2023 · Liked by Ben Recht

Thanks for the post!

What resource (book, course) would you recommend to unRL one's brain?

author

Shameless self-promotion, the last couple of chapters here perhaps? https://mlstory.org/

But if you tell me a bit more specifics about what you're looking for, I could send other potential resources.

A good way to understand **cooking** is to consider some of the examples and possible applications that have guided its development.

- A red pill that, if taken, reveals unpleasant truths to you.

- A druid recipe from ancient Gaul that lets you prepare a drink so powerful, you will have the muscles of ten for the rest of the day!

- A magic potion that will turn the user into an invincible bear, immune to the arrows of all hunters of the realm combined.

- A medicine so strong, it cures cancer.
