13 Comments
Maxim Raginsky

Here's a spicy take: Sutton and Barto's book has completely ruined a generation of researchers. Among other things, they barely mention partially observed scenarios. I was shocked to discover that some of my colleagues who work on RL don't see a problem with using uncompressed histories of observations and actions when working with POMDPs, and the mention of belief state elicits blank stares.

Ben Recht

I don't consider this spicy at all.

I'm more puzzled by why so many people want to attach their reputations to this notation, terminology, and worldview.

Braham Snyder

For me, I plan on looking for more rigor, but here are two reasons I use Sutton and Barto, off the top of my head: I happened to read it first, and I think they pose important and relatively easy unsolved problems. I think it would take me much longer to begin contributing if I started from, e.g., control theory, since that field feels so much more developed.

edit: My comment makes me sound more critical of their book than I am. I'm not yet certain I'll prefer the tradeoff of more mathematical rigor. More importantly, I think Sutton and Barto might make, even more strongly, the same counterargument I partly made here already: don't let the perfect be the enemy of the good (or the great).

Miguel

Coming late to this party, but this discussion reminds me of something I heard from a professor of political history a long time ago: "Marx was a philosopher; his followers imparted doctrine."

Braham Snyder

I'm not sure I understand. Are you saying I'm taking Sutton and Barto's ideas too far? I only roughly claimed that they've posed some important questions, and that, while I think the original post makes many good arguments, I don't feel as strongly about those arguments myself. (Though I'm sure I'll change my tune when someone releases an "Agent Learning" textbook or something...)

Or are you only saying _other_ people have taken Sutton and Barto's ideas too far?

Miguel

> Or are you only saying _other_ people have taken Sutton and Barto's ideas too far?

That.

Sarah Dean

You're telling me that if I upload GPT-N to a robot's brain and start running PPO, it won't struggle to its feet moments later? And half an hour later it won't be running at 20 miles per hour?

∂jalel

Thanks for the post!

What resource (book, course) would you recommend to unRL one's brain?

Ben Recht

Shameless self-promotion: perhaps the last couple of chapters here? https://mlstory.org/

But if you tell me a bit more specifically what you're looking for, I could send other potential resources.

Justin Bayer

A good way to understand **cooking** is to consider some of the examples and possible applications that have guided its development.

- A red pill that, if taken, reveals unpleasant truths to you.

- A druid recipe from ancient Gaul that lets you prepare a drink so powerful, you will have the muscles of ten for the rest of the day!

- A magic potion that will turn the user into an invincible bear, immune to the arrows of all hunters of the realm combined.

- A medicine so strong, it cures cancer.

Jie Wang

Love the takes here. I especially love the example going from chess to human mobile manipulation.

RL folks usually oversimplify the real world, which actually contains a lot of research and engineering questions yet to be answered.