### The Policy of Truth

Our first generic candidate for solving reinforcement learning is Policy Gradient. I find it shocking that Policy Gradient wasn’t ruled out as a bad idea in 1993. Policy gradient is seductive as it apparently lets...

### A Game of Chance to You to Him Is One of Real Skill

This is the fifth part of “An Outsider’s Tour of Reinforcement Learning.” Part 6 is here. Part 4 is here. Part 1 is here. The first two parts of this series highlighted two parallel aspirations...

### The Linear Quadratic Regulator

This is the fourth part of “An Outsider’s Tour of Reinforcement Learning.” Part 5 is here. Part 3 is here. Part 1 is here. What would be a dead simple baseline for understanding optimal control...

### The Linearization Principle

This is the third part of “An Outsider’s Tour of Reinforcement Learning.” Part 4 is here. Part 2 is here. Part 1 is here. I have an ethos for tackling problems in machine learning that...

### Total Control

This is the second part of “An Outsider’s Tour of Reinforcement Learning.” Part 3 is here. Part 1 is here. In addition to the reasons I’ve discussed so far, I’ve been fascinated with the resurgence...

### Make It Happen

This is the first part of “An Outsider’s Tour of Reinforcement Learning.” Part 2 is here. If you read hacker news, you’d think that deep reinforcement learning can be used to solve any problem. Deep...

### Lessons from Optics, The Other Deep Learning

Would you say deep learning is mature enough to be taught in high schools? Here’s why I ask. Some time ago, I received an email from a product manager at a very large company. I...

### Directions of Ascent

Last November was a dramatic wake-up call to many of us in information technology, and I’ve spent a large part of the last year learning about how I and others in similar positions can help...

### An Addendum to Alchemy

This post is an addendum to our “test of time” talk at NIPS 2017. We’d like to expand on a few points about the talk we gave at NIPS last week. The talk highlighted the...