Discussion about this post

Maxim Raginsky:

Your example of the two systems is a good opportunity to bring up an important issue: the sign of the control coefficient. Here, you are assuming that we know the coefficient of u[t] is positive. What if we don't know the sign? If it happens to be negative, then your negative state feedback rule u[t] = -x[t] will destabilize both systems. In fact, in the adaptive control literature, Steve Morse conjectured in 1983 that not knowing the sign of the control coefficient dooms any attempt to stabilize the system, even if you know the magnitude of the control gain. This was disproved, first by Roger Nussbaum using heaps of fancy math and then by Jan Willems and Chris Byrnes using more down-to-earth methods. It turns out that you can still stabilize the system even if you don't know the sign in front of u[t]. RL heads will recognize their idea as a combination of the doubling trick and alternating exploration and exploitation: switch between adaptive schemes assuming + or -, running each scheme for twice as long as the time before. One interesting conceptual outcome of all that work is that you need dynamic controllers -- i.e., the controller is itself a state-space model with the system state as input and the control signal as output; pure memoryless state feedback is not sufficient.
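A toy sketch of the switch-and-double idea (my own illustration, not Nussbaum's or Willems-Byrnes's actual construction): stabilize the scalar system x[t+1] = a*x[t] + b*u[t] when |b| is known but sign(b) is not, by alternating sign hypotheses and doubling the epoch length at each switch.

```python
def stabilize_unknown_sign(a, b, x0, steps=60):
    """Switching controller for x[t+1] = a*x + b*u with unknown sign(b).

    Assumes |b| is known (as in Morse's conjecture). Alternates between
    the hypotheses sign(b) = +1 and sign(b) = -1, running each for twice
    as long as the previous epoch.
    """
    abs_b = abs(b)                 # magnitude of the control gain, assumed known
    k = a / abs_b                  # deadbeat gain *if* the sign guess is right
    s, epoch, t_in_epoch = +1.0, 1, 0
    x, traj = x0, [x0]
    for _ in range(steps):
        u = -s * k * x             # feedback under the current sign hypothesis
        x = a * x + b * u
        traj.append(x)
        t_in_epoch += 1
        if t_in_epoch >= epoch:    # epoch over: flip the guess, double the length
            s, epoch, t_in_epoch = -s, 2 * epoch, 0
    return traj

traj = stabilize_unknown_sign(a=2.0, b=-1.0, x0=1.0)
# The state first grows under the wrong sign guess, then is driven to zero
# once the controller switches to the correct hypothesis.
```

Note that this controller carries internal state (the current sign hypothesis and the epoch counter), which is exactly the point about needing dynamic rather than memoryless feedback.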

Fernando Palafox:

I was working on some model-based RL recently and found that the autoregressive trajectories from some of my learned systems were completely wrong in certain parts of the state/action space, but the model was still very useful for control. E.g., when autoregressively rolling out trajectories on a (high-reward) learned model of a quadruped, the model predicted the robot moving forward even with ZERO control input -- likely because most of the training data had it going forward.

This reminds me of Nathan Lambert's work on objective mismatch in model-based RL. Basically, it seems that learning a model that is accurate "everywhere" in state space is not necessary for good control. In hindsight, this sounds obvious, but I have a control-theoretic background, and this was not what I initially optimized for: I was always looking for a globally accurate model of the environment.
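A minimal sketch of what "autoregressive rollout" means here: repeatedly feeding the model's own prediction back in as the next input. The learned dynamics below are a made-up stand-in (not the actual quadruped model) with a built-in forward bias, mimicking a model trained mostly on forward-walking data.

```python
# Hypothetical learned dynamics x[t+1] = f(x[t], u[t]); the +0.1 term is a
# baked-in forward drift, standing in for a model biased by its training data.
def learned_step(x, u):
    return x + 0.1 + 0.5 * u

def rollout(f, x0, controls):
    """Autoregressive rollout: each prediction becomes the next state input."""
    xs = [x0]
    for u in controls:
        xs.append(f(xs[-1], u))
    return xs

xs = rollout(learned_step, 0.0, [0.0] * 10)
# Even with zero control input at every step, the rolled-out state drifts
# forward -- the zero-input behavior mirrors the training distribution.
```

For planning, such a model can still rank candidate control sequences usefully in the region the planner actually visits, which is why it remains good for control despite being wrong elsewhere.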
