3 Comments
Maxim Raginsky

Your example of the two systems is a good opportunity to bring up an important issue: the sign of the control coefficient. Here, you are assuming that we know the coefficient of u[t] is positive. What if we don't know the sign? If it happens to be negative, then using your negative state feedback rule u[t] = -x[t] will destabilize both systems. In fact, in the adaptive control literature, Steve Morse conjectured in 1983 that not knowing the sign of the control coefficient would doom any attempt to stabilize the system, even if you know the magnitude of the control gain. The conjecture was disproved, first by Roger Nussbaum using heaps of fancy math and then by Jan Willems and Chris Byrnes using more down-to-earth methods. It turns out that you can still stabilize the system even if you don't know the sign in front of u[t]. In fact, RL heads will recognize their idea as a combination of the doubling trick and alternating exploration and exploitation: switch between adaptive schemes assuming + or -, running each scheme for twice as long as the time before.

One interesting conceptual outcome of all that work is that you need dynamic controllers -- i.e., the controller is itself a state-space model with the system state as input and the control signal as output; pure memoryless state feedback is not sufficient.
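A toy sketch of that switching idea (my own illustration, not the actual Willems-Byrnes construction) for the scalar system x[t+1] = a*x[t] + b*u[t], where |b| = 1 is known but sign(b) is not:

```python
def stabilize(a=1.5, b=-1.0, x0=1.0, steps=40):
    """Stabilize x[t+1] = a*x[t] + b*u[t] with unknown sign(b) but known |b| = 1.
    Try a hypothesized sign; if the state grew over the trial, flip the
    hypothesis and double the trial length (the 'doubling trick')."""
    x = x0
    sign = +1      # current hypothesis for sign(b)
    epoch = 1      # trial length, doubled each time the hypothesis fails
    t = 0
    traj = [x]
    while t < steps:
        start = abs(x)
        for _ in range(epoch):
            u = -sign * a * x          # deadbeat gain if the hypothesis is right
            x = a * x + b * u
            traj.append(x)
            t += 1
            if t >= steps:
                break
        if abs(x) > start:             # state grew: hypothesis was wrong
            sign = -sign
            epoch *= 2
    return traj
```

Note that the controller is not memoryless: the sign hypothesis and epoch counter are internal state, which is a (crude) instance of the dynamic-controller point above.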

Fernando Palafox

I was working on some model-based RL recently and found that the autoregressive trajectories of some of my learned systems were completely wrong in certain parts of the state/action space, but the model was still very useful for control. E.g., when autoregressively rolling out trajectories on a (high-reward) learned model of a quadruped, the quadruped moved forward even with ZERO control input. This is likely because most of the training data had it moving forward.
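The failure mode is easy to reproduce with a toy stand-in for the learned model (the `biased_model` below is my invention, just mimicking a model whose training data was mostly forward motion):

```python
import numpy as np

def rollout(dynamics, s0, actions):
    """Autoregressively roll out a learned dynamics model:
    each predicted state is fed back in as the next input."""
    states = [np.asarray(s0, dtype=float)]
    for a in actions:
        states.append(dynamics(states[-1], a))
    return np.stack(states)

# Toy 'learned' model with a baked-in forward drift, standing in for a
# network trained almost entirely on forward-walking trajectories.
biased_model = lambda s, a: s + np.array([0.1, 0.0]) + 0.01 * a

traj = rollout(biased_model, s0=[0.0, 0.0], actions=np.zeros((50, 2)))
# the x-position drifts forward even though every action is zero
```

The rollout compounds the model's bias at every step, so even a small systematic error shows up as confident forward motion under zero input.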

This reminds me of Nathan Lambert's work on objective mismatch in model-based RL. Basically, it seems like learning a model that is accurate "everywhere" in state space is not necessary for good control. In hindsight this sounds obvious, but I have a control-theoretic background and this was not what I initially tried optimizing for: I was always looking for a globally accurate model of the environment.

Avik De

Some historical context that other readers might enjoy: Norbert Wiener, in Cybernetics (1948), described this spectrum between what he called compensators and feedback. Compensators can respond faster than pure feedback controllers because they have a model of the system (rather than waiting for error to accumulate), but of course they need more system knowledge. While feedback was known before (the governor, as covered in a past article of this series), to my knowledge Wiener is credited with formalizing feedback control. The cybernetics book is fascinating, especially the connections to biology. I wrote a short takeaways post as well (https://www.avikde.me/p/what-wiener-knew-about-artificial) and recommend learning more about him for anyone curious.