In the penultimate week of classes, we turn to decision making where our current actions impact future decisions. These two weeks get a bit ridiculous because this topic could comprise a full graduate school curriculum. Should this lecture be about linear feedback systems, stochastic programming, robust optimization, model predictive control, dynamic programming, reinforcement learning, or combinatorial search? Uh, it sort of needs to be about all of them. But I see the end of the course as a bridge to what’s next. Let’s take a couple of weeks to dive into a little bit of all of these topics and see why we have too much more to learn.

Even if we don’t have uncertainty, there are interesting optimization problems that involve sequential decision making with feedback. Perhaps the simplest to explain are board games like checkers, chess, or Go. In each of these games, the reward is 1 if you win and -1 if you lose. You choose a move at every time step with the goal of winning, and then your opponent counters with their own counter to defeat you. We can set this up as a min-max optimization problem over a sequence of many moves, where at each stage a player chooses a move that optimally takes them closer to the end game. For checkers, chess, and go, this optimization problem is entirely deterministic. It is only hard because computing the optimal solution appears to be completely intractable. There is something like 10^120 games of chess with 40 moves. Navigating this space efficiently to find good strategies is computationally daunting.

Now what happens when we add uncertainty to the picture? Now, problems that are trivial when compared to chess become a challenge as we attempt to mitigate the impact of the unknown. Rather than having a maximally clever adversary, what if we just don’t know what the future brings with each move? Take a simple example in supply-chain management. A shopkeeper wants a certain amount of cheerios delivered each week. If they order too much, they won’t have room to stock it, but too little and they can’t meet their hungry customer’s demand. In this case, the dynamical model is very simple, but how we model and act upon the uncertain demand introduces a new layer of complexity.

The most ubiquitous decision systems don’t really plan at all. Everywhere around us, thousands of tiny control loops are running at all times to keep our home and working environments consistently well-regulated. These control loops exist to reject disturbances and often do so with no models at all. For instance, a thermostat has no model of the room it's in. It turns on the heater when it’s too cold and turns it off when it’s too hot. Most of the control loops in our lives work in a similar fashion, implementing only the most rudimentary modeling and planning. But what makes designing these controllers challenging are questions of robustness and reliability, given the limited amount of modeling we can do.

Part of why machine learning courses on decision making get confused is we tend to lump all of these different decision systems together. We probably need to think about them separately. No complex systems are built by optimal control alone or by PID alone. We could build local theories of feedback, planning, and stochastic programming, and stitch them together with rigorous design principles. How to do this decomposition then becomes a question of *architecture*.

John Doyle has been preaching this view for decades. I’m finally getting around to understanding why.

The architectural components are evident when you look at how a complex robot works. At the top level, perhaps there’s a path planner that moves the robot through the trajectory. This path planner is verified by some safety controller to make sure none of the instantaneous moves will cause the system to crash. Then, the system passes actuation signals to various motors, each running a local controller to manage the translation from digital message to actual torque.

The internet is a masterpiece of engineering architecture. An extreme diversity of endpoints can participate in an extreme diversity of networked applications. To do this, information is transmitted through a series of layers: the physical transmission of the bits themselves, the transmission between nodes, the transmission through the network, all the way up to the application of streaming TikToks. This complex system lets us send unfathomable amounts of information to each other.

Modern machine learning loves to assert that abstraction boundaries are useless and that end-to-end strategies will beat out modularity. But their artifacts rest on the most complex, modular architectures ever created: The internet. Without the internet, there’s no GitHub or datasets. And without these, there’s no end-to-end deep learning.

Architectures don’t explicitly solve stochastic optimization problems that govern the entire system. Perhaps at the highest level, when we do some planning, stochastic optimization plays a role. But how do we think about the guarantees needed to make the whole system robust? How do we understand abstraction boundaries and how to connect layers and levels of pieces together? I don’t have the capacity to teach this in our machine learning class. I can only do my best to make you aware of the problem. But maybe we’ll exclusively cover architecture next year.

I remember John saying something like this at CDC 2012 in Hawaii, over drinks before the banquet:

“That first sentence of Anna Karenina — ‘Happy families are all alike; every unhappy family is unhappy in its own way.’ I finally realized that it’s a statement about system architecture!”

Like the idea of architecture a lot!

One example of "architecture" that comes to mind, different from the examples you give, is the structure of weakly coupled MDPs that comes up in operations research, more commonly used in old-school approximate DP. To quote from Brown&Zhang who have some recent results on tightness, weakly coupled structure arises where "a set of linking constraints in each period couple an otherwise independent collection of subproblems." (the paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3849573)

Typical relaxations appeal to Lagrangian relaxation of the linking constraint, to then focus on local optimization within subproblems. This ends up being a nice model for local planning with global resource constraints.

Interesting to think about in relation to broader notions of architecture in the control of complex systems!