Programming Note: There will be no classes next week for Thanksgiving Break. The following week, we’ll close out the semester with a week full of controversy. So no posts next week unless I am inspired to blog about music or food.
I love control theory, and some of my best friends are control theorists. So know that I say this with love: the central concepts and contributions of control are encrypted. Whenever someone asks, "What's a good book to learn control theory?" I heave a deep sigh. As the famous control adage goes, "There Are None."
Why do most control courses quickly land in esoterica? Either a class starts with frequency domain concepts, which require the prerequisite knowledge of Fourier analysis, or it starts with state-space controllability or observability, which are both academic, second-order concerns. The core concept of control theory is feedback, its power, and its danger. Can we teach feedback without forcing people to learn Fourier transforms first?
Based on my experience, it's not easy. We can give some simple demos of driving a car with one's eyes open or closed, but this doesn't help someone think about how to build a feedback system. Are there elementary formal things we can do as well? Can we explain the power of feedback control to computer science undergraduates, who learn algorithms over discrete state spaces with finite time horizons?
Let me attempt a concise example of what I have in mind. I cooked this one up with help from Francesco Borrelli. I really like cereal and want to make sure I have some available in the morning at all times. My cupboard can only hold two boxes at once. If I have no boxes, I panic and have a terrible morning. But if I buy a third box, I have to keep it on the counter, where my cat rips it to shreds to spite me. Based on my cereal consumption this week, how can I determine whether I should buy cereal at the grocery store? Clearly, the first thing any sensible person would do would be to pose this as a policy optimization problem.
Abstractly, this is an inventory control problem. Let x be the amount of product in the inventory, w the amount of product consumed during a round, and u the amount of product purchased. We'll assume u and w are both binary-valued. We can summarize the dynamics with the model

x_{t+1} = x_t + u_t - w_t.
Define the cost function, c(x), to equal 0 if x is equal to 1 or 2, and equal infinity otherwise.
This cost is probably too extreme to make the problem realistic, but it simplifies the rest of the exposition.
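To make the setup concrete, here's a minimal Python sketch of the model. The function names are my own, and I'm assuming the natural inventory dynamics x_{t+1} = x_t + u_t - w_t:

```python
def step(x, u, w):
    """One round of inventory dynamics: buy u boxes, consume w boxes."""
    return x + u - w

def cost(x):
    """Zero cost if the cupboard holds 1 or 2 boxes, 'infinite' otherwise."""
    return 0 if x in (1, 2) else float("inf")
```

Everything below is just bookkeeping on top of these two functions.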
Putting everything together, we have the two-stage policy optimization problem: choose the purchases u_0 and u_1 to minimize the expected value of c(x_1) + c(x_2), subject to the dynamics x_{t+1} = x_t + u_t - w_t.
How do we solve it? I want to compare two possibilities. A closed-loop strategy uses the current value of x to choose the next action. An open-loop strategy picks a sequence of two actions in advance and proceeds without looking at x.
The obvious closed-loop strategy here is to purchase cereal whenever there is room to store it. In math, set u to 1 if x equals 1 and to 0 if x equals 2. Either way, the stock at the next round is 2 - w, which is 1 or 2 no matter what the disturbance does. A simple, common-sense plan gets 0 cost when you are allowed to use feedback.
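This claim is small enough to check by brute force. The sketch below (my own code, with self-explanatory names) enumerates every starting stock in {1, 2} and every disturbance sequence, and confirms the feedback policy never pays anything:

```python
from itertools import product

def step(x, u, w):
    return x + u - w

def cost(x):
    return 0 if x in (1, 2) else float("inf")

def policy(x):
    # Buy whenever there is room in the cupboard.
    return 1 if x == 1 else 0

# The closed-loop policy incurs zero cost against every disturbance sequence.
worst_case = 0
for x0 in (1, 2):
    for w in product((0, 1), repeat=2):
        x, total = x0, 0
        for t in range(2):
            x = step(x, policy(x), w[t])
            total += cost(x)
        worst_case = max(worst_case, total)
print(worst_case)  # prints 0
```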
But there are no good open-loop strategies. Whatever u you decide on in advance, some sequence of w's results in infinite cost. Suppose the cupboard starts full (x=2). If the plan is to buy nothing over two steps (u=(0,0)), then w being 1 at both steps empties the cupboard and results in infinite cost. If the plan ever does make a purchase (u=(0,1), (1,0), or (1,1)), then w being 0 at both steps overflows the cupboard and results in infinite cost. A similar case analysis rules out every plan starting from x=1. As long as every disturbance sequence has some chance of occurring (for instance, if each w is independently 1 with probability strictly between 0 and 1), there is no open-loop strategy with finite expected cost.
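This case analysis can also be verified exhaustively. The sketch below checks that, from either starting stock, each of the four open-loop plans is defeated by some disturbance sequence:

```python
from itertools import product

def step(x, u, w):
    return x + u - w

def cost(x):
    return 0 if x in (1, 2) else float("inf")

def total_cost(x0, plan, w):
    """Run a fixed purchase plan against a fixed disturbance sequence."""
    x, total = x0, 0
    for u_t, w_t in zip(plan, w):
        x = step(x, u_t, w_t)
        total += cost(x)
    return total

# For every start and every fixed plan, the worst-case disturbance is fatal.
for x0 in (1, 2):
    for plan in product((0, 1), repeat=2):
        worst = max(total_cost(x0, plan, w) for w in product((0, 1), repeat=2))
        assert worst == float("inf")
```

Eight plan/start combinations, each killed by at least one of four disturbance sequences: a thirty-two-case proof by computer.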
A closed-loop policy lets you compensate for the unknown values of w by dealing with them as you go. Moreover, the closed-loop policy didn't need to know the distribution of the w's. Feedback worked for any possible distribution. Without even trying, we found a distributionally robust policy.
Another nice property of this example is that it works for Markov decision processes too. I could have done everything with tables of conditional probabilities if you prefer that sort of notation.
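For instance, if we assume each w is an i.i.d. Bernoulli(p) draw, the example becomes a tiny tabular MDP, and the expected cost of the feedback policy is zero for every p. The sketch below is my own notation, not from any textbook:

```python
def cost(x):
    return 0 if x in (1, 2) else float("inf")

def transitions(x, u, p):
    """Conditional probability table P(x' | x, u) when w ~ Bernoulli(p)."""
    return {x + u - 1: p, x + u: 1 - p}

def expected_cost(x, policy, p, horizon=2):
    """Expected total cost of a feedback policy over the given horizon."""
    if horizon == 0:
        return 0.0
    u = policy(x)
    return sum(prob * (cost(nx) + expected_cost(nx, policy, p, horizon - 1))
               for nx, prob in transitions(x, u, p).items())

def buy_when_room(x):
    return 1 if x == 1 else 0

print(expected_cost(2, buy_when_room, p=0.5))  # prints 0.0
```

The recursion is just the finite-horizon Bellman equation written out for a two-state problem, and the answer is 0.0 no matter what value of p you plug in.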
In any event, I’m sure there are more compelling examples of the power of feedback. If you have a favorite, please share it in the comments. I want to compile these examples into a short tutorial. And maybe that could turn into a class. And with some momentum, this could spiral into the nonexistent book on controls everyone is looking for. Feedback can be powerful, after all.
but Fourier transform is too cool to not learn
I didn’t use it as a student, but Feedback Systems by Murray and Åström tries to be pretty accessible. If you’re going to forgo Fourier analysis, I think you need to have at least taken differential equations, and maybe part of the problem is that CS majors in many places tend not to require that…? Thus the cultural home of control being EE/ME…? In my experience, super smart CS folks are surprisingly often allergic to continuous time.