Advanced Simplicity

Building up a theory of feedback from proportional-integral-derivative building blocks

Feb 09, 2026

This is a live blog of Lecture 3 of my graduate seminar “Feedback, Learning, and Adaptation.” A table of contents is here.

A theme I’ve been emphasizing so far in this class is how simple controllers can tame uncertain, complex components. In the feedback amplifier example, we showed how feeding back a signal proportional to the deviation from a reference level could yield stable behavior. This sort of proportional control seems intuitively sensible. The controller pulls the system down when it is above the desired level and pushes it up when it is below that level. The control signal is a fixed multiple of how far the system deviates from its desired setpoint.

The earliest control designs implemented precisely such proportional controllers to regulate beastly machines like steam engines. One issue 19th-century engineers soon discovered is that proportional control systems often don’t settle at the desired steady state of zero error. Studying such systems, James Clerk Maxwell (you may remember him from those equations about electricity) wrote arguably the first modern control theory paper in 1868. “On Governors” describes the mathematical properties of the mechanical control systems of the time and shows that to actually maintain fixed points, some sort of integral control was needed. I covered this necessity last week in the discussion of homeostasis: feeding back the integral (the running sum) of a signal implies that the signal must be zero at steady state.
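A small worked example makes the point concrete. It is a simplification of my own, not taken from “On Governors”: assume a hypothetical static plant with gain K > 0, a constant disturbance d, a setpoint r, and a proportional gain k_P > 0.

$$
y = K u + d, \qquad u = k_P\,(r - y)
\quad\Longrightarrow\quad
e = r - y = \frac{r - d}{1 + k_P K}.
$$

Raising the gain shrinks the error, but for any finite gain it stays nonzero unless the disturbance happens to equal the setpoint. Integral action removes the residual. If the controller accumulates the error, u_t = k_I Σ_{j ≤ t} e_j, then each step adds k_I e_t to the control signal, so

$$
u_t = u_{t-1} + k_I\, e_t
\quad\Longrightarrow\quad
e_t = \frac{u_t - u_{t-1}}{k_I} \to 0 \quad \text{whenever } u_t \text{ converges.}
$$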

You can, of course, combine the two types of feedback and get a “proportional-integral” (PI) controller. Any feedback system with an integrator can only converge to fixed points with zero steady-state error, and adding a proportional term often brings the system to steady state more quickly and more robustly.
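To see the difference numerically, here is a minimal Python sketch. The plant x[t+1] = 0.9 x[t] + 0.5 u[t] + 1.0 (a first-order system with a constant disturbance), the setpoint 2.0, the gains, and the function name `simulate` are all hypothetical choices for illustration.

```python
def simulate(kp, ki, steps=200, a=0.9, b=0.5, d=1.0, r=2.0):
    """Run a hypothetical first-order plant x[t+1] = a*x[t] + b*u[t] + d
    under PI control (ki = 0 gives pure proportional control) and
    return the final tracking error r - x."""
    x = 0.0
    integral = 0.0  # running sum of past errors: a discrete integral
    for _ in range(steps):
        e = r - x                      # deviation from the setpoint
        integral += e
        u = kp * e + ki * integral     # proportional + integral feedback
        x = a * x + b * u + d          # plant update with constant disturbance d
    return r - x


print(f"P only: final error = {simulate(kp=1.0, ki=0.0):.4f}")  # -1.3333
print(f"PI:     final error = {simulate(kp=1.0, ki=0.2):.2e}")
```

With these numbers the proportional loop settles at x = 10/3, leaving a steady-state error of -4/3, while the PI loop drives the error to (numerically) zero. The integral gain was chosen so the closed loop is stable (its poles sit at 0.5 and 0.8); much larger integral gains can make the loop oscillate or diverge.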

We could add one more term to this setup, feeding back the derivative of the error signal as well. The derivative gives the control signal a bit of momentum. It tells the controller whether the error is increasing or decreasing and, hence, how the control action should change in the future. By taking a weighted sum of the error signal, its integral, and its derivative, we obtain the proportional-integral-derivative (PID) controller.
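A discrete-time PID controller takes only a few lines of code. The sketch below approximates the continuous-time law u(t) = k_P e(t) + k_I ∫₀ᵗ e(s) ds + k_D de/dt under my own assumptions (a fixed sample time dt, a rectangle-rule integral, a backward-difference derivative). The class name `PID`, its parameters, and the test plant are hypothetical, not from any particular library.

```python
class PID:
    """Textbook discrete-time PID controller (illustrative sketch):
        u = kp*e + ki*sum(e*dt) + kd*(e - e_prev)/dt
    """

    def __init__(self, kp, ki, kd, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0     # rectangle-rule approximation of the integral of e
        self.prev_error = None  # last error, for the finite-difference derivative

    def update(self, error):
        self.integral += error * self.dt
        # Skip the derivative on the first call so there is no spurious kick.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical plant with a constant disturbance: x[t+1] = 0.9*x[t] + 0.5*u[t] + 1.0
controller = PID(kp=1.0, ki=0.2, kd=0.1)
x, setpoint = 0.0, 2.0
for _ in range(200):
    u = controller.update(setpoint - x)
    x = 0.9 * x + 0.5 * u + 1.0
print(f"final error = {setpoint - x:.2e}")
```

Practical implementations usually add refinements this sketch omits, such as low-pass filtering the derivative term and limiting integrator windup when the actuator saturates.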

PID controllers lie at the core of any contemporary control system. PID often suffices to control massive systems all on its own. Even in the most complex control systems, you will always find PID controllers sitting at the bottom of the architecture. For example, if you have a complex robotic control system, you can use some fancy new learning-control technique to tell the robot to move, which requires sending some sort of electronic message to mechanical actuators. At those actuators, PID controllers ensure that electronic signals are properly translated into joint torques.

One of the weirder things about how we typically teach control theory is that we begin with an abstract notion of controllers, prove things in full generality, and perhaps remind ourselves that a lot can be done with PID control in a lab or homework assignment.

Though PID control may feel a lot less interesting, I always wonder why we don’t start with it as the foundation of all feedback systems. That is, we could begin the class by analyzing the simplest controller and then use it as a building block for more complex control models later in the semester.

Some classes do this without fanfare, but they are usually not listed as control courses. In Nonlinear Programming (aka Continuous Optimization), we start with gradient descent and show how to build up a complex set of algorithms around it. Gradient descent is literally integral control! The “plant” takes as input a vector and outputs the gradient of some function evaluated at the input. The “controller” integrates the plant outputs and sends in a new vector proportional to the negative of that integral.
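The claim is easy to check. Unrolling the update x_{k+1} = x_k − α∇f(x_k) gives x_k = x_0 − α Σ_{j<k} ∇f(x_j), so each iterate is the starting point minus a scaled running sum (a discrete integral) of the plant’s outputs; with x_0 = 0 it is exactly proportional to the negative of that integral. The Python sketch below runs both forms side by side. The quadratic f(x) = ½(x − 3)², the step size, and the function names are hypothetical choices, and the example is one-dimensional for brevity; for vectors the same identity holds coordinate by coordinate.

```python
def grad(x):
    # Hypothetical plant: the gradient of f(x) = 0.5 * (x - 3)**2.
    return x - 3.0


def gradient_descent(x0, step, iters):
    """Optimization form: x <- x - step * grad(x)."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x


def integral_controller(x0, step, iters):
    """Control form: the plant maps x to grad(x); the controller keeps a
    running sum of every plant output and feeds back x0 - step * (that sum)."""
    total = 0.0
    x = x0
    for _ in range(iters):
        total += grad(x)          # integrate the plant output
        x = x0 - step * total     # command set by minus the scaled integral
    return x


print(gradient_descent(0.0, 0.5, 50))
print(integral_controller(0.0, 0.5, 50))
```

Both functions produce the same iterates up to floating-point rounding and converge to the minimizer x = 3.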

In our nonlinear programming classes, especially those that focus on the theory of convex optimization methods, the analysis is also control analysis. We analyze gradient descent by constructing a Lyapunov function. This Lyapunov function could be the function value itself. Or it could be the distance to the optimal solution. We proceed to build more sophisticated methods, like the proximal gradient method and Nesterov’s accelerated method. It turns out that these too are PID controllers, and we analyze their convergence behavior using Lyapunov functions as well. What if you want to see how a method works in the presence of uncertainty in the gradient oracle? We can then discuss stochastic gradient methods. And for nonstochastic plants with more complex dynamics, we can apply techniques from online learning and study online convex optimization. At the heart of all these methods remains gradient descent/integral control, even as we make the plant models and analyses more sophisticated.
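Here is a minimal numerical check of the two candidate Lyapunov functions named in this paragraph, under assumptions of my own: a hypothetical diagonal quadratic f(x) = ½ Σ h_i x_i² with minimizer x* = 0, and gradient descent with the standard step size 1/L, where L = max h_i is the smoothness constant. The assertion inside the loop checks the descent lemma, f(x_{k+1}) ≤ f(x_k) − ‖∇f(x_k)‖²/(2L), the usual one-line certificate that the function value decreases at every step.

```python
import math

# Hypothetical test problem: f(x) = 0.5 * sum(h_i * x_i**2) with curvatures H.
# The minimizer is x* = 0 and the smoothness constant is L = max(H).
H = [1.0, 4.0, 10.0]
L = max(H)


def f(x):
    return 0.5 * sum(h * xi * xi for h, xi in zip(H, x))


def grad(x):
    return [h * xi for h, xi in zip(H, x)]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def run_gd(x0, iters=100):
    """Gradient descent with step 1/L, recording two candidate Lyapunov
    functions: the function value and the distance to x* = 0."""
    x = list(x0)
    values, dists = [f(x)], [norm(x)]
    for _ in range(iters):
        g = grad(x)
        x_next = [xi - gi / L for xi, gi in zip(x, g)]
        # Descent lemma: with step 1/L, f drops by at least ||g||^2 / (2L).
        assert f(x_next) <= f(x) - norm(g) ** 2 / (2 * L) + 1e-12
        x = x_next
        values.append(f(x))
        dists.append(norm(x))
    return values, dists


values, dists = run_gd([1.0, -2.0, 3.0])
# Both sequences are nonincreasing, as a Lyapunov certificate requires.
print(all(b <= a for a, b in zip(values, values[1:])))  # True
print(all(b <= a for a, b in zip(dists, dists[1:])))    # True
```

On this problem each coordinate shrinks by the factor 1 − h_i/L per step, which is why both quantities decrease; for general convex, non-quadratic functions the distance argument needs its own proof, which is part of what the lecture develops.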

It’s funny because our understanding of first-order optimization methods and PID controllers is basically… the same? The analyses were often derived by the same people, just in different contexts. Many control theorists became optimization theorists and vice versa. There is a clean dictionary between topics:[1]

For more details on these equivalences, check out this post from 2018. It builds on an excellent paper by Bin Hu and Laurent Lessard. And if you want a glimpse of how far you can take PID controllers in feedback systems, you should check out the book Advanced PID Control by Karl Åström and Tore Hägglund.

Given this strong parallel, this lecture will treat elementary optimization and control on the same footing. We’ll apply ideas from stability analysis to PID control and optimization methods. I’ll draw connections between the actual techniques from control theory and optimization theory. And from this, we’ll get a more general sense of how simple, structured components can tame complex uncertainties in feedback loops.

[1] It is odd that you can’t make tables in Substack.

arg min
