Discussion about this post

Pushpendre Rastogi

Sorry if I am being too dense, but I didn't understand this part.

> Gradient descent is literally integral control! The “plant” takes as input a vector and outputs the gradient of some function evaluated at the input. The “controller” integrates the plant outputs and sends in a new vector proportional to the negative of that integral.

Wouldn't a better analogy for integral control be methods with momentum, or Nesterov-acceleration-type methods? If loss equals error, then vanilla gradient descent seems more like the D in PID.
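
For reference, here is a minimal sketch of what the quoted passage describes. The quadratic objective, the step size, and the function names are illustrative assumptions, not anything from the post. It checks only the narrow claim that unrolling the gradient descent update gives x_k = x_0 - eta * (sum of all gradients so far), so the iterate is the starting point minus a scaled running sum (a discrete integral) of the plant outputs.

```python
def grad(x):
    # Plant: gradient of the illustrative objective f(x) = 0.5 * (x - 3)**2
    return x - 3.0


def gradient_descent(x0, eta, steps):
    # Standard update: x_{k+1} = x_k - eta * grad(x_k)
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)
    return x


def integral_control(x0, eta, steps):
    # Controller keeps a running sum of plant outputs and feeds back
    # the starting point minus eta times that sum.
    x = x0
    integral = 0.0
    for _ in range(steps):
        integral += grad(x)       # accumulate the plant output
        x = x0 - eta * integral   # new plant input
    return x


x_gd = gradient_descent(0.0, 0.1, 50)
x_ic = integral_control(0.0, 0.1, 50)
print(x_gd, x_ic)
```

Under these assumptions the two loops produce the same iterates up to floating-point rounding. In this reading the "error" signal being integrated is the gradient, not the loss value.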
