This post reflects on the live blog of my graduate class “Convex Optimization.” A Table of Contents is here.
As the semester draws to a close and the Thanksgiving creep envelops November, I’m reflecting on a semester of convex optimization. Teaching this class was long overdue for me, and I’ve been thrilled to engage with the material again. Boyd and Vandenberghe’s book holds up remarkably well after twenty years, and the subject remains one of the crown jewels of engineering mathematics. The mathematics is beautiful and tells us useful things about practice. If you can pose a problem as a disciplined convex optimization problem, you can solve it efficiently. That’s incredible, and we shouldn’t take it for granted. A course where the theory, algorithms, and practice are so closely aligned is a rarity.
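To make that claim concrete, here’s a minimal sketch (mine, not from the lectures) of what “disciplined convex” means in practice, using the CVXPY library. The data and the l1-constrained least-squares problem are made up for illustration; the point is that once the objective and constraints are composed from convex atoms under the DCP rules, a generic solver handles the rest.

```python
# A hypothetical example: l1-constrained least squares in CVXPY.
import cvxpy as cp
import numpy as np

# Made-up data for the sketch.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)

# Declare a variable and compose the problem from convex atoms.
x = cp.Variable(10)
objective = cp.Minimize(cp.sum_squares(A @ x - b))
constraints = [cp.norm(x, 1) <= 1]  # an l1 ball, still convex

# Because the construction follows the DCP rules, CVXPY can verify
# convexity and hand the problem off to a generic conic solver.
prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.status, prob.value)
```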
Since the material is fairly noncontroversial, I found myself blogging about it a lot less than I expected. In my machine learning live blog last fall, I got into the habit of two-phase blogging: I’d first write what I thought I’d say in lecture and then return the next day to scribe all the conceptual issues that got in the way of the discussion. A semester course on machine learning and prediction is much trickier than one on optimization. It’s either a week where you tell people about the holdout method and sklearn, or 15 weeks of confusion because we don’t and can’t understand inductive logic or probability. My class was definitely the latter.
Convex optimization was decidedly different. I left almost every class satisfied that I had said what I wanted to say. This led to less blog volume. Whether this is good or bad is a matter of reader opinion.[1] Where I fell short of my aspirations was in minimizing the mathematical notation in these blog posts. Convex Optimization is a math class! I didn’t have that many equations, but I wanted fewer. Perhaps I can get there in a future iteration. But trying to engage critically, fluently, and succinctly with graduate material pushed me to typeset a lot of formulas. To explain optimization theory without math, I’d need to write about it outside the context of a fast-moving semester class. In a forthcoming book, I make such an attempt.
I registered a few reflections at the midpoint, and those points still stand. The material I’ve taught since—algorithms and optimal control—was solid, and I have no reservations about how it went. Parts I (Theory) and III (Algorithms) hold up remarkably well. I think they remain the right way to teach this material. I do have a few quibbles with the algorithms section: it’s good that the book includes a self-contained, straightforward introduction to self-concordance, but I would never teach it, and I think the primal-dual interior-point method deserves more emphasis than the barrier method. But these are my preferences, and the book works as a solid backbone for both the elements of convex analysis and the basics of optimization algorithms.
But I was surprised by how much I disliked teaching the middle section on applications! When I first read this book 20 years ago, I loved the applications section. Every page connected to something I’d seen in a class or a paper, now grounded in a universal optimization framework. But teaching it gave me a different perspective. Many of the students hadn’t seen these applications before, and I had to explain the philosophy behind them to a class with remarkably diverse backgrounds. Do the application examples make sense outside the context of their disciplines? That is, without econ training, do the economic examples make sense? Without some statistics background, do the stats examples make sense? Without some control background, does control make sense? Given that I only had one or two lectures for each of these topics, I felt like I was teaching a new superficial story every other week. I maybe didn’t need a whole semester for the applications, but I’d have liked more than a week.
The big challenge for my next iteration of this class is rethinking the applications section. I might try focusing on a single domain. Parth Nobel wrote to me about a short course on convex optimization he co-teaches at NASA, where the applications all focus on spacecraft. And I would characterize Borrelli, Bemporad, and Morari’s book on model predictive control as half a semester on convex optimization followed by half a semester on MPC.
Using Boyd and Vandenberghe as is, I could teach a version of chapters 6 and 7 as applications in statistics, machine learning, or signal processing. Trying to teach all three might get too conceptually muddled, but I see a path to weave them together. Perhaps I could have the students vote at the beginning of the class for which version of the applications module they’d be most interested in learning.
I look forward to your feedback on these live blogs. If you are a student in the class, I’d love to get your general impressions of the lectures too. After a decade of not teaching the material, it was great to get back to it. I’ll say it again: convex optimization is one of the most impressive achievements of 20th-century applied mathematics. Don’t get me wrong, the resulting technology is not perfect: numerical errors, ill-conditioning, and bad problem specification can ruin your day. And there is certainly no promise that your disciplined convex mathematical model is an accurate reflection of reality. But nothing is perfect. Because this course manufactures a strong abstraction boundary between the mathematical model and that which is modeled, we can set aside the concern that all models are wrong and get a glimpse of what could happen if your model were right.
[1] Yes, I know I write too much.
I like the way that Nocedal and Wright introduce the interior point method. They seem to focus more on the primal-dual version. Also, just in general, I find their book more cohesive than _Convex Optimization_ because it builds a story instead of just collecting sets of facts and examples. Both are important references!
I can definitely relate to what you are writing about covering examples from applications, given that the 2024-2025 academic year is my optimization year: I am about to wrap up teaching a graduate class based on Luenberger's _Optimization by Vector Space Methods_ and switch over to teaching optimization to undergrads using the book you wrote with Steve Wright. Most of Luenberger's book holds up remarkably well, including the examples, although I also make sure to cover a lot of the "current thing" kind of stuff (RKHS, optimal transport, statistical inference using optimization à la Juditsky-Nemirovski). The main difficulty is exactly as you outline: being able to convey enough of the philosophy of each application domain (economics, optimal control, statistics, ML) to ground the problem. I suspect it will be easier with the undergrad course since the emphasis is mostly on applications in ML and data analysis.