This is the live blog of Lecture 15 of my graduate class “Convex Optimization.” A Table of Contents is here.
We’ve hit the middle of the semester, and so I need to find my bearings. The class has a midterm on Monday, and today’s lecture is a midterm review. Next Thursday, we’ll have a “Fall Break.” I’ll probably still blog a bit next week as I have an inflammatory draft on science and statistics that I need to get out of my inbox. Class reconvenes on the following Tuesday, moving into the third part of the semester: a survey of algorithms.
Since today’s class is a midterm review, let me use this blog to jot down my reflections on the class content so far.
Declarative Programming
Boyd and Vandenberghe’s declarative programming introduction to convex optimization solidly holds up 20 years later. This focus yields a streamlined treatment of convex analysis, which is a great way into the subject. And I’ve seen few better ways to make modeling a first-class citizen in an optimization class. Modeling should be a first-class citizen! As much as optimizers like to write papers where we’re just handed abstract minimization problems out of the ether, modeling is the most important part of this class for most people.
I’m still not sure what the right balance of theory and application is for this class. An alternative approach, and one I might take at the undergraduate level, would be to first teach linear programming, then cover unconstrained convex programming, and finish with problems that have nonlinear convex objective functions and linear constraints. If I had done it this way, I’d have skipped a lot of the convex analysis and separating hyperplane theory, but I’d still have been able to introduce the KKT conditions and basic duality. At the midsemester break, I’m wondering if this would be a better way to teach the material at the graduate level too.
Solve stuff early
In retrospect, my biggest pedagogical oversight so far has been avoiding programming assignments. I should have introduced solving optimization problems in the homework once we hit Chapter 4. I think I had a blind spot because I’m used to teaching algorithm-centric optimization classes, where the students have to program solvers by hand. However, the whole point of the declarative approach to convex optimization is that you don’t have to care about the algorithm (until you run into scaling issues). For small, simple problems, everyone can just run cvxpy in a Jupyter Notebook. Cvxpy doesn’t require you to understand what the solver is doing. I think this is obvious, but I’m making a note to my future self to introduce cvxpy earlier next time I teach this class.
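To make this concrete, here’s a minimal sketch of the kind of homework-sized problem I have in mind. The data and the simplex constraint are made up purely for illustration:

```python
import cvxpy as cp
import numpy as np

# A toy constrained least-squares problem. The data is random and the
# simplex constraint is arbitrary; this only illustrates the workflow.
np.random.seed(0)
A = np.random.randn(20, 5)
b = np.random.randn(20)

x = cp.Variable(5)
objective = cp.Minimize(cp.sum_squares(A @ x - b))
constraints = [x >= 0, cp.sum(x) == 1]   # keep x on the probability simplex

prob = cp.Problem(objective, constraints)
prob.solve()

print("optimal value:", prob.value)
print("optimal x:", x.value)
```

A dozen lines, and no knowledge of what the solver is doing is required.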
A farewell to Semidefinite Programming
I had forgotten how much of the book leans on semidefinite programming for examples. Given all of the other material to cover, teaching people about Schur complement tricks and nonlinear matrix manipulation felt like too much. The extra emphasis on semidefinite programming in the text is a reflection of its time: everyone loved SDPs in the early 2000s. I mostly skipped these problems because there’s simply too much else to cover in a semester, and, sadly, SDPs are no longer trendy. Ensuing technology trends pulled us away from interior point methods and SDPs and towards streaming algorithms we could run to place ads.
Is everything quadratic cost and linear constraints?
When you skip semidefinite programming, most of the remaining examples are linear and quadratic optimization. At the beginning of the semester, Jeremy Kun commented that he thought all you’d need to learn is linear programming and unconstrained nonlinear programming. Midway through the class, I think he’s about half right: you could probably get away with linear constraints and, at most, quadratic objective functions. If you wanted to get extra fancy, you could throw in the few examples we saw with second-order cone constraints. That covers the majority of the application examples.
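To pin that recipe down, here’s a hedged sketch (random data, arbitrary constraint set, nothing from the book) of what “quadratic objective, linear constraints, plus one second-order cone constraint if you’re feeling fancy” looks like in cvxpy:

```python
import cvxpy as cp
import numpy as np

# Quadratic objective, linear inequality constraints, and one
# second-order cone constraint. All data is random and illustrative.
np.random.seed(1)
n = 10
M = np.random.randn(n, n)
P = M.T @ M + np.eye(n)          # symmetric positive definite quadratic cost
q = np.random.randn(n)
A = np.random.randn(5, n)
b = np.random.randn(5)

x = cp.Variable(n)
objective = cp.Minimize(0.5 * cp.quad_form(x, P) + q @ x)
constraints = [
    A @ x <= b,                  # linear constraints
    cp.norm(x, 2) <= 2.0,        # the "extra fancy" second-order cone constraint
]

prob = cp.Problem(objective, constraints)
prob.solve()
print("optimal value:", prob.value)
```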
So much statistics
The book’s notable nonlinear, nonquadratic examples revolve around entropy and its Fenchel conjugate. This highlights how statistics has been one of the major consumers of convex optimization. As I’ve said, this makes sense because if you’re already working with probability distributions, then convex combinations have to play a central role.
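To state the pair explicitly (quoting from memory rather than the text; the conjugate function examples in Chapter 3 are the authoritative version), the recurring duo is the log-sum-exp function and negative entropy restricted to the probability simplex, which are conjugates of each other:

\[
f(x) = \log\!\left(\sum_{i=1}^n e^{x_i}\right),
\qquad
f^*(y) =
\begin{cases}
\sum_{i=1}^n y_i \log y_i, & y \succeq 0,\ \mathbf{1}^T y = 1,\\[2pt]
+\infty, & \text{otherwise.}
\end{cases}
\]

Maximum entropy problems and logistic regression both lean on this pair.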
The applications covered in the book are also heavily focused on statistics. Two of the three application chapters “could” be called statistics, depending on how much of a statistical imperialist you are. Even in Chapter 8, there’s a long section on pattern classification pitched as a “geometric problem.”
So little control
By contrast, a notably downplayed application in Boyd and Vandenberghe is control systems. They talk a little bit about control in the introduction and have a short section on linear control in Chapter 10 (we’ll get there). This is particularly odd given that Boyd and Vandenberghe were both trained as control theorists. I’ll have to ask them why they left this out.
A chapter on simple applications in control, based, for instance, on Boyd’s optimal control class, would have been timely given the current academic interest in using machine learning in control systems. Simple examples from optimal control, such as the linear quadratic regulator or simple Markov decision processes, would be very relevant. I think I’m going to do a week on this topic at the end of the semester.
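To sketch what I mean (this is my own toy example, not something from the book or from Boyd’s class), the finite-horizon linear quadratic regulator is just a quadratic objective with linear dynamics constraints, so it drops straight into cvxpy:

```python
import cvxpy as cp
import numpy as np

# Finite-horizon LQR posed as a convex QP. The dynamics and costs are
# toy data, purely for illustration.
np.random.seed(2)
n, m, T = 4, 2, 20                 # state dim, input dim, horizon
A = np.random.randn(n, n)
A = A / np.max(np.abs(np.linalg.eigvals(A)))   # scale so the dynamics aren't explosive
B = np.random.randn(n, m)
x0 = np.random.randn(n)

X = cp.Variable((n, T + 1))        # state trajectory
U = cp.Variable((m, T))            # input trajectory

cost = 0
constraints = [X[:, 0] == x0]
for t in range(T):
    cost += cp.sum_squares(X[:, t]) + cp.sum_squares(U[:, t])   # quadratic stage cost
    constraints += [X[:, t + 1] == A @ X[:, t] + B @ U[:, t]]   # linear dynamics
cost += cp.sum_squares(X[:, T])    # terminal cost

prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
print("optimal cost:", prob.value)
```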
Just to note, Boyd and his friends have this book on control and system theory applications of linear matrix inequalities (LMIs). It seems to be focused only on the SDP aspects of the course, and it requires much more hand-holding than the convex optimization book unless you already know what Popov, Lyapunov, and Kalman did in control. (You may not appreciate some of the examples and sections without that background.) Even so, it is reader friendly and very well organized: https://web.stanford.edu/~boyd/lmibook/