This is the live blog of Lecture 6 of my graduate class “Convex Optimization.” A Table of Contents is here.
For Boyd and Vandenberghe, optimization is a form of declarative programming. We express a mathematical logic for a mapping rather than a procedure. This leads to an idiosyncratic approach to programming, hinging on mathematical modeling and algebraic manipulation.
Let me explain this a bit through a sequence of examples. A canonical first problem introduced in an optimization class is the diet problem. The goal is to find a diet that satisfies all of the recommended daily allowances of nutrients from the USDA. I won’t belabor it here, but for any set of foods, the allotments that get you your daily allowances are described by a set of linear inequalities. Hence, to get a potential diet, I can write the optimization problem
minimize 0
subject to the diet hits my daily allowances
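To make this concrete, here is a minimal sketch of that feasibility program in CVXPY. The `nutrition` matrix, `rda` vector, and the three-food universe are all made up for illustration, not taken from the lecture or the USDA.

```python
import cvxpy as cp
import numpy as np

# Made-up data: nutrition[i, j] = amount of nutrient i in one unit of food j,
# rda[i] = recommended daily allowance of nutrient i.
nutrition = np.array([[2.0, 0.5, 1.0],
                      [0.1, 3.0, 0.5],
                      [1.0, 1.0, 4.0]])
rda = np.array([10.0, 8.0, 12.0])

servings = cp.Variable(3, nonneg=True)  # units of each food to eat
diet = cp.Problem(cp.Minimize(0), [nutrition @ servings >= rda])
diet.solve()
print(servings.value)  # one diet meeting the allowances; which one depends on the solver
```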
There will be an infinite number of diets that meet your daily allowances, but if you pass this program to a solver, it will consistently return a single one, not the set of all diets that match the constraints. You have to understand the guts of the solver to know which one you will find. But maybe you don’t care and just want the solver to give you an answer.
If you don’t want to hand this ambiguity to the solver, you can narrow down the options by specifying an objective to rank all of your diets. Following the canonical example, you pick the diet of minimal cost. Specifying the prices of food and your desire to be cheap declares to the solver which diet you want it to compute for you.
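Continuing the sketch above (reusing its `nutrition`, `rda`, and `servings`), attaching a hypothetical `price` vector turns the feasibility program into the classic minimum-cost diet LP. Only the objective line changes:

```python
price = np.array([1.5, 2.0, 0.8])  # hypothetical cost per unit of each food

cheapest = cp.Problem(cp.Minimize(price @ servings),
                      [nutrition @ servings >= rda])
cheapest.solve()
print(servings.value, cheapest.value)  # the minimum-cost diet and what it costs
```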
In the idealized diet example, a modeler comes to the table with a clean mathematical model of their constraints and a clean argument about preferred solutions. The solver is then nothing more than a sorting machine. It effectively lists all possible solutions and extracts the minimum along some axis.
But optimization is far more powerful than this. It’s possible to write problems as optimizations where the feasible set has no apparent connection to the problem you’re trying to solve. Here’s the canonical weird example. Let’s say I have a list of roads, the points they connect, and their lengths, and I want to find the shortest route from A to B. I can do this with the following model. I’ll make a vector x with one component for each endpoint in my grid of roads. I’ll make an array of distances len, where len[u,v] is equal to the length of the road if there’s a road between u and v and equal to infinity otherwise. Then, I can solve the optimization problem:
maximize x[B]
subject to x[A] = 0
and x[u] - x[v] <= len[u,v] for all endpoints u, v
Feeding this problem into a linear programming solver will return some vector. ~~The nonzero indices in this vector correspond to endpoints in the shortest path.~~ This vector doesn’t even give you the shortest path! While the B component will hold the length of the shortest path from A to B, you need to access something called the dual solution to retrieve the shortest path. (Thanks to Matt Hoffman for spotting the error, which I’ve left here in strikethrough.)
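Here is a hedged sketch of the whole pipeline in CVXPY on a made-up five-node network; the node labels, lengths, and variable names are mine, not from the lecture. Roads with len equal to infinity impose no constraint, so the code simply skips non-edges. The duals of the difference constraints form a unit flow from A to B, and when the shortest path is unique, the edges carrying flow are exactly its edges.

```python
import cvxpy as cp

# A made-up road network: nodes 0..4, with A = 0 and B = 4.
edges = {(0, 1): 2.0, (1, 2): 1.0, (0, 2): 4.0,
         (2, 3): 3.0, (1, 4): 7.0, (3, 4): 1.0}
n, A, B = 5, 0, 4

x = cp.Variable(n)
cons = [x[A] == 0]
directed = []  # remember which (u, v) each difference constraint came from
for (u, v), length in edges.items():
    for s, t in ((u, v), (v, u)):  # a road can be traversed in either direction
        cons.append(x[t] - x[s] <= length)
        directed.append((s, t))

prob = cp.Problem(cp.Maximize(x[B]), cons)
prob.solve()
print(x.value[B])  # 7.0: the length of the shortest path from A to B

# The duals on the difference constraints carry the path: they form a unit
# flow from A to B, and here the shortest path is unique, so flow-1 edges
# trace out exactly 0 -> 1 -> 2 -> 3 -> 4.
for (s, t), c in zip(directed, cons[1:]):
    if abs(c.dual_value) > 0.5:
        print(f"{s} -> {t}")
```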
This formulation is decidedly different from the diet problem. Your average modeler would never have figured out how to write the problem this way. A feasible solution to this problem doesn’t necessarily have anything to do with a valid path from A to B. For example, the vector of all zeros is feasible. The only interesting feasible point here is the optimal solution. And yet, once we’ve written the problem this way, we can feed an array of road lengths into a linear programming solver. It will compute the right solution. We have specified a function that maps distances to paths by specifying an optimization problem.
Optimization gives us a rich and versatile modeling language for specifying functional mappings. We’ll see countless other examples of this in the class. In machine learning, the input is a collection of patterns, and the output is a prediction function. In portfolio optimization, the input is a model of asset prices and their correlations, and the output is an investment strategy. By writing functions this way, we can avoid thinking about algorithmic particulars and worrying about proving correctness. If our model is correct, then our function is correct. This simplifies the problem of verification. It’s like SQL but for policy.
Now, I’m belaboring these modeling choices today because the goal of Boyd and Vandenberghe is to give you as wide a modeling toolkit as possible. How can you turn an arbitrary problem into a specification of convex constraints and costs? Much of Chapter 4 focuses on how you can manipulate models to turn them into convex problems. One of my favorite lines from the chapter is:
“We call two problems equivalent if from a solution of one, a solution of the other is readily found, and vice versa. (It is possible, but complicated, to give a formal definition of equivalence.)”
This definition of equivalence is impossibly broad. It tells us to devote time to worrying about equivalence of formulations, operations that preserve optimality, and tricks like Ford’s formulation of shortest paths. If you can screw your head around enough algebra, a variety of complicated problems can be specified as an optimization pipeline: ETL fed into a convex solver followed by simple postprocessing. The goal of the class is to sketch out just how much you can do with that deceptively simple pipeline.
"The nonzero indices in this vector correspond to endpoints in the shortest path."
I'm not sure this is true?
Suppose we have four nodes (labeled 1, 2, 3, 4) and four edges arranged in a diamond, so that the edges are
1 <=> 2, length 1
2 <=> 4, length 1
1 <=> 3, length 1
3 <=> 4, length 2.
We want to go from 1 to 4, and the shortest path is clearly 1 => 2 => 4, with length 2.
I think a valid solution for x is
x1 = 0
x2 = 1
x3 = 1
x4 = 2
But node 3 is not on the shortest path, even though x3 > 0. (Setting x3 = 0 happens to be feasible too, since x4 - x3 = 2 <= len[3,4] = 2, but nothing in the program steers the solver toward it.)
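A quick numeric check of the diamond example (my addition, in plain Python): both the solution proposed in the comment and the variant with x3 = 0 satisfy every constraint and attain x4 = 2, so the solver has no reason to prefer one over the other.

```python
# The diamond example: nodes 1..4, undirected edges with lengths.
edges = {(1, 2): 1, (2, 4): 1, (1, 3): 1, (3, 4): 2}

def feasible(x):
    # x[A] = 0, plus the difference constraint in both directions of every road.
    return x[1] == 0 and all(abs(x[u] - x[v]) <= l for (u, v), l in edges.items())

x_comment = {1: 0, 2: 1, 3: 1, 4: 2}  # the solution proposed in the comment
x_zero3   = {1: 0, 2: 1, 3: 0, 4: 2}  # the same, but with x3 = 0
print(feasible(x_comment), feasible(x_zero3))  # True True: both attain x4 = 2
```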