This is a live blog of Lecture 3 of my graduate class “Convex Optimization.” A Table of Contents is here.
A minor quibble with my good friends Stephen and Lieven: I already have a disagreement with the lecture order. Next time I teach this class, I’d put today’s lecture first and Tuesday’s lecture second, because today I want to motivate why we care about convex sets.
Convex optimization, roughly speaking, is the set of all problems where naive local search always finds global solutions. Let’s say we want to minimize some cost function subject to some constraints. We start at a point that satisfies the constraints but maybe isn’t optimal. We look for a direction that (a) stays inside the set and (b) makes the function we want to minimize smaller. We then move along that direction, making sure we don’t go so far that we violate the constraints.
Repeating this process of direction finding and solution updating, one of two things will happen. We could keep finding directions of improvement forever, either because our cost function is unbounded below (e.g., minimizing a linear function in one variable) or because the cost keeps improving but never attains its infimum (e.g., minimizing the exponential function in one variable). If we don’t get stuck in such an infinite loop, we will eventually reach a place where we can’t find any reasonable improvement anymore. Does this mean we’re at an optimum?
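To make this concrete, here’s a minimal sketch of that naive local search in Python. It’s my illustration, not an algorithm anyone should use: sample random directions, keep one that stays feasible and lowers the cost, and shrink the step when nothing works.

```python
import numpy as np

def naive_local_search(f, feasible, x0, step=0.5, tries=200, max_iters=10000):
    """Cartoon local search: find a direction that (a) stays in the set
    and (b) decreases f, then move along it. Stop when no sampled
    direction improves, even at tiny step sizes."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        for _ in range(tries):
            d = np.random.randn(*x.shape)
            d /= np.linalg.norm(d)
            candidate = x + step * d
            if feasible(candidate) and f(candidate) < f(x):
                x = candidate
                break
        else:
            step /= 2          # no improving direction found at this scale
            if step < 1e-10:
                return x       # stuck: a candidate (local?) solution
    return x

# Minimize f(x, y) = y over the unit disk; this should end up near (0, -1).
x_star = naive_local_search(
    f=lambda v: v[1],
    feasible=lambda v: np.linalg.norm(v) <= 1.0,
    x0=np.zeros(2),
)
```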
In general, we have no idea where this naive local search will take us. A simple picture illustrates the issue:
Here, the shaded region is the feasible set, and I’m assuming we’re just trying to minimize the function f(x,y)=y. We just want to find the point closest to the bottom of the graph.1 If we start at the big black dot, there’s no way to make any local progress, but we can see far better solutions to this problem. This is, it turns out, only the tip of the iceberg of pathologies you encounter in non-convex land.
But now look at this beautiful convex cartoon.
The only place where local search can terminate is precisely at the star. Local search has no choice but to find a global solution. We can prove this rigorously too. Suppose we have another point in the set with a lower y-value. Then along the line connecting the star to this point, the y-value decreases the entire way. Since the set is convex, the line connecting the star and our new candidate solution has to stay in our set. But that means there was a direction from the star that would have lowered our cost. This contradicts the star being a local solution.
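Here’s that argument in symbols, for any convex cost f over a convex set C (in the cartoon, f(x,y) = y, which is linear and hence convex):

```latex
% x_star is a local minimum over the convex set C; z in C has f(z) < f(x_star).
% Convexity of f gives, for every t in (0, 1],
\[
  f\big((1-t)\,x_\star + t\,z\big) \;\le\; (1-t)\,f(x_\star) + t\,f(z) \;<\; f(x_\star),
\]
% and convexity of C keeps (1-t) x_star + t z inside the set. Taking t small
% puts this point arbitrarily close to x_star, contradicting local optimality.
```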
The proofs that local solutions are globally optimal for convex optimization are deceptively simple. The minimal assumption you seem to need is convexity. I think of these proofs as baked into the model: convexity is forced upon us if we insist that local search is valid.
One thing to keep in mind is that you can have a ton of globally optimal solutions. Here’s an example where there is an infinite collection of solutions at the bottom.
All of the solutions on the bold line are globally minimal. We can’t guarantee a unique global solution. We can only guarantee that local search through the set terminates at one of the global minimizers.
Now, there’s one extra ingredient that forces our hand. We need some way of checking whether or not a direction will stay in the set. At a bare minimum, we must have an efficient way of checking whether we’re in the convex set. This motivates what Stephen and Lieven call “disciplined convex programming.” The Lego rules in the class will ensure that the sets we construct are easy to search over.
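For a taste of what “disciplined” means in practice, here’s a sketch in cvxpy, the Python descendant of Stephen and Lieven’s CVX (my illustration, not code from the class): problems assembled from the sanctioned rules carry a convexity certificate, and problems that break the rules don’t.

```python
import cvxpy as cp

x = cp.Variable(2)
constraints = [cp.norm(x, 2) <= 1]          # the unit disk, a convex set

# f(x, y) = y, assembled from the rules: convexity is certified for free.
prob = cp.Problem(cp.Minimize(x[1]), constraints)
print(prob.is_dcp())    # True

# Minimizing a concave atom breaks the rules, and the certificate vanishes.
bad = cp.Problem(cp.Minimize(cp.sqrt(x[0])), constraints)
print(bad.is_dcp())     # False
```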
Even more amazingly, the Lego rules of convexity let us prove that our solutions really are locally optimal. If I’ve written some optimization code that is stuck at some solution and can’t find a way to improve, how can I know if it’s my problem or if the code has found a legitimate local solution? Maybe I’m just bad at finding search directions. In general, proving that a point is a local minimum is ridiculously hard. I don’t put too much stock in P vs NP mumbo jumbo, but people can construct all sorts of pathological examples where there’s a nice direction downhill, but no one has a practical way of finding it.
In disciplined convex land, there’s often an algorithmic path to verifying local (and hence global) optimality. Define two sets: First, let Sd be the set of all directions that make the cost function smaller. In our cartoons, these are all pairs (x,y) with y less than zero. Second, let Sf be the set of all feasible directions, meaning all of the directions that stay inside the convex set. If I can prove these sets are disjoint, then I will have proven there are no directions that decrease the cost and stay inside the set. That is: I will have proven I have found an optimal solution.
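In symbols, for the cartoon with linear cost c = (0, 1) and candidate solution x⋆ in the convex set C:

```latex
\[
  S_d = \{\, d : \langle c, d \rangle < 0 \,\}, \qquad
  S_f = \{\, d : x_\star + t\,d \in C \ \text{for some}\ t > 0 \,\},
\]
\[
  S_d \cap S_f = \emptyset \;\Longrightarrow\; x_\star \ \text{is optimal.}
\]
```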
One way to prove the sets are disjoint is to find a function h such that h(x,y) is negative for all (x,y) in Sd and nonnegative for (x,y) in Sf. In disciplined convex programming, not only can we efficiently find such functions h, but we can always find affine functions. Geometrically, these affine functions define two halfspaces, one containing Sd and one containing Sf. This search for separating hyperplanes to prove optimality is what we call duality in convex programming. Tomorrow, I’ll dig more into this core relationship between hyperplanes, convex sets, and optimization.
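Modern solvers hand back this certificate along with the solution. A small cvxpy illustration (mine, using the disk problem from the cartoons): the dual variable attached to the constraint is the multiplier behind the separating hyperplane.

```python
import cvxpy as cp

x = cp.Variable(2)
constraints = [cp.norm(x, 2) <= 1]
prob = cp.Problem(cp.Minimize(x[1]), constraints)
prob.solve()

print(x.value)                    # roughly [0, -1]: the bottom of the disk
print(constraints[0].dual_value)  # roughly 1: the multiplier certifying optimality
```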
1. By adding one variable to the problem, I can always assume we’re just trying to minimize the linear function cost(x) = x0. Here’s the silly trick: to minimize a function f(x), I can instead minimize t over pairs (x,t) subject to f(x) <= t and x lying in the original constraint set.
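In symbols, the epigraph trick:

```latex
\[
  \min_{x \in C} f(x)
  \quad\Longleftrightarrow\quad
  \min_{x,\,t}\ t
  \quad \text{subject to} \quad f(x) \le t, \ \ x \in C.
\]
```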
I like the idea of first and foremost motivating convex optimization via the "it's where local search finds global optima" argument. Adding to the standard tools here, I have on several occasions found use for the not-widely-known Mountain Pass Theorem: under (commonly satisfied) smoothness conditions, if a function has no local _maxima_, then all of its extrema are _minima_. As such, for the purposes of optimization, any local minimum is the single global minimum.
With some whittling, this can probably be turned into engaging PhD-level lecture material. As an example, see (Fleischer, 1964) [1], which gives an analysis showing that Lloyd's algorithm for k-means finds global optima when used to quantize Gaussian noise. Fleischer states and uses a mountain-pass-like lemma and says "A proof of this Lemma will be given in a forthcoming paper," which never appeared! It's likely somebody clued him in to the fact that his stated lemma was a corollary of a known thing.
[1] https://web.stanford.edu/~jugander/rare/Fleischer1964_SufficientConditionsMinimumDistortion.pdf
This works for the cost function f(x,y)=y, but aren't there types of cost functions that would have local but not global extrema even on convex sets? Something like f(x,y)=cos(x)cos(y)*e^(x^2/10) on the disk of radius 5, for example. Is the disk of radius 5 not considered convex with respect to that cost function? Or are there requirements on cost functions in order for convexity to have the desired property?