10 Comments
Johan Ugander:

I like the idea of first and foremost motivating convex optimization via the "it's where local search finds global optima" argument. Adding to the standard tools here, I have on several occasions found use for the not-widely-known Mountain Pass Theorem: under (commonly satisfied) smoothness conditions, if a function has no local _maxima_, then all the extrema are _minima_. As such, for the purposes of optimization, any local minimum is the single global minimum.

With some whittling, this can probably be turned into engaging PhD-level lecture material. As an example, see (Fleischer, 1964) [1], which gives an analysis showing that Lloyd's algorithm for k-means finds global optima when used to quantize Gaussian noise. Fleischer states and uses a mountain-pass-like lemma and says "A proof of this Lemma will be given in a forthcoming paper," which never appeared! It's likely somebody clued him in to the fact that his stated lemma was a corollary of a known thing.

[1] https://web.stanford.edu/~jugander/rare/Fleischer1964_SufficientConditionsMinimumDistortion.pdf
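To make the Fleischer example concrete, here is a minimal plain-Python sketch (an illustration under assumed parameters, not anything from the paper) of Lloyd's algorithm quantizing one-dimensional Gaussian samples; the sample size, codebook size, and iteration count are arbitrary choices. Under this setup, different random initializations should settle on essentially the same codebook, which is the behavior the global-optimality analysis explains.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)   # the scalar Gaussian source being quantized
k = 4                                    # codebook size (arbitrary choice)

def lloyd(init, data, iters=100):
    """One-dimensional Lloyd iteration: nearest-center assignment, then centroid update."""
    centers = np.sort(init)
    for _ in range(iters):
        labels = np.argmin(np.abs(data[:, None] - centers[None, :]), axis=1)
        centers = np.array([data[labels == j].mean() for j in range(k)])
    return np.sort(centers)

# Different random initializations land on (essentially) the same codebook,
# consistent with the global-optimality claim for the Gaussian source.
for _ in range(3):
    print(lloyd(rng.choice(samples, k), samples))
```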

Ben Recht:

I've never heard of the Mountain Pass Theorem before! Going to do some reading.

Maxim Raginsky:

You have to be really into the calculus of variations to know the Mountain Pass Theorem.

Charlotte LeMay:

This works for the cost function f(x,y)=y, but aren't there types of cost functions which would have local but not global extrema even on convex sets? Something like f(x,y)=cos(x)cos(y)*e^(x^2/10) on the circle of radius 5 for example. Is the circle of radius 5 not considered convex with respect to that cost function? Or are there requirements on cost functions in order for convexity to have the desired property?

Ben Recht:

I perhaps shouldn't have made this a footnote, but, by adding a single variable, you can always assume the cost function is linear, namely the new coordinate x0 appended to x. To see this, suppose you start with a problem

minimize f(x)

subject to x in C

This is equivalent to the problem

minimize x0

subject to f(x) <= x0 and x in C

So the cartoons are actually more informative than they seem! In your example, f is not convex, and hence if we were to add a dummy variable the set

{ (x0, x) : f(x) <= x0 }

would also not be convex.
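
A minimal sketch of this dummy-variable trick, assuming cvxpy and a made-up convex instance (the particular f and C below are illustrative only, not anything from the post): both formulations should return the same optimal value.

```python
import cvxpy as cp
import numpy as np

x = cp.Variable(2)
f = cp.sum_squares(x - np.array([4.0, 5.0]))   # an example convex cost
C = [cp.norm(x, 2) <= 5]                       # an example convex set (disk of radius 5)

# Original form: minimize f(x) subject to x in C.
p1 = cp.Problem(cp.Minimize(f), C)

# Epigraph form: add a scalar x0 and minimize it subject to f(x) <= x0.
x0 = cp.Variable()
p2 = cp.Problem(cp.Minimize(x0), C + [f <= x0])

p1.solve()
p2.solve()
print(p1.value, p2.value)   # the two optimal values agree (up to solver tolerance)
```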

Does that explanation make sense?

Charlotte LeMay:

This does make sense, thank you!

Cagatay Candan:

If I'm not mistaken, this is the epigraph representation of the problem. https://math.stackexchange.com/questions/2218797/epigraph-form-of-an-optimization-problem

Maxim Raginsky:

The book “Statistical Inference via Convex Optimization” by Juditsky and Nemirovski (available for free download here: https://www2.isye.gatech.edu/~nemirovs/StatOptPUPWeb) is a superb example of both the constructive Logo approach and disciplined convex programming. What I particularly like about it is the viewpoint that you give up the ability to have a clean analytical expression for the risk of a given procedure, but you gain a lot by using the Logo approach to formalize the model and disciplined convex programming to find and certify the solution.

rif a saurous:

I'm still confused what precisely is meant by "the Logo approach." Can you clarify?

Maxim Raginsky:

The Logo metaphor is all Ben's (see https://www.argmin.net/p/programmable-convexity). It's about constructing arbitrarily complex convex sets from two basic sets (line segments and rays) using a finite number of primitives (Cartesian products, intersections, affine images and preimages).
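
To make the construction concrete, here is a small illustrative sketch (plain Python, with an arbitrary matrix A and vector b, not anything from the post): membership tests for a segment and a ray are combined via Cartesian products, an affine preimage, and an intersection to describe a triangle inside the unit box.

```python
import numpy as np

# Two primitive sets on the real line, written as membership tests.
def in_segment(t):          # the segment [0, 1]
    return 0.0 <= t <= 1.0

def in_ray(t):              # the ray [0, inf)
    return t >= 0.0

# Cartesian product: the nonnegative orthant is a product of rays.
def in_orthant(z):
    return all(in_ray(zi) for zi in z)

# Affine preimage: {x : A x <= b} is the preimage of the orthant under x -> b - A x.
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 0.0])

def in_polyhedron(x):
    return in_orthant(b - A @ x)

# Another Cartesian product (of segments) gives the unit box; intersect the two.
def in_box(x):
    return all(in_segment(xi) for xi in x)

def in_intersection(x):
    return in_polyhedron(x) and in_box(x)

print(in_intersection(np.array([0.2, 0.3])))  # True: inside the triangle and the box
print(in_intersection(np.array([0.9, 0.9])))  # False: violates x1 + x2 <= 1
```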
