*This is the live blog of Lecture 9 of my graduate class “Convex Optimization.” A Table of Contents is here.*

Duality theory in optimization is deep, beautiful, and mysterious. For every minimization problem, we can construct a *convex* maximization problem whose optimal value is less than or equal to the optimal value of the minimization problem. The minimization problem is called *the primal problem* because it’s the one we primarily care about. The maximization problem is called *the dual problem,* and it helps us reason about the primal problem.

Lower bounds are useful: if you have a point that you think is optimal for the primal problem and a potential solution for the dual problem with the same objective value, you have found a solution for *both* the primal problem and the dual problem. In fact, we’ll show that any feasible point for the minimization problem has an objective value at least as large as that of any feasible point for the new maximization problem. And even if your primal problem is nonconvex, the dual problem is always convex, so it can provide insights into what might be achievable with your intractable primal problem.

Duality theory also gives us insights into the robustness of solutions to problem specification and the sensitivity of solutions to different modeling assumptions. It inspires algorithmic strategies for solving both convex and nonconvex problems. It’s a powerful theoretical and applied tool and is essential to understand if you want to be a practicing optimizer.

And yet, I’ve been struggling all morning to figure out how to introduce the topic without it seeming magical. There are different paths to introduce it, and all of them feel weird and confusing. Boyd and Vandenberghe jump right in with a Lagrangian function, but how did Lagrange come up with the idea of Lagrange multipliers? It takes some work to motivate this! Some people start with abstract convex geometry, relating graphs of optimization problems to their characterization in terms of separating hyperplanes. I love this derivation, but it takes me an hour of confusion to remember how to explain it (apologies to Dimitri Bertsekas). The final introduction, which parallels the historical origin of optimization duality, is through minimax theory. But, again, minimax theorems are mysterious.

Well, let me try to go through the minimax theorem because it did really start this whole thing rolling. I’ll introduce it the same way von Neumann discovered it: in terms of a game. The game has two players, Player One and Player Two. Both players know a joint payoff function F(x, y). Player One wants to choose x from a set X to make F as large as possible. Player Two wants to choose y from a set Y to make F as small as possible. At the end of the game, Player One gets F(x, y) points, and Player Two gets -F(x, y) points. Does it matter who plays first?

If Player One goes first and plays $x_1$, then Player Two will minimize with respect to their available choices. That is, Player Two chooses y to achieve a payout

$$\inf_{y \in Y} F(x_1, y)\,.$$

Therefore, to anticipate this move by Player Two, Player One should pick the x that maximizes this infimum. Player One’s best strategy yields a final payout of

$$\sup_{x \in X} \inf_{y \in Y} F(x, y)\,.$$

From the same analysis, if Player Two chooses first, then the final score is

$$\inf_{y \in Y} \sup_{x \in X} F(x, y)\,.$$

Should Player One opt to go first or second? How does a maximum of a minimum compare to a minimum of a maximum? The following nearly tautological analysis shows that Player One should always choose to go second.

Take *any* function F(x, y) and *any* sets X and Y. Then for fixed $x_0$ and $y_0$,

$$\inf_{y \in Y} F(x_0, y) \leq F(x_0, y_0) \leq \sup_{x \in X} F(x, y_0)\,.$$

Now, if you take the supremum over $x_0$ on both sides of this chain, the inequality remains valid:

$$\sup_{x \in X} \inf_{y \in Y} F(x, y) \leq \sup_{x \in X} F(x, y_0)\,.$$

Since this holds for all values of $y_0$, we can take the infimum over $y_0$ on the right-hand side to conclude

$$\sup_{x \in X} \inf_{y \in Y} F(x, y) \leq \inf_{y \in Y} \sup_{x \in X} F(x, y)\,.$$

That is, minimums of maximums are always greater than or equal to maximums of minimums.
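To see this concretely, here is a minimal sketch in Python with a made-up 3×3 payoff matrix (my own toy numbers, not anything from the lecture), treating X and Y as finite sets of pure strategies:

```python
# A hypothetical payoff matrix: rows are Player One's choices x,
# columns are Player Two's choices y, and entries are F(x, y).
F = [
    [3, 0, 2],
    [1, 4, 1],
    [0, 2, 5],
]

# Player One goes first: Player Two then minimizes within each row,
# so Player One picks the row whose worst case is largest.
max_min = max(min(row) for row in F)

# Player Two goes first: Player One then maximizes within each column,
# so Player Two picks the column whose best case (for One) is smallest.
min_max = min(max(F[i][j] for i in range(3)) for j in range(3))

print(max_min, min_max)  # prints: 1 3
assert max_min <= min_max  # the max of mins never exceeds the min of maxs
```

In this particular game the inequality is strict, so whoever goes second does strictly better.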

In 1928, John von Neumann realized that they were often equal. In this game, suppose the players choose their moves *at random*. Even if the players declare their strategies in advance (each player declares their sampling distribution), *there is no advantage to the order*. In math, if X and Y are both probability simplices and the expected payoff is the bilinear function $F(x, y) = x^\top A y$ for a payoff matrix $A$, von Neumann proved that

$$\max_{x \in X} \min_{y \in Y} x^\top A y = \min_{y \in Y} \max_{x \in X} x^\top A y\,.$$

If players play these random strategies, they can even tell their opponent in advance what their strategy is and still be optimal.
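As an illustration, here is matching pennies, the standard two-by-two example (a sketch of mine, not from the post): pure strategies leave a gap between the two orders of play, while the uniform mixed strategy for Player One guarantees an expected payoff of zero against any response, even when announced in advance.

```python
# Matching pennies: each player picks heads (0) or tails (1).
# Player One wins a point on a match and loses a point on a mismatch.
A = [[1, -1],
     [-1, 1]]

# With pure strategies, the order of play matters: there is a gap.
pure_max_min = max(min(row) for row in A)                             # -1
pure_min_max = min(max(A[i][j] for i in range(2)) for j in range(2))  # +1

# With the mixed strategy x = (1/2, 1/2), Player One's expected payoff
# against either column y is zero, so announcing it costs nothing.
x = [0.5, 0.5]
expected = [sum(x[i] * A[i][j] for i in range(2)) for j in range(2)]

print(pure_max_min, pure_min_max, expected)  # prints: -1 1 [0.0, 0.0]
```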

Von Neumann derived this in studying parlor games, but its impact has been felt far beyond game theory. In fact, if F(x, y) is concave in x and convex in y, X is a convex compact set, and Y is convex, then the minimax theorem is still true. The min max equals the max min.
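The curvature requirements on F matter. A standard counterexample (my own illustration, not from the lecture): take $F(x, y) = (x - y)^2$ on $X = Y = [0, 1]$, which is convex in y but convex, not concave, in x. Then

$$\sup_{x \in [0,1]} \inf_{y \in [0,1]} (x - y)^2 = 0 \qquad \text{while} \qquad \inf_{y \in [0,1]} \sup_{x \in [0,1]} (x - y)^2 = \tfrac{1}{4}\,,$$

since Player Two, moving second, can always match $y = x$, while the best Player Two can do moving first is $y = \tfrac{1}{2}$. Without concavity in $x$, a gap can remain.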

What does this have to do with optimization? In class today, I’ll show how to write minimization problems as a min max. The dual problem will then be the associated problem when we switch the order of minimum and maximum. Like I said, it’s mysterious. I’ll write back tomorrow to report on how it went.

I learned a lot from the geometric explanation of duality on page 9 of [1]: the shortest distance from a point to a convex set equals the maximum, over all hyperplanes separating the point from the set, of the distance from the point to the hyperplane.

[1] Luenberger, David G. *Optimization by Vector Space Methods*. John Wiley & Sons, 1997.

I learned duality most easily from Ryan Tibshirani’s notes. I would recommend the Oct. 5, Oct. 10, and Oct. 26 lecture slides from the Fall 2016 course (skipping the details on linear programming): https://www.stat.cmu.edu/~ryantibs/convexopt-F16/ A side benefit for me: the KKT conditions looked trivial after this.