I learned a lot from the geometric explanation of duality in Page 9 in [1]: The shortest distance from a point to a convex set is equal to the maximum of the distances from the point to a hyperplane separating the point from the convex set.

[1]. Luenberger, David G. Optimization by vector space methods. John Wiley & Sons, 1997.

I have learned duality from Ryan Tibshirani’s notes with the most ease. I would recommend Oct. 5, Oct. 10 and Oct. 26 lecture slides of Fall 2016 course (without the details on linear programming) : https://www.stat.cmu.edu/~ryantibs/convexopt-F16/ A side benefit for me was KKT conditions looked trivial to me after this.

Hey ben I have a bit of feedback on you post. You also need to specify F is linear or concave-convex for the last equality to hold; I believe Neumann just showed it for linear F with X and Y equal to probability Simplicies. Also, IMO this post kinda glosses over Neumann's motivation. In the games he was thinking of players can only play discrete actions, i.e., corners of the probability simplex, but then randomization allows them to play any point in the simplex. The post doesn't really make this clear.

I think I've tried a dozen ways of explaining duality to myself, still not satisfied. My question here: does this replace a mystery with an enigma? Now I want to know why on earth the two things are equal when X is convex and compact and Y is convex. Of course, I want to know what the dual problem IS. Usually you can wrap your mind around the primal, not so much the dual. In the optimization context, Dasgupta has a nice way of describing it for the linear program, where the dual problem is a way of taking linear combinations of the constraints to come up with feasible but not necessarily optimal solutions to the primal problem.

I learned a lot from the geometric explanation of duality in Page 9 in [1]: The shortest distance from a point to a convex set is equal to the maximum of the distances from the point to a hyperplane separating the point from the convex set.

[1]. Luenberger, David G. Optimization by vector space methods. John Wiley & Sons, 1997.

I have learned duality from Ryan Tibshirani’s notes with the most ease. I would recommend Oct. 5, Oct. 10 and Oct. 26 lecture slides of Fall 2016 course (without the details on linear programming) : https://www.stat.cmu.edu/~ryantibs/convexopt-F16/ A side benefit for me was KKT conditions looked trivial to me after this.

Hey ben I have a bit of feedback on you post. You also need to specify F is linear or concave-convex for the last equality to hold; I believe Neumann just showed it for linear F with X and Y equal to probability Simplicies. Also, IMO this post kinda glosses over Neumann's motivation. In the games he was thinking of players can only play discrete actions, i.e., corners of the probability simplex, but then randomization allows them to play any point in the simplex. The post doesn't really make this clear.

I think I've tried a dozen ways of explaining duality to myself, still not satisfied. My question here: does this replace a mystery with an enigma? Now I want to know why on earth the two things are equal when X is convex and compact and Y is convex. Of course, I want to know what the dual problem IS. Usually you can wrap your mind around the primal, not so much the dual. In the optimization context, Dasgupta has a nice way of describing it for the linear program, where the dual problem is a way of taking linear combinations of the constraints to come up with feasible but not necessarily optimal solutions to the primal problem.

Always a great and insightful read.