20 Comments

Thank you for sharing this old and interesting story; a very thoughtful article.


Do you by any chance have a reference for the opening anecdote? I’d be very interested to read more.


Shameless plug, but I just finished a book on the history and philosophy of automated decision making, where I discuss this anecdote in depth.

I'll post a lot more about the book once I get an OK from the publisher. Stay tuned!


Sounds interesting, I’ll keep an eye out for it!


Stephen Boyd recently gave an interview on the InControl podcast, in which he recounts how he learned from a circuit designer that the advantage of convexity has nothing to do with provable global optimality. Rather, the circuit designer explained that the benefits of convex optimization are that "smart initialization doesn't matter" and there is "no babysitting". Notice that the inference-time forward pass of an LLM has these same convenient properties! More here: https://calvinmccarter.substack.com/p/stephen-boyd-on-convex-optimization
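
(A toy illustration of those two properties, not taken from the interview and assuming nothing beyond NumPy: fixed-step gradient descent on a convex least-squares problem reaches the same answer from wildly different random initializations, with no babysitting along the way.)

```python
import numpy as np

# Toy convex problem: minimize ||Ax - b||^2 over x.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

# Fixed step size 1/L, where L = 2 * lambda_max(A^T A) is the
# Lipschitz constant of the gradient.
L = 2 * np.linalg.eigvalsh(A.T @ A).max()
step = 1.0 / L

for trial in range(3):
    x = 100 * rng.standard_normal(10)  # a wildly different start each time
    for _ in range(5000):
        x -= step * 2 * A.T @ (A @ x - b)  # gradient of ||Ax - b||^2
    print(trial, np.linalg.norm(A @ x - b))  # same optimal residual each run
```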


Are you going to cover the online case with multiple rounds when both the cost and constraint functions are unknown at the beginning of every round?


With some hesitation, given my limited knowledge of this particular topic, I would like to suggest that you look at the piecewise-linear, continuous, and convex approximation of a given function from R to R in Section 6.5.5 of Boyd's text. I believe this may establish a good connection with neural networks built from ReLU units, as in the sketch below. (I have tried to study similar connections a little, but the extension to function approximation from R^N to R seems to be difficult.)
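
As a rough sketch of that connection (my own illustration, not from Boyd's text): a maximum of affine functions is continuous, piecewise linear, and convex, and a two-piece maximum can be rewritten exactly as an affine function plus a single ReLU unit, via the identity max(u, v) = u + relu(v - u).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# A piecewise-linear convex function as a max of two affine pieces:
# f(x) = max(a1*x + b1, a2*x + b2).
a1, b1, a2, b2 = -1.0, 0.0, 2.0, -1.0

x = np.linspace(-3, 3, 7)
f_max = np.maximum(a1 * x + b1, a2 * x + b2)

# The same function as "affine part + one ReLU unit",
# using max(u, v) = u + relu(v - u).
f_relu = (a1 * x + b1) + relu((a2 - a1) * x + (b2 - b1))

assert np.allclose(f_max, f_relu)
```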

Looking forward to new blogs and lecture notes from you Ben :) Have a great semester!


I will dare to offer a coda to von Neumann's statement:

"If you have an application that satisfies the axioms, use it. If it does not, then *determine what parts do and use it as a subroutine* before moving on."


I have asked around for concrete applications of convex-but-not-merely-linear programming. Everyone in practical settings has told me some version of "I think it could be successful, but linear programming works well enough that the impetus to try convex solvers is not big enough," or they have pointed me to a variety of research papers. When this happens with enough people (say, everyone I talked to in Google's operations research group, and more), it suggests to me that there are few critical problems for which convex methods are used. So my question to you for this course is: where are convex methods actually used in industry?


Depends on which industry and what you mean by nonlinear! But certainly quadratic programs are used all over the place. I'll highlight examples of those and others as the course goes on.


Every time someone has told me they use quadratic programs they secretly linearized behind the scenes. /shrug


You might want to take a look at https://scicomp.stackexchange.com/q/15926


Thanks! Doesn't give too much hope that much has changed in the last 10 years, though. 🙃


Nonlinear problems are extremely common in industry. From my own experience, quadratic programming is widely used both for model predictive control (which is itself widely used in, for example, process control and robotics) and for portfolio optimization (which is the bread and butter of certain approaches to quant finance). A portfolio optimization problem with a hard upper bound on portfolio variance takes the form of a second-order cone problem, which is also widely used in practice. In the industrial circles I move in, the above are extremely common, whereas linear problems (with the exception of epigraphical formulations of the above) are rare, so I think it just depends on the kind of work you do.
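
For concreteness, here is a minimal sketch of the variance-capped portfolio problem described above, written with the open-source modeling library CVXPY. The numbers (mu, Sigma, sigma_max) are made-up toy data, not anything from practice.

```python
import cvxpy as cp
import numpy as np

# Toy data: expected returns and a diagonal covariance for 5 assets.
mu = np.array([0.08, 0.05, 0.12, 0.07, 0.03])
Sigma = np.diag([0.04, 0.02, 0.09, 0.03, 0.01])
sigma_max = 0.15  # hard cap on portfolio standard deviation

w = cp.Variable(5)  # portfolio weights
risk = cp.norm(np.linalg.cholesky(Sigma).T @ w)  # ||Sigma^{1/2} w||, the std dev

problem = cp.Problem(
    cp.Maximize(mu @ w),       # maximize expected return
    [cp.sum(w) == 1,           # fully invested
     w >= 0,                   # long-only
     risk <= sigma_max],       # second-order cone constraint on risk
)
problem.solve()
print(w.value)
```

Writing the risk cap as a norm bound rather than a quadratic is what makes it a second-order cone constraint, which conic solvers handle directly.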


How about statistical model fitting, like Poisson or (multinomial) logistic regression?


Don't people just use gradient descent to find the maximum likelihood? That seems to bypass most or all of the convex optimization theory.


For unconstrained problems, the fact that you can "just use gradient descent" and expect it to work without much fuss is one of the big benefits of the theory (knowing what is convex and what is not). For example, without the theory it has never been obvious to me that multinomial logistic regression (MNL) is convex while some alternative formulation is not. We could also debate whether gradient descent itself counts as convex optimization theory :)

For constrained problems, theory can help with algorithm selection (fast first-order methods, for example) and with the modeling approach (monotonicity may be easy to model because it is convex, for example, while other properties are not so easy).

Knowing that some problem formulations *should* be relatively easy to solve can help you iterate faster because you can focus on the model instead of the algorithm to solve it.
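
As a concrete instance of the unconstrained case (a sketch on synthetic data, not anything from a production system): the average negative log-likelihood of logistic regression is convex in the coefficients, so plain fixed-step gradient descent finds the global maximum-likelihood fit with no worry about local minima.

```python
import numpy as np

# Logistic regression fit by plain gradient descent. The average negative
# log-likelihood is convex in beta, so any stationary point is the global optimum.
rng = np.random.default_rng(0)
beta_true = np.array([1.5, -2.0, 0.5])
X = rng.standard_normal((200, 3))
y = (rng.random(200) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

beta = np.zeros(3)  # the starting point is unimportant on a convex problem
step = 0.1
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ beta))        # predicted probabilities
    beta -= step * X.T @ (p - y) / len(y)  # gradient of the average NLL
print(beta)  # close to beta_true
```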


That is all well and good, but it is unrelated to my question, which is specifically about the use of convex solvers and convex programming formulations. That is what is being advertised here, and that is what I cannot corroborate with evidence of use in production systems. Saying "they use gradient descent, and it's even better for x, y, z reasons" is only more evidence against the claim that convex solvers are used.


Is that a Wu Tang reference in the last paragraph?


How to feel AGI if it isn't real?
