Are there always trade-offs?
If so, then everything isn't optimization.
Programming Note: I’m traveling this week, so for the first time in a month, the posting schedule will be erratic.
What’s the right way to optimize two objectives? This question has been bugging me for months, and I can’t find any satisfactory answer. In the old days, I’d post this question to Twitter to needle the economists and ML people into giving me a response (i.e., I’d troll). But sadly those days are over, and I’m going to have to try to talk through the question by myself here.
Optimization algorithms are shockingly good at finding the minimum value of a single objective under multiple constraints. A solution is minimal if it satisfies the constraints and if its objective value is less than or equal to the objective value of any other potential solution that also satisfies all of the prescribed constraints. This definition of optimality is straightforward and obvious. But finding optimal solutions is not obvious at all. The general program to find such optimal solutions is among the most important mathematical breakthroughs of the 20th century.
With mathematical optimization, we can now find the cheapest plan for almost any project. “Cheapest” usually means “costs the least amount of money.” This is why most optimization texts love to lead with economic examples. The earliest linear program is the “diet problem” where we want to minimize our shopping bill but still get all the recommended macro and micronutrients. I was just reading Birge and Louveaux’s stochastic programming book, and they lead with the example of a farmer trying to maximize profits.
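To see the structure of the diet problem in miniature, here's a sketch with entirely made-up prices, nutrient values, and daily requirements. A real instance would be solved as a linear program; brute force over a tiny menu just makes the "single objective, multiple constraints" shape plain:

```python
from itertools import product

# Hypothetical menu: cost per serving and nutrient content of three foods.
# All numbers are invented for illustration.
foods = {
    "oats":  {"cost": 0.50, "protein": 5, "iron": 2},
    "beans": {"cost": 0.80, "protein": 8, "iron": 3},
    "milk":  {"cost": 0.60, "protein": 8, "iron": 0},
}
requirements = {"protein": 50, "iron": 10}  # illustrative daily minimums

def diet_cost(servings):
    return sum(foods[f]["cost"] * s for f, s in zip(foods, servings))

def feasible(servings):
    # Every nutrient requirement must be met.
    return all(
        sum(foods[f][n] * s for f, s in zip(foods, servings)) >= need
        for n, need in requirements.items()
    )

# One objective (cost), several constraints (nutrient minimums):
# search 0..10 servings of each food for the cheapest feasible diet.
best = min(
    (s for s in product(range(11), repeat=len(foods)) if feasible(s)),
    key=diet_cost,
)
print(best, round(diet_cost(best), 2))
```

The single number being minimized is what makes this problem so clean: any two feasible diets can be compared, so "cheapest" is unambiguous.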
But what if we can’t evaluate a “cost” as a single objective? What if we want a diet that tastes good? What if a farmer doesn’t want to commodify his time into dollars? Does asking questions like this make me an apostate of Mathematical Programming?
The most common answer I’ve found just leans on more economics. When you have multiple objectives and are unclear how to weigh them, you consider the set of solutions corresponding to every possible weighting of the objectives. Since there are an infinite number of such weightings, you get a complex surface of potential solutions. Here’s a picture of what this might look like with two objectives (stolen from Boyd and Vandenberghe’s instant classic Convex Optimization):
This surface of potential solutions is called the Pareto Frontier of the problem. Each point on the frontier is called “Pareto Optimal”: you can’t perturb it to decrease one objective without increasing another. Pareto Optimality is one of the earliest conceptions of an “equilibrium” in economics.
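The definition is easy to make concrete for a finite set of candidates. Here's a minimal sketch (with made-up objective values, both to be minimized) that filters candidates down to the Pareto frontier, then shows that each weighted combination of the objectives picks out some Pareto-optimal point:

```python
# Hypothetical candidate solutions, each scored on two objectives
# (both minimized). The numbers are invented for illustration.
candidates = [(1.0, 9.0), (2.0, 6.0), (3.0, 4.0), (5.0, 3.0),
              (4.0, 5.0), (8.0, 2.5), (6.0, 7.0)]

def dominates(p, q):
    """p dominates q: no worse on both objectives, and p differs from q."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q

# A point is Pareto optimal if no other candidate dominates it.
pareto = [p for p in candidates
          if not any(dominates(q, p) for q in candidates)]

# Weighted scalarization: each weighting lam in (0, 1) turns the pair of
# objectives into a single number, and its minimizer lands on the frontier.
for lam in (0.1, 0.5, 0.9):
    best = min(candidates, key=lambda p: lam * p[0] + (1 - lam) * p[1])
    assert best in pareto
print(sorted(pareto))
```

Note the asymmetry: every weighting yields a Pareto-optimal point, but sweeping the weights need not reach every point on the frontier unless the frontier is convex.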
But which Pareto Optimal point should we choose? As the picture shows, there will be an infinite number of solutions, and each will have varied objective values. Which one is the right one for your problem?
There are a couple of cases where I know the answer. First, there is a notion of “jointly optimal.” It’s quite possible that there’s some point on the Pareto frontier which is nearly optimal for each objective. If you care about two objectives, and there's a point that's almost optimal for both, that's what you should use. It’s rare, but this can happen!
Second, we use optimization in machine learning and statistics to estimate models that explain data. The objective function adds up a bunch of cost functions to balance fitting a model to data, choosing a model with low complexity, and conforming to priors about what the model should look like. How do we know what the best combination of functions is? Almost always, we use some sort of validation data to confirm that the model makes good predictions. But then we’re really just selecting whichever combination yields the model with the smallest validation error. For the sake of a publication, it doesn’t matter what the training error is if the validation error is small. We are dishonest with ourselves. We didn’t actually care about the training objective at all. The conceit of machine learning and much of applied statistics is that we only care about the problem of minimizing validation error.
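This selection loop can be sketched in a few lines. Assume a toy one-parameter model with entirely made-up training and validation data: the training objective is squared error plus an L2 penalty, but the penalty weight is chosen purely by validation error, which is the single criterion doing the real work:

```python
# Invented data for illustration: (x, y) pairs for training and validation.
train = [(1.0, 2.2), (2.0, 4.1), (3.0, 6.3), (4.0, 8.4)]
val   = [(1.5, 3.1), (2.5, 5.1), (3.5, 6.9)]

def fit_slope(data, lam):
    """Closed-form minimizer of sum (y - w*x)**2 + lam * w**2."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, y in data)
    return sxy / (sxx + lam)

def mse(data, w):
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

# Sweep the penalty weight and keep whichever fitted slope predicts the
# validation set best -- the training objective is just a means to that end.
lams = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: mse(val, fit_slope(train, lam)) for lam in lams}
best_lam = min(scores, key=scores.get)
print(best_lam, round(scores[best_lam], 4))
```

The multi-term training objective looks like multicriterion optimization, but the loop collapses it back to one criterion: validation error decides everything.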
In the two examples here, the multicriterion problem was a single criterion problem in disguise. We knew what optimal meant, and it was unambiguous. I would love to know of more examples where consensus is clear in multicriterion optimization, but I think the whole endeavor is question begging. As soon as we can’t agree on a cost function, it’s not clear what our optimization machinery buys us. Multi-objective optimization necessarily means there is a trade-off. And we can’t optimize a trade-off.
I’m giving a talk about outcome optimization in health care later this week. I’ll write more about this topic here once I’ve gathered all of my thoughts, but I’m becoming ever more convinced that optimization alone cannot neatly solve many problems. It’s unclear if the “optimization mindset” is helping or hurting when trade-offs exist. And there are always trade-offs.
I have spent most of my academic career exploring optimization theory and application, and I love the community. The mathematical and algorithmic content is fascinating and broadly applicable. But the optimization mindset can be harmful. What if, contra my friend Stephen Boyd, not everything is optimization? Then what do we do?
Thanks for reading arg min substack! Subscribe for free so we can explore post-optimization together.