*Programming Note:* *I’m traveling this week, so for the first time in a month, the posting schedule will be erratic.*

What’s the right way to optimize two objectives? This question has been bugging me for months, and I can’t find any satisfactory answer. In the old days, I’d post this question to Twitter to needle the economists and ML people into giving me a response (i.e., I’d troll). But sadly those days are over, and I’m going to have to try to talk through the question by myself here.

Optimization algorithms are shockingly good at finding the minimum value of a single objective under multiple constraints. A solution is optimal if it satisfies the constraints and its objective value is less than or equal to that of every other solution satisfying those constraints. This definition of optimality is straightforward and obvious. But finding optimal solutions is not obvious at all. The general program to find such optimal solutions is among the most important mathematical breakthroughs of the 20th century.

With mathematical optimization, we can now find the cheapest plan for almost any project. “Cheapest” usually means “costs the least amount of money.” This is why most optimization texts love to lead with economic examples. The earliest linear program is the “diet problem” where we want to minimize our shopping bill but still get all the recommended macro and micronutrients. I was just reading Birge and Louveaux’s stochastic programming book, and they lead with the example of a farmer trying to maximize profits.
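To make the diet problem concrete, here's a minimal sketch in Python. The foods, prices, and nutrient requirements are all made up for illustration, and instead of a real LP solver I exploit the fact that a two-variable linear program attains its optimum at a vertex of the feasible region, so brute-force enumeration of constraint intersections suffices:

```python
# Toy "diet problem": buy nonnegative amounts of two foods to meet nutrient
# minimums at least cost. All numbers below are invented for illustration.
# A 2-variable LP is optimal at a vertex of the feasible polygon, so we can
# enumerate intersections of constraint boundaries instead of calling a solver.

from itertools import combinations

cost = (0.50, 0.30)  # dollars per unit of food 1 and food 2

# Each constraint a1*x1 + a2*x2 >= b is stored as (a1, a2, b).
# The nonnegativity bounds x1 >= 0 and x2 >= 0 are constraints too.
constraints = [
    (2.0, 1.0, 8.0),   # protein minimum
    (1.0, 3.0, 9.0),   # vitamin minimum
    (1.0, 0.0, 0.0),   # x1 >= 0
    (0.0, 1.0, 0.0),   # x2 >= 0
]

def intersect(c1, c2):
    """Intersection of the boundary lines a1*x + a2*y = b, or None if parallel."""
    (a1, a2, b), (d1, d2, e) = c1, c2
    det = a1 * d2 - a2 * d1
    if abs(det) < 1e-12:
        return None
    return ((b * d2 - a2 * e) / det, (a1 * e - b * d1) / det)

def feasible(pt):
    x, y = pt
    return all(a1 * x + a2 * y >= b - 1e-9 for a1, a2, b in constraints)

vertices = [p for c1, c2 in combinations(constraints, 2)
            if (p := intersect(c1, c2)) is not None and feasible(p)]
best = min(vertices, key=lambda p: cost[0] * p[0] + cost[1] * p[1])
print(best)  # the cheapest diet meeting both nutrient minimums
```

A real instance would have dozens of foods and constraints, which is exactly where simplex-style solvers earn their keep; the point here is just how naturally "cheapest" becomes a single linear objective.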

But what if we can’t evaluate a “cost” as a single objective? What if we want a diet that tastes good? What if a farmer doesn’t want to commodify his time into dollars? Does asking questions like this make me an apostate of Mathematical Programming?

The most common answer I’ve found just leans on more economics. When you have multiple objectives and are unclear how to weigh them, you consider the set of solutions corresponding to every possible weighting of the objectives. Since there are an infinite number of such weightings, you get a complex surface of potential solutions. Here’s a picture of what this might look like with two objectives (stolen from Boyd and Vandenberghe’s instant classic *Convex Optimization*):

This surface of potential solutions is called the *Pareto Frontier* of the problem. Each point on the frontier is called “Pareto Optimal”: you can’t perturb it without increasing the value of at least one of the objectives. Pareto Optimality is one of the earliest conceptions of an “equilibrium” in economics.
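The weighting recipe is easy to sketch. Below is a toy version with two invented objectives of a single decision variable: for each weight w, minimize the blend w·f1 + (1−w)·f2 (here by brute-force grid search, where a real problem would use a solver), and sweeping w traces out points on the Pareto frontier:

```python
# Scalarization: for each w in [0, 1], minimize w*f1(x) + (1-w)*f2(x).
# Toy objectives: f1 prefers x = 0, f2 prefers x = 1, so they conflict.

def f1(x): return x * x
def f2(x): return (x - 1.0) ** 2

def minimize_weighted(w, grid=2001):
    # Brute-force 1-D minimization over an evenly spaced grid on [0, 1].
    xs = [i / (grid - 1) for i in range(grid)]
    return min(xs, key=lambda x: w * f1(x) + (1 - w) * f2(x))

frontier = []
for i in range(11):
    w = i / 10
    x = minimize_weighted(w)
    frontier.append((round(f1(x), 3), round(f2(x), 3)))

# Every weighting yields a Pareto-optimal point: as w grows, f1 falls
# and f2 rises. The sweep gives you a menu of trade-offs, not an answer.
print(frontier)
```

Every point the sweep produces is defensible, and nothing inside the optimization tells you which weight to pick. That’s precisely the question the machinery leaves open.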

But which Pareto Optimal point should we choose? As the picture shows, there will be an infinite number of solutions, each with different objective values. Which one is the right one for your problem?

There are a couple of cases where I know the answer. First, there is a notion of “jointly optimal.” It’s quite possible that there’s some point on the Pareto frontier which is nearly optimal for each objective. If you care about two objectives, and there's a point that's almost optimal for both, that's what you should use. It’s rare, but this can happen!

Second, we use optimization in machine learning and statistics to estimate models that explain data. The objective function adds up a bunch of cost functions to balance fitting a model to data, choosing a model with low complexity, and conforming to priors about what the model should look like. How do we know what the best combination of functions is? Almost always, we use some sort of validation data to confirm that the model makes good predictions. But in this case, we’re finding the parameters that minimize validation error. For the sake of a publication, it doesn’t matter what the training error is if the validation error is small. We are dishonest with ourselves. We didn’t actually care about the objective function for the model at all. The conceit of machine learning and much of applied statistics is that we only care about the problem of minimizing validation error.
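Here's a minimal sketch of that outer loop, using one-dimensional ridge regression on synthetic data (everything here, from the noise level to the candidate penalties, is made up for illustration). The "training objective" blends a fit term with a complexity penalty, but the penalty weight itself is chosen by the single criterion of validation error:

```python
# The training objective sum (y - w*x)^2 + lam * w^2 blends data fit with a
# complexity penalty. But lam itself is picked by an outer loop minimizing
# validation error -- so validation error is the real objective.

import random

random.seed(0)
true_slope = 2.0
noise = 0.5
train = [(x, true_slope * x + random.gauss(0, noise))
         for x in [i / 10 for i in range(20)]]
val = [(x, true_slope * x + random.gauss(0, noise))
       for x in [i / 10 for i in range(20)]]

def fit_ridge(data, lam):
    # Closed-form minimizer of sum (y - w*x)^2 + lam * w^2 over scalar w.
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def mse(data, w):
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

lams = [0.0, 0.01, 0.1, 1.0, 10.0]
best_lam = min(lams, key=lambda lam: mse(val, fit_ridge(train, lam)))
print(best_lam, fit_ridge(train, best_lam))
```

Nothing in the inner objective decides how much to penalize complexity; the validation set does. The multi-term training loss was never the thing we cared about.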

In the two examples here, the multicriterion problem was a single criterion problem in disguise. We knew what optimal meant, and it was unambiguous. I would love to know of more examples where consensus is clear in multicriterion optimization, but I think the whole endeavor is question begging. As soon as we can’t agree on a cost function, it’s not clear what our optimization machinery buys us. Multi-objective optimization necessarily means there is a trade-off. And we can’t optimize a trade-off.

I’m giving a talk about outcome optimization in health care later this week. I’ll write more about this topic here once I’ve gathered all of my thoughts, but I’m becoming ever more convinced that optimization alone cannot neatly solve many problems. It’s unclear if the “optimization mindset” is helping or hurting when trade-offs exist. And there are always trade-offs.

I have spent most of my academic career exploring optimization theory and application, and I love the community. The mathematical and algorithmic content is fascinating and broadly applicable. But the optimization mindset can be harmful. What if, contra my friend Stephen Boyd, not everything is optimization? Then what do we do?

I have a defense for the "everything is optimization" viewpoint, possibly based on a misinterpretation of the phrase. My specific claim is the following: every decision is optimal for some objective (probably many!). So the real question is *which* objectives are our decisions optimizing, and how does that compare with what we actually want? In this perspective, optimization is a language for making sense of choosing between options. And as long as we want to go on making choices or taking actions, we can't get away from optimization!

All this to say, perhaps the problem arises when optimization is taken to be prescriptive, rather than descriptive. And it is natural to understand optimization as being prescriptive.

Hi Ben - long time lurker, big fan of the substack. As usual, I largely agree with your thinking here. I'm writing to share with you an instance of a trade-off in my own work and how it's -- slowly, over time -- been getting "resolved". If nothing else, I hope at least the psychology of it all may be interesting :)

In our work on balancing covariates in randomized experiments, we found a fundamental trade-off in choosing a randomization scheme: it is impossible to achieve maximal robustness (i.e. assignments are independent) and maximal covariate balance (i.e. the treatment / control groups look similar). On the one hand, you may want independent assignments to achieve a min-max design when you aren't sure of what the outcomes may be. On the other hand, you may want to balance covariates to improve precision when the outcomes are "related to" the covariates. We proposed a design that allows experimenters to trade off these two competing objectives. However, we concluded that there is no "right" point to choose in the trade-off -- we don't know the outcomes before we run the experiment, so we pretty much have to take an educated gamble. This is how I thought about our result for a long time.

Fast forward a few years and we developed asymptotic analyses (hear me out) at the request of reviewers. An asymptotic analysis requires a few regularity conditions, which are honestly pretty mild all things considered. But under these regularity conditions, a funny thing happened: the trade-off disappeared! It turned out that there was an "optimal" way to trade off robustness and covariate balance which ensured that we achieve maximal robustness and maximal covariate balance in the limit. This was surprising to us!

Why am I writing this? Not because I do or don't believe in asymptotic analyses (well...) but rather, to share this instance where a fundamental trade-off over a Pareto frontier turned out to have an "optimal" solution once we view the problem through a "limiting" lens. I wonder what sorts of other "limiting lenses" we can use to obtain a meaningful resolution to Pareto Problems.