This is the third part of the live blog of Lecture 11 of my graduate class “Convex Optimization.” A Table of Contents is here.
A common theme in the comments and offline discussions of inverse problems has been how to deal with nonconvexity. I realized that for the application section of the class, it might be worth having a few complementary posts airing the limits of convexity. There’s a lot of upside to sticking with convex formulations, but there’s a lot we currently don’t know how to do with convex optimization. Perhaps this is because it’s impossible, but perhaps it’s because no one has figured it out yet. You can consider these “limits of convexity” posts to be free course project ideas.
I could do a whole series on the limits of convexity in inverse problems, and it definitely needs more attention than a single blog post to do this real justice. But I like artificial constraints on my blogging, so I’m going to do my best within my live lecture blogging model. Consider the following to be stream of consciousness scattered thoughts.1 Maybe we can use the comment section as a supplement. If you have good references for state-of-the-art solutions to inverse problems and how people handle nonconvexity, throw them there.
As a reminder, there are many upsides to posing your inverse problem as a convex optimization problem of the sort I described last week. The declarative convex programming approach lets you modularly combine a wide set of modeling primitives to describe your inverse problem. You would have dozens of solvers to choose from, specialized to particularly common patterns. These solvers would always return a global minimum, so debugging your model doesn’t rely on evaluating the optimization quality. There are decades of analyses that give insights into the solution quality you should expect given what you know about your processes, whether they be the forward model, the noise process, or the state description.
But there are definitely inverse problems that I don’t know how to pose as convex optimization problems. Sometimes, you don’t have a clean convex model of state plausibility. If you know that your state can be decomposed into a small combination of simple states, then you can apply the techniques from last week. But what if you don’t know how to characterize the simple states? What if you’d like to estimate the best set of states from the data as you solve the inverse problem? This problem is called dictionary learning and, to my knowledge, is always done using some sort of nonconvex optimization. The most famous example of dictionary learning is principal component analysis, which requires solving a singular value decomposition. The SVD is one of the few nonconvex problems we can provably, reliably, and efficiently solve.
Sometimes, your forward model isn’t well approximated by a linear model. For example, if your measurement is a convolution of a state with an unknown noise signal, the inverse problem, called a blind deconvolution problem, is not convex. And some forward models are just not well approximated by linear maps. A notably popular model that computer vision people have been into for the past few years, called radiance fields, is a light absorption model that generalizes the forward model in computational tomography. People use radiance fields for all sorts of inverse problems that map a collection of 2D photos into 3D (and higher dimensional) models. Though this model is a very crude approximation of light scattering, it makes compelling pictures. I don’t know of a clean way to linearize it.
When you hit these nonconvex boundaries (as you always will if you are an intrepid modeler), should you then turn to LLMs? Maybe? This feels like overkill. On the other hand, denoising lies at the heart of diffusion models. I mentioned the denoising problem last week, and it is one of the core primitives of inverse problems: lots of people have observed that if you can remove noise from a signal, you can solve the inverse problem too.2 This is the main idea behind diffusion models that use cascades of denoising algorithms (using complicated neural net gunk) to build robust image models. I don’t hate it!
Such complex models are probably overkill for most applications. If your forward model is nonlinear, you can just try nonlinear optimization. Since all of the problems I posed last week have equality constraints, you can eliminate the constraints to yield an unconstrained optimization problem and apply gradient descent. Gradient-based optimization has proved fast and powerful in radiance fields (e.g., this or this) and many other applications.
Finally, here’s a mindless application of machine learning to inverse problems that can be surprisingly effective. At the end of the day, our goal is to map measurements to states. We assume we know the mapping from state to measurement. This means we can simulate an infinite collection of pairs of plausible states and their associated measurements. I literally mean simulate here. You could call this a “training set” and then build a machine learning model that fits a map from measurement to states on the training set. I have seen this work well in more applications than I can count. In some sense, this is all of machine learning. You might say that all machine learning problems ask for the inverse of the map from hidden state to feature representation.
In inverse problems, we’re even better off than in general machine learning. We strongly believe the forward model is real and hence that pattern recognition is possible. The downside to this pattern recognition approach to inverse problems is you spend a ton of time training the model. However, with a model in hand, computation is super cheap for new inverses. You just apply your neural net to the measurement. In optimization land, every inverse costs the same, requiring solving an optimization problem on each new measurement. Even in convex inverse problems, this application of nonlinear machine learning could prove very useful.
And less lucid than normal as my brain fatigues from this extended heatwave.
See this paper with Mahdi and Samet for some cool insights about convex and nonconvex denoisers, as well as a long reference list.
In the state estimation / robotic perception community (e.g., SLAM, 2-view pose estimation, and more) the Burer-Monteiro method has become a really powerful way of solving certain non-convex inverse problems. We recently tried to write a more accessible overview of this approach: https://arxiv.org/abs/2410.00117
The high-level theme behind these approaches has been: (i) model the problem as a QCQP, (ii) form a semidefinite relaxation of the QCQP via Shor's relaxation, (iii) if the semidefinite relaxation is tight and satisfies a specific condition on the constraints (LICQ) then solve with Burer-Monteiro method. This is a nontrivial set of requirements but has allowed for efficient global optimization (often as fast as SOTA local solvers) for a pretty meaningful set of non-convex problems so far.
In this house, we believe the forward model is real.