This is the live blog of Lecture 13 of my graduate class “Convex Optimization.” A Table of Contents is here.
Chapter 7 of Boyd and Vandenberghe is entirely devoted to applications in statistics. If you are a statistical imperialist, you could argue that Chapter 6, which covered inverse problems and function fitting, was all statistics too. But Chapter 7 is the really pure stuff. There are five sections: parametric distribution estimation, nonparametric distribution estimation, hypothesis testing, large deviation inequalities, and experiment design. If you’ve ever worked with statistics, it’s fascinating to work through these and see how all of the bread-and-butter methods of statistics are convex optimization problems in disguise.
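To get a taste of what’s in the chapter, here is a minimal sketch (mine, not the book’s) of the most bread-and-butter example: maximum likelihood estimation is a convex problem whenever the log-likelihood is concave in the parameters, as it is for logistic regression. The data are synthetic, and CVXPY is assumed purely for illustration.

```python
# Minimal sketch: logistic regression maximum likelihood as a convex program.
# The data X, y are synthetic; CVXPY is assumed for illustration only.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))           # made-up features
y = (rng.random(100) < 0.5).astype(float)   # made-up 0/1 labels

beta = cp.Variable(3)
# Negative log-likelihood: sum_i log(1 + exp(x_i' beta)) - y_i * x_i' beta.
# This is convex in beta, so the MLE is a convex optimization problem.
nll = cp.sum(cp.logistic(X @ beta)) - y @ (X @ beta)
cp.Problem(cp.Minimize(nll)).solve()
print(np.round(beta.value, 3))
```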
On the one hand, this shouldn’t be too surprising. You can view any convex combination as an expected value. So proving stuff about optimal convex combinations could give us insight into optimal statistical methods. Optimization gives us metrics for selecting which method is better. But in data analysis, it’s not always clear what “better” means!
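To spell out that identification (my aside, not the book’s): a convex combination of points is just the expectation of a random variable supported on those points,

$$\sum_{i=1}^n \lambda_i x_i = \mathbb{E}[X], \qquad \text{where } \Pr(X = x_i) = \lambda_i,\quad \lambda_i \ge 0,\quad \sum_{i=1}^n \lambda_i = 1.$$

So statements about optimizing over convex combinations are statements about optimizing over (discrete) probability distributions.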
John Tukey warned against this sort of fixation with optimization in his 1962 essay “The Future of Data Analysis.” Tukey saw that the field of Statistics defined itself through optimization. What is the most efficient test? What procedure maximizes the likelihood? What is the tightest bound on the probability of an event? Methods are compared and contrasted through various metrics, and even though every statistician knows the metrics are made up, the academic discourse settles around methods that beat the metrics anyway. As Tukey says, “Danger only comes from mathematical optimizing when the results are taken too seriously.” But in statistics, we take them very seriously, demanding that people learn proper methods to discover new science, approve medications, and design governmental policies.
By taking optimization too seriously, Statistics (with a very capital S) becomes trapped by its tooling. I was reminded of Tukey’s warning during a fascinating panel hosted yesterday by Berkeley’s statistics department. Deborah Mayo gave the statistics colloquium, and she suggested her talk be accompanied by a panel discussion with Berkeley faculty. (For what it’s worth, I found this to be a brilliant format for a colloquium. Definitely a model worth repeating moving forward!) I was a discussant, along with statistics faculty Philip Stark and Bin Yu and philosopher Snow Zhang. I should set aside some time to blog about the discussion but have to stick to my lecture live blogging commitment today. As a compromise, let me raise a point about optimization that touches on a theme raised yesterday.
Mayo consistently points out in her writing that the choice of statistical method is almost always a philosophical question. What exactly we are testing rests on a variety of beliefs about the nature of reality and potential counterfactual explanations of observation. As Philip Stark pithily puts it, statistics is computational epistemology. Through a century of obsession with optimization, statistics has generated a lot of mathematics and rigor trying to answer questions that can’t be answered rigorously.
If you do enough statistics, you’ll see that the models we use are the ones we can solve. And once we can solve those models, we force the world to look like them. If we know that we can solve the maximum likelihood sparse covariance estimation problem or whatever, then we go about trying to convince ourselves that all data generating processes are Gaussian distributions with sparse covariance matrices. More insidiously, we mindlessly assume all data is independent or exchangeable. These are examples of what I mean by trapping ourselves with our tooling. Why are we using that mixed-effects linear model with robust standard errors? Because Stata has a package to find the maximum likelihood estimate and tell us the p-value.
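For concreteness, here is roughly what that “maximum likelihood sparse covariance estimation problem or whatever” looks like as a convex program: ℓ1-penalized Gaussian maximum likelihood over the inverse covariance (the graphical lasso). This is my own minimal sketch with made-up data and a made-up penalty, not anything from the lecture.

```python
# Minimal sketch: l1-penalized Gaussian maximum likelihood for a sparse
# inverse covariance (the graphical lasso). S and lam are made up.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
S = np.cov(rng.standard_normal((50, 4)), rowvar=False)  # sample covariance of fake data
lam = 0.1                                                # hypothetical sparsity penalty

Theta = cp.Variable((4, 4), PSD=True)   # estimate of the inverse covariance
# Gaussian log-likelihood (up to constants) minus an l1 penalty on the entries.
objective = cp.log_det(Theta) - cp.trace(S @ Theta) - lam * cp.sum(cp.abs(Theta))
cp.Problem(cp.Maximize(objective)).solve()
print(np.round(Theta.value, 2))
```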
Weirdly, the parts we can solve are mostly the convex optimization problems. For instance, suppose we want to design a medical test that has a false positive rate of at most 5% when a person is healthy and a maximal true positive rate when a person is sick. By asking for a maximum, I just stated a design problem as an optimization problem. There is a beautiful proof that such a maximally accurate test not only exists but is easily computed from distributional information about the test. This is called the Neyman-Pearson Lemma, and, as we’ll see in class today, our modern understanding of the lemma rests on convex duality theory. But since we now know this fact about “most powerful tests,” we turn all statistical testing into a problem of computing likelihoods and obsess over the ornate details of power calculations based on specious “null hypotheses.”
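To make the connection concrete: when the test outcome takes finitely many values, this design problem is a linear program over randomized tests. Here is a minimal sketch with made-up likelihoods (again assuming CVXPY for illustration); the optimal test ends up thresholding the likelihood ratio, possibly with randomization at the threshold, which is the content of the Neyman-Pearson Lemma.

```python
# Minimal sketch: Neyman-Pearson test design as a linear program.
# The likelihoods over five discrete outcomes are made up for illustration.
import cvxpy as cp
import numpy as np

p_healthy = np.array([0.40, 0.30, 0.15, 0.10, 0.05])  # P(outcome | healthy)
p_sick    = np.array([0.05, 0.10, 0.15, 0.30, 0.40])  # P(outcome | sick)

phi = cp.Variable(5)  # randomized test: probability of declaring "sick" for each outcome
problem = cp.Problem(
    cp.Maximize(p_sick @ phi),       # true positive rate (power)
    [p_healthy @ phi <= 0.05,        # false positive rate at most 5%
     phi >= 0, phi <= 1],
)
problem.solve()
print("power:", round(problem.value, 3))
print("test:", np.round(phi.value, 3))  # thresholds the likelihood ratio p_sick / p_healthy
```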
As I’ve emphasized in these lecture blogs, we too often reason about the problem we want to solve in terms of the optimization problems we know how to solve. Statistics gives us endless examples of this sort of thinking. We find the solvable statistical models and convince ourselves they apply to our problem. We can always compute the effect difference, tell stories about what the variance of this difference should be under various implausible conditions, and get our papers published and drugs approved when the difference divided by the hypothetical error has magnitude greater than 2.
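(For readers outside statistics: the folklore threshold of 2 is just the normal approximation to a 5% two-sided test, i.e.,

$$\left|\frac{\hat{\Delta}}{\widehat{\mathrm{SE}}}\right| > 1.96 \approx 2 \quad\Longleftrightarrow\quad p < 0.05,$$

under the model’s assumptions. The symbols here are my shorthand, not the book’s.)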
Correlations and stories become authoritative when you dress them up in technical modeling language like we do in this class. As a mathematical optimization theorist, I’m one of the baddies! But I hope this chapter of Boyd and Vandenberghe shows us why statisticians propose the things they do. The models they can solve are almost always the convex optimization problems. Once we have convex hammers, we see everything as convex nails. There are many nails out there in the world, but I hope that we can heed Tukey and remember these hammers shouldn’t be taken too seriously.
So many great points in this essay. It's interesting to think about your arguments from an econometrics perspective. Model choice in econometrics is guided by economic theory and concerns about causal inference, and econometricians constantly wrestle with issues like endogeneity, model misspecification, and valid instruments to ensure meaningful causal insights, not just statistical significance. The DGP needs to reflect realistic economic assumptions (e.g., market frictions, equilibrium, non-stationary processes, etc.). While both disciplines rely heavily on solvable methods, the econometric approach is more constrained by underlying theory and/or hypotheses, and as I wrote the other day, that often leads to tossing aside some really neat, efficient mathematics in favor of more sleight-of-hand techniques.