A Tribute to Dimitri Bertsekas

Jun 23, 2026

My teacher and friend Dimitri Bertsekas passed away earlier this month. I just learned about this over the weekend, and I’m still processing my thoughts.

Dimitri was a hero of mine for so many reasons. I took my first and only class on optimization with him, and I was riveted by his clean presentation of convex analysis and mastery of the overhead projector. He took a liking to me, and I made it a point to find him for a chat whenever I’d visit MIT. He was always generous with his time and excited to exchange ideas, but he wouldn’t hesitate to harshly scold me when I’d present some result new to me that he had in fact written about twenty years earlier.

This was unavoidable because Dimitri did foundational work across mathematical optimization, penning landmark results in stochastic gradient descent, convex optimization, distributed optimization, dynamic programming, and reinforcement learning.

His work in reinforcement learning remains underrated. His collaboration with John Tsitsiklis, compiled in their book Neuro-Dynamic Programming, was the first to show that most reinforcement learning algorithms were effectively approximating dynamic programming. Our contemporary, model-free mindset, rooted in Markov decision processes, derives from these initial insights. As far as actual practice goes, their book is more important to the modern way we implement reinforcement learning than Sutton and Barto’s.

Dimitri was also a passionate mathematical communicator. I own more of his books than those of any other mathematics researcher, but then he wrote more books than any other mathematics researcher in my field. Is there too much material? You could make the case! Still, it is remarkable that he has definitive texts covering such a broad range of optimization theory.

In many ways, Dimitri was an original math blogger. He wrote exactly what he wanted to write, and he wrote frequently. He got fed up with publishers getting in the way of his process, so he started his own publishing company, Athena Scientific, shipping stacks of books out of his garage in Belmont, Massachusetts. This allowed him to write countless new editions and revisions of his work, reflecting trends in practice and incorporating new insights and simplified arguments. Though you’ll see plenty of repetition across his volumes, he knew that no book was a final draft. Each volume was a step towards broader understanding.

A decade ago, when he retired from MIT, I wrote a post of appreciation for his scholarly and pedagogical work. I’m going to reprint it here today, as it includes one of my favorite passages in his books about the tension between theory and practice. Right now, in our frenzied agentic era, we’re leaning heavily on only practice and vulgar empiricism to push out as many papers and products as we can before the bubble pops. Is there a role for theory at all anymore?

Dimitri argued that optimization theory is always a mix of qualitative and quantitative. The qualitative helps us understand what we know and don’t know. By gathering feedback from practice and constructing a working narrative, theorists can help engineers develop a language to describe what is possible and what can be improved. Theory provides a narrative scaffolding that helps us understand what to build and what to attend to when things break. Building stories of connections helps us streamline our processes and try things we might not have thought of.

There’s an ebb and flow between theory and practice. I will revisit this ten years from now and see where the theories land.

Until then, rest in peace, Dimitri. Know you changed the way I think.

The following initially appeared as The Role of Convergence Analysis on June 10, 2016 at the old argmin blog.

This year marks the retirement of Dimitri Bertsekas from MIT. Dimitri is an idol of mine, having literally written the book on every facet of optimization. His seminal works on distributed optimization, dynamic programming, and Lagrangian methods remain the best references available. I had the privilege of taking Dimitri’s convex analysis course in grad school, and he would frequently burst into class beaming because he had stayed up until 2AM the night before simplifying an argument of Rockafellar’s down to elementary calculus.

My last post on Lagrangians was based on Chapter 3 of Dimitri’s Nonlinear Programming book. Chapter 2 also happens to feature one of my favorite passages about the delicate balance between theory and practice in optimization. One of the trickiest parts about optimization (and a point I intend to hammer on repeatedly in this blog) is realizing how many of the theorems are “qualitative” rather than “quantitative.” I wanted to just quote Dimitri’s text in full here, as I don’t think I could write it better. Best wishes to you in retirement!

The Role of Convergence Analysis by Dimitri Bertsekas

The following subsection gives a number of mathematical propositions relating to the convergence properties of gradient methods. The meaning of these propositions is usually quite intuitive but their statement often requires complicated mathematical assumptions. Furthermore, their proof often involves tedious ϵ-δ arguments, so at first sight students may wonder whether “we really have to go through all this.”

When Euclid was faced with a similar question from King Ptolemy of Alexandria, he replied that “there is no royal road to geometry.” In our case, however, the answer is not so simple because we are not dealing with a pure subject such as geometry that may be developed without regard for its practical application. In the eyes of most people, the value of an analysis or algorithm in nonlinear programming is judged primarily by its practical impact in solving various types of problems. It is therefore important to give some thought to the interface between convergence analysis and its practical application. To this end it is useful to consider two extreme viewpoints; most workers in the field find themselves somewhere between the two.

In the first viewpoint, convergence analysis is considered primarily a mathematical subject. The properties of an algorithm are quantified to the extent possible through mathematical statements. General and broadly applicable assertions, and simple and elegant proofs are at a premium here. The rationale is that simple statements and proofs are more readily understood, and general statements apply not only to the problems at hand but also to other problems that are likely to appear in the future. On the negative side, one may remark that simplicity is not always compatible with relevance, and broad applicability is often achieved through assumptions that are hard to verify or appreciate.

The second viewpoint largely rejects the role of mathematical analysis. The rationale here is that the validity and the properties of an algorithm for a given class of problems must be verified through practical experimentation anyway, so if an algorithm looks promising on intuitive grounds, why bother with a convergence analysis. Furthermore, there are a number of important practical questions that are hard to address analytically, such as roundoff error, multiple local minima, and a variety of finite termination and approximation issues. The main criticism of this viewpoint is that mathematical analysis often reveals (and explains) fundamental flaws of algorithms that experimentation may miss. These flaws often point the way to better algorithms or modified algorithms that are tailored to the type of practical problem at hand. Similarly, analysis may be more effective than experimentation in delineating the types of problems for which particular algorithms are well-suited.

Our own mathematical approach is tempered by practical concerns, but we note that the balance between theory and practice in nonlinear programming is particularly delicate, subjective, and problem dependent.

