an argmin year in review
I wrote a post in July about navigating the archives here on the blog. In the past, when consecutive posts coalesced around a clear theme, I’d make little pages indexing them together as a series. I didn’t do my archiving as well as I should have this year, so in the spirit of a year-in-review, I wrote a single post to make sense of where the blog has been. When I wasn’t course blogging, 2025 was the year of bureaucracy and my pet theories of instrumentalized statistics. Let’s take a look at how it all threaded together.
Live blogging
Most of the posts this year were part of or closely related to my courses’ live blogging. I wasn’t planning on committing to this for the Spring semester, but the machine learning evaluation course with Deb Raji was so engaging that it ended up being what I wrote about. [1] I’ve listed those blogs as part of a larger syllabus that I’ll keep linked on the navigation bar of the blog’s website. In the Fall, I live-blogged my graduate machine learning class and tried to keep as much of the material on the course’s table of contents page as possible. [2] Based on the past, I’m guessing I’ll live-blog again this year, too. So stay tuned for the Spring semester, when I try to make sense of systems thinking at the intersection of learning, dynamics, and control. If you wanted a resolution from this post, well… maybe we’ll get somewhere? [3] There will be a lot of antimimetic systems thinking, from which I think we’ll all learn something. At some point, I’ll live-blog Machine Learning 101, though probably not in 2026. [4]
The Bitter Lesson Revisited
Before my fall class started, I wrote a few posts that I should retrospectively add to the syllabus. Notably, I revisited Turing Laureate Richard Sutton’s famous essay “The Bitter Lesson” and its grotesque cooptation for nihilistic capital expenditure. [5] I also wrote about how weird it was that Sutton conflated board games and pattern recognition, two problems linked only because both were solved by competitive testing. [6] Computer science can’t shake competitive testing.
Another unfortunate artifact of the bitter lesson is people imbuing magic into larger models and sophisticated architectures without recognizing that the important part is the data. You’ll see all sorts of wild arguments about how adopting fancy transformer models will revolutionize every discipline. Unsurprisingly, some of the most annoying arguments come from economists, who now throw together random Python notebooks, call them AI, and declare revolution. I think they’re just bored. I poured some water on a few of those stories this year, including one about using (Rahimi-Recht) random features to predict portfolio returns [7] and another on why it’s impossible to make positivist claims about predictability. [8]
However, this is not entirely the economists’ fault (for once?). The field of machine learning has always had a soft spot for complex models. People love to show that they can solve a problem with the latest and greatest model, even when a simple linear model beats it. [9] And while I like linear models for a variety of practical reasons, they themselves are not a panacea of parsimony or explainability. [10] Simplicity, something I’m in favor of striving for, is always in the eye of the beholder.
Instrumentalized Statistics
Statistics has many different uses, and each application modifies the meanings of statistical concepts. I wrote three papers describing how probabilities, means, and expected values, though always computed with the same formulas, take on different denotations in the varied realities where they are purposed for sensemaking. One controversial paper framed frequentist statistics as ex ante policymaking. [11] A second described the metrical determinism inherent in Meehl’s Clinical vs. Actuarial Prediction riddle of outcome maximization. [12] The third, a reconstruction of defensive forecasting with Juanky Perdomo, described prediction as a bookkeeping scheme aimed at making erroneous past forecasts look better. [13] Surprisingly, this cynical accounting recovers optimal error rates across a variety of different prediction problems. [14] Defensive forecasting also yields a curious actuarial version of subjective probability aimed at maximizing a forecaster’s external reputation. [15] Defensive forecasting highlights the constructive nature of predictions, in which forecasters provide guesses only about those futures they want to manifest. [16]
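To make the bookkeeping intuition concrete, here’s a minimal toy sketch (my own illustration, not the algorithm from the paper): a forecaster of binary outcomes who always chooses the forecast that pushes the accumulated error of past forecasts back toward zero. No matter how adversarial the outcome sequence is, the running bias stays bounded, so the average error vanishes.

```python
import random

def defensive_forecast(outcomes):
    """Toy bookkeeper: choose each forecast so the running bias
    sum(y_s - p_s) stays in [-1, 1] no matter what the outcomes are."""
    bias, forecasts = 0.0, []
    for y in outcomes:
        # If past forecasts were too low (bias > 0), forecast high to
        # cancel it; if too high, forecast low; otherwise hedge at 1/2.
        p = 1.0 if bias > 0 else (0.0 if bias < 0 else 0.5)
        forecasts.append(p)
        bias += y - p
    return forecasts, bias

# Even for an arbitrary sequence, the average bias is O(1/n).
ys = [int(random.random() < 0.7) for _ in range(10_000)]
_, bias = defensive_forecast(ys)
print(abs(bias) / len(ys))  # ~1e-4: past errors have been booked away
```

This toy only balances one ledger entry (the overall bias); the papers play the same hedging game against richer families of error statistics to get the calibration and regret guarantees.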
Grounding statistical concepts in their application helps us unpack the many ways statistics get bandied around in scientific and public discourse. For example, not all of statistics serves bureaucracy, but hypothesis tests are best interpreted in this way. [17] On the other hand, mandating (bureaucratic) statistical thinking is a fast path to stifling creative research. [18]
Statistical forecasting plays a different role as a modern form of authoritative punditry, casting wish-fulfillment in mathematical formalism. I wrote a lot about the false authority of mathematical language, seeking its roots. Neil Postman was calling out the rhetorical power of mathematical argumentation in the 1980s, and his essay on “Social Science as Moral Theology” is more relevant forty years later in today’s world of chartbois. [19] We accord greater weight to counts than to anecdotes, even when the counting measures are of questionable validity or reliability. [20] This misplaced authority helps explain why we continue to model people as computational agents even though 50 years of empirical evidence has shown this model couldn’t be more wrong. [21]
Football Shamalytics
To illustrate the absurdity of constructed mathematical authority, I dipped my toes in the water of sports analytics. I wrote several posts about how a particular gambit in football, attempting a two-point conversion when down by 8 points, is based on highly questionable statistical modeling and is far less successful than proponents want you to believe. [22, 23] Indeed, the gambit managed to work only once out of ten attempts this season, and that single success was in one of the most ridiculous games of the year between two of the most hapless teams in the league. [24]
I gained a new appreciation for people who make a living arguing about sports. You have to be committed to repeating yourself over and over again and never admitting you’re wrong. I mean, I’m an academic, and that sounds like my job. I didn’t expect football punditry to be so exhausting.
Now, I swear, I wasn’t writing about sports to get a job at ESPN. The go-for-two-down-eight gambit is a microcosm of statistical thinking and the infiltration of statistical risk management into every aspect of our lives. It narrowly shoehorns a complex problem into a simple choice between two actions, always preferring the one with higher expected value in an unvalidatable model. The “probabilities” in sports are ad hoc mumbo jumbo concocted to entice gamblers. [25] Even the most committed sports analytics pundits reject their models in favor of “psychology” when it suits them. [26] But even when the probabilities are “right,” this sort of EV-maxxing decision making is suboptimal in actual game theory. Someone needs to run a seminar on mixed strategies for sports media. The idea that football is effectively like blackjack remains pervasive even though they couldn’t be more different. [27] Politics isn’t blackjack either.
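To see how thin the gambit’s margin is, here’s a back-of-envelope sketch with made-up numbers (my simplification, not anyone’s actual model): assume the trailing team scores exactly one more touchdown, extra points are automatic, and overtime is a coin flip. The whole recommendation then hinges on a single unobservable number, the two-point conversion probability.

```python
def go_for_two_now(p2, p_ot=0.5):
    """Down 8 after a TD, attempt the two: make it (prob p2) and a later
    TD + XP wins outright; miss it and you need a later TD *and* a
    successful two-point try just to reach overtime."""
    return p2 + (1 - p2) * p2 * p_ot

def kick_the_xp(p_ot=0.5):
    """Down 8 after a TD, kick: a later TD + XP ties, then coin-flip OT."""
    return p_ot

for p2 in (0.35, 0.38, 0.41, 0.45, 0.50):
    print(f"p2={p2:.2f}  go={go_for_two_now(p2):.3f}  kick={kick_the_xp():.3f}")
```

Under these toy assumptions the breakeven sits around p2 ≈ 0.38, squarely inside the plausible range of conversion rates. The analytics case amounts to asserting that a league-wide average applies to your team, in this game, on this snap.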
Academic Bureaucracy
It was a terrible year for American universities, under attack from both a spiteful, grievance-driven federal government and the arrogant intellectual vanguard of Silicon Valley. These two groups had a marriage of convenience this year, with Silicon Valley screaming that it can out-science the academy. Except Silicon Valley’s romantic view of science is disingenuously pre-Kuhnian. I wrote about the Thielian “Gold Standard Science” executive order and how good science is a shifting cultural convention constructed by scientists themselves. [28, 29]
These emergent cultural constructions for cultivating expertise have metastasized into red tape with the exponential growth of scientific papers. Notions of “validity” serve as cultural guidelines that streamline academic discourse. [30] Commitment to these validity constructs is often overdone, and the current checklist system for programmatizing machine learning research is a particularly ridiculous manifestation of trying to over-constrain scientific communication. [31] Similarly, it is the programmatization of p-values, not questionable research practices, that leads to aberrational p-curves. The meta-scientific obsession with what these curves should look like misses the point that they are an artifact of bureaucratic policy, not evidence of fraud. [32] “Peer review” (broadly construed) is the foundation of scientific knowledge-making, [33] but this is not a defense of pre-publication peer review, which grows more absurdly useless by the day. [34]
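A quick toy simulation of that last point (mine, not from the post): even with zero questionable research practices, the conventions themselves dictate the shape of the p-curve. If every lab honestly runs a study powered at the customary 80%, about a quarter of the significant results still land just under the 0.05 bar.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Honest labs, no p-hacking: every study probes a real effect and is
# powered at the conventional 80% (true z-shift of about 2.8 for a
# two-sided test at alpha = 0.05).
z = rng.normal(2.8, 1.0, 100_000)
p = 2 * stats.norm.sf(np.abs(z))

sig = p[p < 0.05]
print(f"significant results: {len(sig) / len(p):.0%}")  # ~80%, by design
print(f"of those, just under the bar (p > 0.01): {np.mean(sig > 0.01):.0%}")  # ~26%
```

Change the power convention and the curve changes with it: the curve is a readout of the policy parameters, not of anyone’s integrity.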
The arXiv banned position papers, and I wrote about the odd role of the arXiv in scientific communication [35] and how we might rethink our systems for archives, commentaries, and surveys. [36] These accidental systems for scientific standards manifest themselves in curious ways, and pretending they aren’t there only amplifies the goal displacement in scientific pursuit as eloquently captured by Kevin Baker. [37]
The Invention of Computer Science
The origin of the field of computer science serves as an illustrative case study of the unintentional construction of standards. [38] Computer science arose in the postwar academic boom and thus has all of the distinctive characteristics of our modern system. The original meaning of “science” in computer science, as laid out by thought leaders Louis Fein and George Forsythe, was closer to library and information science than to physics. CS departments were cobbled together in response to market demand for maintaining computing centers and producing knowledge workers. The visions for the scope of study were more akin to current conceptions of “data science” than to what computer science would become from the 70s through the 00s. [39] But establishing new departments required academic political maneuvering and clever branding exercises. Departments created a more academic allure by adopting an aura of rigor, notably by retconning the influence of Alan Turing on American engineering. [40] Indeed, Fein’s imagining of the future of computer science was more boring bureaucratic automation than wild science fiction. [41] That’s a harder sell, I suppose.
AI as Bureaucratic Technology
Computer science still juggles science-fiction amazement and bureaucratic optimization, but the tension has been taken to the extreme after six decades of Moore’s Law scaling. We’re still techlashing against computing machines taking our jobs, but the studies that come out show how the machines are just rearranging what people do. For example, a study claiming AI improved cancer detection was actually a study of how many radiologists were needed to corroborate a diagnosis. [42] Another randomized experiment seemed to find that AI was slowing down expert open-source developers, though it wasn’t quite clear what the measures of productivity actually meant. [43]
We were told AI was going to replace mathematicians (notably by one person who has devoted their life to saying this over and over again). So far, we have seen a more modest revolution in which LLMs have dramatically improved our ability to search the mathematical literature. Such advances in library science revealed that mathematicians weren’t particularly good at knowing which problems were open or closed, even within their own subdisciplines. [44] Indeed, in one of my favorite papers of the year, Boris Alexeev and Dustin Mixon, without the help of an LLM, found the solution to a heralded 50-year open problem in a paper that had been written 30 years before the problem had been posed. [45] LLMs have certainly changed mathematical research, but more as Lore Laundering Machines, surfacing esoterica in natural language disguised as insight.
Outside of the research world, 2025 was the year of slop. The tech oligopoly’s artificial intelligence marketing campaigns can’t get past the sell of “Ask Jeeves, but 30 years later,” insulting everyone’s actual intelligence. [46] There was a dust-up when a purportedly AI-generated country song became popular on Spotify, but I wrote about commercial country music’s ignoble lean into human-generated slop. [47]
2025 was also the year when people finally got tired of the singularity dweebs. Here’s hoping that sticks in 2026. New tools for software development and prototyping that can also propose bad recipes for date night are amazing, but they are not Skynet. I called out the cultish obsession with “AGI,” writing how it has always been more religious than scientific, but it’s a religion that sits uncomfortably close to power. [48] This proximity to power focuses collective attention on the wrong problems. So “AI Safety” creates a particular, hyperbolic conception of harm, distracting from the ways in which actual chatbot products are already causing widespread harm. [49]
But if we want to shift attention to more important pursuits, it is tricky to thread the nuanced needle of social critique. Jumping between clever populist sound bites that contradict each other isn’t healthy for the discourse. [50] I still think the best push is to situate this new technology in the broader history of automation. But I also think critique needs to point towards concrete actions. The development of smaller, fully open models [51] and Jessica Dai’s proposals for aggregating public feedback for better evaluating AI models [52] are directions with potentially huge impact.
Live blogging
[1] Machine Learning Evaluation - Spring 2025
[2] Patterns, Predictions, and Actions (revisited) - Fall 2025
[3] Frame by Frame
The Bitter Lesson Revisited
[6] All our games turn into Calvinball
[8] The unpredictability conundrum
[10] Inference From the Best Prediction?
Instrumentalized Statistics
[11] Bureaucratic Statistics. Full paper on arXiv.
[12] The Actuary’s Final Word. Full paper on arXiv.
[13] In Defense of Defensive Forecasting. Full paper on arXiv.
[14] Restatements or Forecasts?
[15] Probability Is Only A Game
[16] One out of five AI researchers
[17] How do you know so much about swallows?
[18] Metascience of pull requests
Football Shamalytics
[22] Changing the Meta
[24] Learning from Losers
[25] What is the chance of a Beast Quake?
[26] Sunday Never Knows
[27] Don’t Be Resulting
Academic Bureaucracy
[28] This is fine
[29] The Good, The Bad, and The Science
[30] Strunk and White for Science
[31] Standard error of what now?
[32] Milton Friedman’s p-values
[34] The Open Marketplace of Ideas
[36] DOI Directorate.
[37] Measures as ends
The Invention of Computer Science
[38] Computer science is what computer scientists do
[40] Physics for Synnoets
[41] May you live in boring times
AI as Bureaucratic Technology
[42] Are radiologists finally out of a job?
[43] Are developers finally out of a job?
[44] Lore Laundering Machines.
[45] The fine art of crate digging
[46] Henry was Right.
[47] Acceptable I-V-vi-IV Songs
[48] Maybe just believing in AGI makes AGI exist.
[49] The Banal Evil of AI Safety
[50] How Snake Oil Becomes Normal Technology
[51] Open Mindset
[52] Individual experiences and collective evidence
The preparedness paradox
For those keeping score at home, I also wrote two posts about the paradox of prevention, one honoring the 25th anniversary of the Y2K bug [53] and the other exploring the complexity and disconcerting arrogance of preventative medicine. These connected more to my December 2024 writing than my 2025 writing. However, I expect I’ll return to this theme again in 2026.
[53] In the year 2000

