Google seems to have given up on indexing much of the web, my old posts included, so your humble blogger is left to index the archives myself. Dear reader, I’m terrible at this. If only you could see the horror of my Google Docs folder. I periodically force myself to collect draft documents in the hope that the archiving will prove useful for later final projects. This blog, which I consider a public repository of first drafts, is no exception. If you use the Substack app or only get argmin by email, you’ve probably never seen the doors to the archives. But since I was collecting my posts this week, I thought some of you might enjoy a tour.
I used to run this blog through GitHub, and all of those old posts are still around, but they now live on a subdomain: archives.argmin.net.
If you go to argmin.net and scroll down a bit, you’ll see a link here:
I can’t figure out how to convince the Substack CMS to put this link higher on the page. Oh well. It’s also linked on the “About” page that no one visits. Regardless, the whole Jekyll blog structure remains intact at that link. If, for example, you wanted to read my survey of reinforcement learning for control applications, it’s there in all of its glory. I still get requests for these old posts and have tried to make them easy to find. Google, of course, refuses to reindex them. Oh well!1
On the top banner, I have some other links to Substack posts I’ve collected thematically.
My course lecture blogging has its own tab, and each course has its own webpage. You can revisit, for example, my original live blogging of my graduate machine learning class in 2023.
I have also collected some posts into themes; an index of my Meehl blogging is linked there. These posts remind me that I need to find someone else’s class to blog through next summer. It’s fun, and I highly recommend it. If you still lurk around on Twitter, you should follow Damek Davis, who is tweeting through Percy Liang’s “Language Models from Scratch” class. His threads have been an incredible public service.
As I mentioned at the start, I was trying to figure out how to collect the posts I’ve written this summer into themes, and I’m not sure any theme is coherent enough to warrant its own page. But here’s my attempt at grouping them.
A prominent theme this summer was forecasting. I wrote three posts (post 1, post 2, post 3) introducing the defensive forecasting survey I wrote with Juanky Perdomo. I followed this up with a post on the politics of forecasting. I also wrote about methodological challenges in forecasting, like when hedge funds try to convince people they have secret AI sauce but have just reinvented locally weighted averaging. And though my post isn’t explicitly about forecasting, I did write about that METR study that compared forecasts to measurements of AI productivity gains. Though more about machine learning in general, last week’s post on why it’s hard to “prove” unpredictability fits into this theme as well.
I also found myself writing a lot of academic navel-gazing. There were about ten posts about scientific culture and gatekeeping. I had a fun four-post series on the history of academic computer science. I read through Louis Fein’s pitch to the ACM to create computer science departments, and wrote about how computer scientists created a mythology around the field’s origins. I also looked into how Fein was written out of the history of CS, and how his speculative vision for the future of computer science was boring and bureaucratic.
I wrote about scientific communal notions of validity and how they are more cultural style guides than divining rods of epistemological truth. I read Neil Postman’s “Social Science as Moral Theology,” which partially explains why social science feels the need to rest its rhetoric so heavily on quantitative constructs.2 I tried to thread the needle, arguing that expertise is created and curated communally by academics, but that this doesn’t necessitate our current broken system of pre-publication peer review. And on a theme I want to revisit soon, I wrote about how meta-analysis of statistical significance is deeply confounded by the complex dynamic feedback mechanisms of academic expertise creation. In fact, the more I look into crises in science, the more I find that science has always been a mess. Trying to gatekeep that mess is, and has always been, largely unproductive.
Finally, and quite related thematically, I spent a lot of time on the state of academic machine learning. I worry about its focus on bureaucratic (and statistical) gatekeeping rather than open data and models. I asked what it would take for academics to re-engage with fully open generative AI. A third post tried to tie these arguments together. These posts have me doing a lot of soul-searching about my research future.
Housekeeping note: I’ll be traveling for the next couple of weeks and won’t be posting while I’m away. I’ll be back the first week of August. I have a queue of ideas for what I want to write that I’ll get back to when I return, and hopefully they won’t be too outdated by then. The discourse moves fast these days! I’ll see you in August.
1. Rereading these, I wonder if I should write a 2025 version of this surveying the role of “RL” in language modeling. I’m sort of thinking I should. It at least deserves a blog post!

2. Though not from this summer, this post about Healy and Fourcade’s The Ordinal Society is closely related.