One out of five AI researchers
To figure out the purpose of forecasting, I put on my Dan Davies hat and ask, “What do forecasts do?”
Let’s close out the week by asking ourselves what forecasts are good for. Matt Levine had a highly relevant column on Tuesday. In a discussion of what should and shouldn’t be legal in futures markets, he begins:
Kalshi offers a prediction market where you can bet on sports. No! Sorry! Wrong! It offers a prediction market where you can predict which team will win a sports game, and if you predict correctly you make money, and if you predict incorrectly you lose money. Not “bet on sports.” “Predict sports outcomes for money.” Completely different.
Completely different! Levine, in his inimitable style, discusses the fine line between gambling and investing. It’s a complicated, moving target. When commodity futures were introduced, allowing farmers to hedge against the price of corn or wheat falling too low, many people thought these contracts were gambling. No product was delivered when the contract closed. The US established an entire federal agency, the Commodity Futures Trading Commission (CFTC), to decide which contracts were and were not gambling. Until recently, the CFTC adhered to an “economic purpose test,” under which futures contracts had to demonstrate a clear benefit to hedging or price discovery. But bureaucratic refactoring and a lot of aggressive trading have blurred the line to the point where it’s unclear whether betting on sports is the same as hedging against corn prices.
Levine’s column makes it clear that, in the eyes of the law, we have to describe clearly what the purpose of a forecast is. They all have some purpose. A large part of what makes Defensive Forecasting particularly interesting to me is that it reveals purpose as the most critical thing to identify in forecasting. Once you articulate what you want from a forecast, what the penalty is for being wrong, and what signals might help you make those forecasts, the rest is mechanical. The hard part is declaring what you want to predict and why.
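To make “the rest is mechanical” concrete, here’s a minimal sketch of what I mean, in my own notation rather than anything from the paper: commit to a target (a binary outcome), a penalty (squared error), and signals (a feature vector), and the forecaster reduces to a rote online update. Everything here, the function name and learning rate included, is a hypothetical illustration.

```python
import numpy as np

def online_forecaster(stream, dim, lr=0.1):
    """Forecast each outcome from its signals, then update on the revealed truth.

    stream: iterable of (x, y) pairs, where x is a signal vector of length
    `dim` and y is the realized outcome in {0, 1}.
    """
    w = np.zeros(dim)
    for x, y in stream:
        p = float(np.clip(w @ x, 0.0, 1.0))  # forecast made from the signals alone
        yield p
        w -= lr * 2.0 * (p - y) * x          # gradient step on the penalty (p - y)**2
```

Swap in a different penalty or a different set of signals and only the gradient line changes; the loop, the mechanical part, stays the same. Picking the target, the penalty, and the signals is where all the actual decisions live.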
It was funny, in the midst of writing up this paper on forecasting, to receive a request from the Forecasting Research Institute to participate in their “Longitudinal Expert AI Panel.” FRI, with the generous funding of our EA friends at Open Philanthropy, offered to pay me $2,000 a year to fill out monthly forecasts on my opinions about the future of AI. Some of the sample questions included:
What will be the highest % accuracy achieved by an AI model on FrontierMath, by December 31, 2026?
What percentage of work hours in the U.S. will be assisted by generative AI by December 31, 2026, according to a study with a similar methodology to this Federal Reserve Bank of St. Louis study?
What percentage of rideshare trips will be provided by autonomous vehicles of at least SAE Level 4 in 2026?
I was confused. Why should I know anything about these things? And who would care about the spread of numbers from random professors? And why was FRI asking me these questions? What is the purpose of running this survey?
The email claimed that their goal was to “better understand the views of experts on expected changes in AI capabilities, adoption of AI tools, the economic impact of AI, and other topics, similar to the US Economic Experts Panel but for the field of AI.”
This is revealing. The EEP is infamous for being a cudgel for public opinion. Every week, the University of Chicago Booth School of Business sends an economic statement to its illustrious panel of economists, requesting responses on a Likert scale. The statements are value-laden and written to drive policy:
Long-run US fiscal sustainability will require some combination of slowing the growth of spending on Medicare, Medicaid and Social Security benefits and/or tax increases, including higher taxes on households with incomes below $400,000.
The risks of harm from use of social media services - such as harm to mental health, exploitation of children, and more - are now high enough that society would benefit from federal regulations setting safety standards and creating a process of compensation for harm by digital platforms.
Strongly Agree? Agree? Disagree? Why should we care about the responses of 80 econ professors? I’m not sure, but the results of this particular opinion poll get cited in The New York Times as a synthesis of expertise when making the case for or against some macroeconomic decision point. The EEP is used the same way the opinions of 4 out of 5 dentists were used to sell Trident chewing gum.
The sample questions I received from FRI were far more frivolous, and, on the surface, they don’t seem to ask what AI researchers think should happen. But they are clearly after a similar aim. They want to assemble a panel of “expert” opinion to advance their goals. The questions in the forecast exist because they involve a future the institute wants to construct.
Anyway, thanks but no thanks, FRI. I don’t need to be one out of five AI researchers. I’m curious what the other faculty involved in this panel think will come of this project. As you might expect, since this is a random thing in the public sphere of AI and policy, the email listed Arvind Narayanan as one of the project leads. Maybe Arvind can blog about why he thinks this is a worthwhile exercise.
I think a more worthwhile exercise would be listening to the Reboot Podcast conversation between Jessica Dai and Saffron Huang about why superforecasters and AI accelerationists are so enamored with each other. Jess and Saffron focus on the AI2027 forecast, a bunch of science fiction gobbledygook that Kevin Roose wrote about in The New York Times because of the authors’ claimed “superforecasting ability.”
Saffron argues the future is constructed, not predicted. What we forecast is part of that construction. We forecast a particular set of questions for a particular audience to convince everyone to take those questions seriously and work towards actualizing the desired answers. The mathematical models of forecasting rarely consider that forecasters are part of this world and have desires about what they want the future to be. The objective of forecasts could be wish fulfillment. In that case, we can use mechanical forecasting tools to construct the future we want. However, I side with Jess and Saffron on this one: our public conversations would be more productive if we didn’t hide human agency and desire behind statistics.
I've been thinking a bit lately about how much of the cultural currency of the tech industry hinges on a carefully curated perception of inevitability. Technology is just stuff made by people; there are always going to be multiple worlds accessible from here, based on what people decide to work on, on the one hand, and what they decide to use or prohibit, on the other. But canting the discussion towards 'what technology wants' or 'what's going to happen,' as if it could be meaningfully and dispassionately predicted like a simple physical system, really just leaves investing and buying first as the only moves on the board. 'What should people work on' is an empowering question; 'what do you guess you'll be able to buy' is not.
That last paragraph touches on something that has always bothered me about all the talk of superforecasters and prediction markets. Rationalists talk about these things as if making them mainstream would increase our ability to predict the future with no adverse effects.
But to me it always seemed likely that the more mainstream these groups become, the more predictions themselves will influence events, and the more incentive there will be to manipulate them.