Unreal Is Here
Mapping the territory of simulation and its many purposes.
Though I’ve been prefacing my lecture blog posts with italicized disclaimers, I want to single this lecture blog out as being targeted a bit more broadly. Because, in a weird confluence, the topic of this week’s lecture coincides with the topic of an op-ed by Leif Weatherby and me that appears this morning in the New York Times: forecasting and simulation.
We can’t avoid prediction and simulation in a class about feedback systems. Our theories suggest that better predictions and forecasts lead to better plans of action. Additionally, we try to make sense of complex, interconnected systems by simulating their behavior, and simulations often reveal surprising “emergent” behavior of the whole, which wasn’t evident from the modeled behavior of the parts. We also tend to think that the subcomponents of complex, interconnected systems make sense of their surroundings by predicting what other components around them will do.
I was a bit slippery in that paragraph about what the difference is between simulation and prediction. That’s because I’m still not sure how to draw a boundary between the two concepts. The most common axis is opacity: everyone thinks there is a fundamental difference between a model that is “easy to describe” from first principles and one that is purely data-driven. We call the latter “black box” to mark our disdain. The “transparent box” systems might derive from physical laws, and we write down a set of equations that dictate how each step relates to the next. The black box systems might be derived by curve fitting, where we pick a function of convenience, untethered from causal explanation, to describe how inputs have historically mapped to outputs.
I’ll talk more about the opacity slider in later posts this week, but today, I want to ask about the purpose of simulation. That axis is more interesting to me. Simulations can be used in many different ways. You might use a simulation to better understand a system itself. Simulations of mechanical systems can give you a feel for their performance limits. You can use simulations to figure out why something went wrong, deriving causal explanations from plausible mechanisms. And, of course, you can use simulations to predict the future. You can use these simulation forecasts to make a plan of action. Or, in our DraftKings-addled culture, you might use them to gamble.
Leif and I talked about this murky simulation landscape in the world of public opinion polling. Specifically, we wrote about the absurdity of silicon sampling. For those unfamiliar with the term, silicon sampling is when you design a social science survey experiment and give the questions to LLMs rather than people. As absurd as this sounds, people are really pushing to do this. There’s a billion-dollar startup called Aaru that is based entirely on this silly idea. And one of their fake polls slipped its way into Axios last week, without Mike Allen realizing that the “poll” he was reporting on was a computer simulation (embarrassed, Axios later edited the story to reflect the phoniness).
But why do silicon samples have so much cachet with pollsters and social scientists? As Leif and I argue in our piece, it’s because polls already rely heavily on simulation methods. Because nonresponse rates are remarkably high, pollsters lean heavily on statistical modeling to tweak their numbers to align with reality. Polls that use multilevel regression and poststratification are already injecting a lot of simulated reality to “correct” their summarization of the data they collected. The number isn’t “percentage of yesses in my sample,” it’s “what I think the percentage of yesses is in the population given my sample and my beliefs about the population.”
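To see how much modeling sits inside a single topline number, here’s a minimal poststratification sketch. All numbers are made up for illustration: a sample that overrepresents one demographic cell, reweighted by assumed population shares. (Real MRP fits a multilevel regression first; this sketch shows only the reweighting step.)

```python
# Hypothetical illustration of poststratification. The reported number is not
# the raw percentage of yesses; it is a reweighting of per-cell estimates by
# assumed population shares. All figures below are invented.

# Sample counts by demographic cell: young respondents are overrepresented.
sample = {
    "young": {"n": 20, "yes": 14},
    "old":   {"n": 80, "yes": 32},
}

# Raw summarization: percentage of yesses in the sample as collected.
raw = sum(c["yes"] for c in sample.values()) / sum(c["n"] for c in sample.values())

# Poststratification: weight each cell's yes-rate by its assumed share of
# the *population*, not its share of the sample.
population_share = {"young": 0.5, "old": 0.5}
adjusted = sum(
    population_share[g] * sample[g]["yes"] / sample[g]["n"] for g in sample
)

print(f"raw sample percentage:   {raw:.0%}")       # 46%
print(f"poststratified estimate: {adjusted:.0%}")  # 55%
```

The nine-point gap between the two numbers is entirely a product of the modeler’s beliefs about the population, which is the sense in which the published poll number already contains simulated reality.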
Since polling already relies heavily on simulation, tossing out the expensive part of the process—you know, asking actual people questions—feels like a logical conclusion. The Nate Silverization of political coverage turned polling into prediction. In the media, the goal of polls stopped being about understanding what people think and became more about predicting the outcome of elections. If all you need to do is predict, you don’t really need pristine distillations of understanding. You can take your empirical facts and use them solely to predict outcomes. And if the goal is just prediction, you don’t need to bother asking people at all. In fact, you want more reliable data than the fickle behavior of people nagged by pollsters at the end of some modern transmission line. If your goal is only prediction, you’re probably better off not talking to people.
But is the purpose of polling prediction? It depends on who you ask, but I’d like to think that the answer is no. At face value, the topline numbers of an opinion poll are a summary of a survey. They reduce a list of ones and zeros to two numbers: a mean and a variance.
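The summarization view really is this simple. A sketch with invented data, where the only operations are a mean and the variance it determines (for 0/1 responses, the variance is a function of the mean):

```python
# A topline poll number as pure summarization: a list of 0/1 answers
# reduced to a mean and a variance. The data is invented.
answers = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # 1 = "yes", 0 = "no"

n = len(answers)
mean = sum(answers) / n          # proportion of yesses: 0.6
variance = mean * (1 - mean)     # for 0/1 data, Var = p(1-p): 0.24
std_error = (variance / n) ** 0.5  # sampling uncertainty of the mean

print(mean, variance, std_error)
```

Everything beyond these two numbers—measurement of a population’s opinion, prediction of an election—requires the extra assumptions described above.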
Now, using a bit more social-scientific reasoning, we might interpret this summarization as a measurement of what a group of people believes. With a rigid methodology, we can consider polling to be quantified opinion. It’s a bit odd to think that you can “objectively” measure opinion in the first place, but this has been a supposition of social science research for a long time.
Unfortunately, statistics has incredibly slippery semantics that lead people to conflate summarization with measurement and measurement with prediction. Is the percentage of “people who answered yes” a summarization of the data? Is it a measured quantity about the opinion of a broader population? Is it a prediction of how people will vote in November? Yes?
I’m interested in this conflation for both political and academic reasons. Leif and I think the polling industry is harmful to the public sphere. But setting those politics aside, I think that being upfront about the purpose of simulations and forecasts helps demystify their outputs. Indeed, this week I’ll describe how purpose dictates forecasts. Prediction of the future is difficult. But if you tell me how my predictions will be evaluated, prediction of the future is trivial. I’ll explain more about why in the next post.

