While this semester brought a lot of personal clarity on supervised learning and prediction, I find myself more confused than ever about our frameworks for decision making under uncertainty. Let me take a shot at what I think the core takeaways should be, and then let me tease you with my confusion so we can look forward to some ill-posed research questions.
In this class, we took a narrow view of decision making under uncertainty, casting everything as stochastic optimization. We assumed a probabilistic model of future events and tried to find policies that maximized expected outcomes in this model. It was never metaphysically clear what those expected values meant, but we mostly tossed philosophy aside and investigated what it would take to find optimal policies.
What the policies looked like was deeply problem dependent. One of my favorite blogs this semester was the taxonomy of these optimization problems. We can stack all sorts of complexity into probabilistic models, and the sophistication of optimization tools required grows exponentially with this complexity. In the simplest case of static binary policies, we reduced decision making to a simple betting rule: when the odds of a good outcome are high enough, you take action. When we model the world as a dynamical system with nonlinear state dynamics and noisy measurements, the problem is effectively intractable. The taxonomy helped us sketch out the paths through the complexity landscape, from randomized trials to dynamic programming and optimal control. These all play a role, provided we believe the model on which they’re operating is useful for gauging future outcomes.
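To make the simplest case concrete, here is a minimal sketch of that betting rule. The payoff numbers and probability below are made up for illustration:

```python
# A minimal sketch of the static binary betting rule, with made-up numbers:
# acting pays `gain` with probability p and costs `loss` otherwise,
# while doing nothing pays zero.
def should_act(p: float, gain: float, loss: float) -> bool:
    """Act when the expected payoff of acting beats doing nothing,
    i.e., when p * gain - (1 - p) * loss > 0."""
    return p * gain > (1 - p) * loss

# Example: a 1/3 chance of winning 10 against a 2/3 chance of losing 4.
print(should_act(1 / 3, 10.0, 4.0))  # True, since 10/3 > 8/3
```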
But do we believe these probabilistic models? I didn’t fully grapple with this in class, and I still don’t know what we mean by probability in these models. For example, in the introduction to Birge and Louveaux’s Introduction to Stochastic Programming, they describe a planning problem for a farmer who believes there is a ⅓ chance that crop yield will be well above average and a ⅓ chance it will be well below average. What do these probabilities mean exactly? The book never says, but assuming this model of farming is true, it derives a strategy to maximize expected profit.
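To give a flavor of the exercise, here is a toy sketch of my own; the crop economics below are invented and are not the book’s numbers. The farmer commits to an acreage now, settles up after the yield is revealed, and the “optimal” plan is whatever minimizes expected cost under the assumed equally likely yield scenarios:

```python
# A toy two-stage sketch in the spirit of the farmer problem. All numbers are
# invented for illustration; they are not the book's. The farmer picks an
# acreage now; once the yield is revealed, any shortfall against a feed
# requirement is bought at a high price and any surplus is sold at a low one.
yields = [3.0, 2.5, 2.0]        # tons/acre: well above, around, and well below average
probs = [1 / 3, 1 / 3, 1 / 3]   # the farmer's stated beliefs
requirement = 200.0             # tons that must be on hand no matter what
plant_cost = 450.0              # dollars per acre planted
buy_price, sell_price = 210.0, 170.0  # dollars per ton bought / sold after harvest

def expected_cost(acres: float) -> float:
    """Expected net cost of meeting the requirement if we plant `acres` acres."""
    cost = 0.0
    for p, y in zip(probs, yields):
        harvest = y * acres
        shortfall = max(requirement - harvest, 0.0)
        surplus = max(harvest - requirement, 0.0)
        cost += p * (plant_cost * acres + buy_price * shortfall - sell_price * surplus)
    return cost

best = min(range(0, 201), key=expected_cost)
print(best, round(expected_cost(best), 2))  # -> 100 36500.0: plant enough to cover the worst yield
```

The plan is only as good as those ⅓ beliefs. Change them and the “optimal” acreage changes with them.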
I’m still perplexed by what these probabilistic models mean, and I worry about building a bunch of scaffolding to support models of the future that depend on the probability of singular events. The idea that we model the future as random is part of the problem. I mentioned this at the beginning of the semester, and I hoped to find a resolution through teaching the class. I didn’t.
I was hopeful because I had come to a satisfactory resolution in prediction. I used to teach machine learning by assuming probability from the start: I assumed data was sampled from some probabilistic model and framed the goal as making predictions about the next sample from that same model. This is how we set everything up in Patterns, Predictions, and Actions, but Moritz and I were never comfortable with this framing.
But it turns out you can derive most of the useful machine learning theory without ever assuming these random models. As I described in earlier blogs, you can assume that you split a data set into train and test using probabilistic sampling (intentional randomness). You can argue about out-of-sample behavior using deterministic regret arguments.
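To give a flavor of what that buys you, here is the kind of bound I have in mind, a standard Hoeffding-style statement. I’m assuming a loss bounded between 0 and 1 and a predictor f fixed before the split; the only randomness is the uniformly random choice of which m points end up in the test set:

$$
\Pr_{\text{split}}\Bigl[\,\bigl|\widehat{R}_{\text{test}}(f)-\widehat{R}_{\text{full}}(f)\bigr|\geq t\,\Bigr]\;\leq\;2\exp\bigl(-2mt^{2}\bigr),
$$

where $\widehat{R}_{\text{test}}(f)$ is the average loss of f on the m held-out points and $\widehat{R}_{\text{full}}(f)$ is its average loss over the whole data set. The guarantee is over coin flips we introduced ourselves, not over a hypothetical process that generated the data.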
I don’t think “regret” resolves my issues with decision making under uncertainty. Regret is never what we want to optimize when we are making decisions. We care about every single decision, not an average over the past. And how we tie the past to the future matters even more when we have to make decisions with real impact.
Therein lies the rub. When are our models true? How can we know? I came across a great quote by M. S. Bartlett from his remarks delivered to the Manchester Statistical Society in 1951.
"Insofar as things, persons, are unique or ill-defined, statistics are meaningless ... ; in so far as things are similar and definite ... they can be counted and new statistical facts are born ... Our arithmetic is useless unless we are counting the right things."
But shoot, what things are right to count? How do we know the boundary between the repeatable and predictable, and the singular and unverifiable? I don’t think we have found a good answer in the 72 years since Bartlett. And maybe we never will.
For now, I’m OK with remaining confused. I’ve been reading plenty of scholarly work and no one is any less confused. I’m happy to be part of the club! Once we accept an area is a mess, we can try to make some progress. I’m going to teach a small seminar class in the spring on the foundations of these decision making methods, and maybe next year I’ll attempt a more technical class. Maybe this will all lead to some new useful frameworks.
I can’t predict if we’ll be successful. I’m totally fine with that uncertainty.
I also had lots of confusion around this topic. I then stumbled upon Jaynes' "Probability Theory" and that cleared up at least some of it. He states that probabilities merely encode our beliefs about uncertain situations. For "repeated" events these line up with frequencies, but for singular events they are less intuitive. It took me quite a long time to accept that probabilities are not something "existing" in the real world, but rather something "subjective" that reflects one's state of knowledge.
In the farming example above, the assumption is that it's equally likely that crop yield is well above, well below, or around average. That's just the best the farmer could come up with based on previous yields, weather records, and some knowledge about plant growth. Accepting that that's the best she can do in quantifying her uncertainty, she can then use decision theory to decide what actions to take.
But there's no "true model" in any sense, since the world isn't stochastic in the way frequentist theory assumes. Jaynes does a really nice job of making probability about handling uncertainty about singular events, and then shows how it can be applied.
Would be interested in what you think about Jaynes' book and the mind projection fallacy.
Did some research on Yogi Berra to understand the title -- is he used as a metaphor for confusion and paradoxical statements?