# The Only Winning Move

### Game Theory is a cautionary tale about decision making under uncertainty.

We end the semester on games because they are the ideal application of decision making under uncertainty. As Samuel astutely noted 60 years ago, games are perfectly well-posed. The outcome we want to maximize is clear (winning). The rules of the game are fixed and don’t change. When we build a policy for playing the game, we optimize against the best possible adversary. If we can find a policy that always wins against the perfect adversary, we can’t lose to anyone else.

Computational gameplay has pushed computers forward, captivating engineers and inspiring them to build machines with better capabilities. And it has given us insights into how we play games. Playing games is part of being human, and these competitions help connect us with each other and challenge us to improve. What turned out to be interesting about computer gameplay was not what it taught us about some pie-in-the-sky conception of artificial intelligence. It was what it taught us about the games themselves.

What did computers teach us about Chess? Shannon posited that you could build a Chess playing robot with efficient tree search and reasonable board evaluation. This turned out to be correct, and proving it correct was simply a matter of waiting 50 years for engineers to scale up computers.

What did computers teach us about Checkers? Checkers is so simple that an exhaustive search can and does find the optimal strategy. Playing optimally in Checkers ends in a draw, just like in tic-tac-toe.
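The draw-under-optimal-play claim is easy to check for tic-tac-toe, where the full game tree is small enough to search exhaustively. A minimal sketch of that search (memoized minimax, with the board as a tuple of nine cells):

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, indexed 0..8 row-major.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    # Return 'X' or 'O' if a line is completed, else None.
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    # Game value under optimal play: +1 if X wins, -1 if O wins, 0 for a draw.
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if all(cell is not None for cell in board):
        return 0  # board full, no winner: draw
    children = [value(board[:i] + (player,) + board[i + 1:],
                      'O' if player == 'X' else 'X')
                for i, cell in enumerate(board) if cell is None]
    return max(children) if player == 'X' else min(children)

empty = (None,) * 9
print(value(empty, 'X'))  # 0: perfect play from both sides is a draw
```

Checkers required the same idea at vastly greater scale, plus decades of endgame-database engineering, but the conclusion was the same: game value zero.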

What did computers teach us about Go? Computer science researchers in the 2010s realized that a simple "code book" that just looked at the board and guessed the next move could probably beat amateur players. Even though Go looked harder than Chess because of its massive game trees, a computer didn't require impossibly deep tree search to beat people. Maybe in retrospect this is not that surprising. Perhaps Go is, in many ways, *easier* than Chess.

What did we learn about Poker? Around 2010, researchers realized you could find optimal betting strategies for small subgames of Poker by writing the game in the correct way. This led to the solving of Limit Hold 'Em and competitive play at No Limit. Professional poker players started memorizing expected value tables from poker solvers so they could play "Game Theory Optimal" strategies in big tournaments. And then everyone realized optimal poker was boring and just meant folding all the time. It turned out that people didn’t play poker because they loved optimal stochastic strategies. They played Poker because they were degenerate gamblers.

In all these cases, computers taught people how to play better. Did our pursuits into computational game theory teach us anything about building superintelligence? No. Did it give humans competitive advantages so they could best other humans? Yes.

A second important contribution of Game Theory was providing a particular means of participatory decision-making. When there is a complex problem with competing interests, Game Theory provides a programmatic, though imperfect, solution framework. Stakeholders first have to agree that a rational solution is the best solution. Though economists and weirdo Bayesian Rationalists love to claim otherwise, we now all appreciate that “rationality” as defined by economists is a choice, not a natural property of human decision-making. But, given a tricky problem where not everyone will be happy with the outcome, participants can agree that they will accept the decision of a rational decision-maker. They can debate and agree upon the acceptable utilities associated with each potential outcome. And then the policymaker can design a game-theory optimal solution to the agreed-upon formulation of the problem.

This sort of game theoretic design matches doctors to residency programs, allocates communication channels for telecom providers, assigns children to high schools, and finds kidney donors. These solutions do not make everyone happy, and they need to be revisited as unforeseen negative impacts are discovered after the implementation. But, in some instances, Game Theory provides acceptable outcomes to challenging decision problems. It’s not the only game in town, but it can be a valuable tool in a mediator’s toolbox.

When von Neumann and Morgenstern invented Game Theory in the 1940s, they had audacious goals for a mathematical theory of economics: that “the typical problems of economic behavior become strictly identical with the mathematical notions of suitable games of strategy.” But Game Theory, as a means of understanding humans and human economies, was a failure. It’s only in the games we play—Chess, Checkers, DOTA, Starcraft—where definite, clear rules apply. In the reality of human social interaction, there’s far more room to play. Humans are not rational and don’t make decisions based on an abstract construction of rationality. Game Theory is useless at predicting how humans will behave when complex decisions are laid out before them. And there is no fix for this. Behavioral Economic theories aimed at casting humans as “predictably irrational” have been sullied by deep and fundamental replication crises and high-profile fraud scandals.

The history of Game Theory tells a cautionary tale about believing too much in our mathematics. But this is true about all decision making under uncertainty. I know we just spent half a semester trying to prove otherwise, but decision making under uncertainty can’t generally be solved by mathematics because the future is uncertain.

There are special cases where we can make partial progress. If we can force the world to behave by rigid rules, like in aerospace or computer engineering, we can simply optimize and force the uncertainty to remain small through engineering. If we have a lot of uncertainty but a simple decision space and homogeneity in the outcomes of individuals, we can run lots of randomized trials to improve policies. If we have clear outcomes and rules, we can solve for Nash equilibrium strategies.

But what if there isn’t such regularity in future outcomes? What happens when our actions change their meaning in context? What happens if our conception of what a good outcome is changes? What happens when those outcomes can’t be cleanly compared and quantified? What if we can’t even properly pose our problem as an optimization? At this point, perhaps we need to stop hoping there’s an answer in data, statistics, and optimization. But that doesn’t mean the situation is hopeless. It just means we need to think harder. This course has got me excited to think outside of the optimization mindset. What other concepts can we bring to make sense of the future?

## The Only Winning Move

Hi Ben, I've enjoyed your blog posts a lot! As you said, these last few blog posts have been controversial indeed, but I believe the RL (and especially game-theoretic RL) community benefits greatly from them. You provide a healthy dose of skepticism and reflection; we should never stop asking whether the work we are doing can truly make an impact. As someone from this community (though obviously I don't speak for anyone else), I wanted to offer some responses as well, and ask your opinion on them. At this point I should probably just write a separate article myself given the length, but here we are.

The summary: I believe we can learn much more from games and game theory, but the existing work sits in an exceedingly narrow subset of settings (finding Nash equilibria in two-player zero-sum games), and the community has overemphasized this narrow work because it's been hard. Recently, attention has turned toward general-sum settings with three or more players that are not purely competitive. These settings require understanding and responding to non-rational agents in a manner applicable to complex, messy, poorly defined real-life situations, in a way that the existing narrow game-theoretic RL work is not. We've made promising initial progress here, and I argue this could be the very key to addressing previous deficiencies in the applicability of game theory.

=================

You argue that computer gameplay has yielded little insight for building superintelligence or more broadly applicable systems, and thus the main interesting aspect is what computers taught us about individual games. I'd dispute this point. We crawl before we can walk and run, and despite huge advances in game-theoretic RL, we have only recently begun to walk instead of crawl. Your critiques of the gap between the existing utility and the claimed benefits of classical game theory, RL, and their combination, game-theoretic RL, are legitimate. The research community and the media have greatly overstated how broadly useful these approaches are. We're claiming to be Olympic sprinters when we've just gotten to our feet. We can't outrun a toddler, not to mention Usain Bolt.

It's because of these overstated claims by the community that I believe you say games are fixed, have easily understood goals, and don't require understanding and predicting other agents. Concretely, what I mean is that all the games you discuss (and what most people have been touting progress on within GT-RL) are two-player zero-sum games, or otherwise behave similarly, e.g. poker or Dota. In particular, Nash equilibria have been the solution concept that everyone's been chasing, and the following statement is true in exactly these settings: "When we build a policy for playing the game, we optimize against the best possible adversary. If we can find a policy that always wins against the perfect adversary, we can’t lose to anyone else."
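The quoted guarantee is concrete in small cases: in a two-player zero-sum game, the minimax mixed strategy secures the game's value against any opponent, rational or not. A minimal illustration for a 2x2 game, assuming the optimal strategy is fully mixed (no saddle point), so we can equalize the payoff across the opponent's two actions in closed form:

```python
def solve_2x2_zero_sum(A):
    """Minimax mixed strategy for the row player in a 2x2 zero-sum game.
    A is the row player's payoff matrix; assumes a fully mixed optimum,
    so we choose p with p*A[0][j] + (1-p)*A[1][j] equal for j = 0, 1."""
    denom = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    p = (A[1][1] - A[1][0]) / denom          # probability of playing row 0
    value = p * A[0][0] + (1 - p) * A[1][0]  # guaranteed expected payoff
    return p, value

# Matching pennies: the equilibrium is to randomize 50/50, with game value 0.
# Against this strategy, no adversary -- perfect or otherwise -- wins in
# expectation, which is exactly the "can't lose to anyone else" property.
p, v = solve_2x2_zero_sum([[1, -1], [-1, 1]])
print(p, v)  # 0.5 0.0
```

This is precisely the property that evaporates outside two-player zero-sum: in general-sum games with more players, a Nash strategy carries no such worst-case guarantee.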

Restricting our focus to games where Nash is a fine solution concept is crawling. It's been a great challenge and a great success to find equilibria in high-dimensional, imperfect-information games like Stratego or Starcraft, but these alone haven't gotten us far just yet. Existing solutions in these settings suffer from exactly the issues you describe -- we don't have to deal with changing settings and rules (e.g. patches) since we use fixed game versions, and we don't need to model other agents, who indeed often act neither rationally nor predictably. But a classical issue in games beyond two-player zero-sum is that, once other agents start deviating from "rational" (Nash) behavior, playing Nash is no longer a viable solution, and the performance is simply not good. To succeed in games beyond the narrow subset we've largely been talking about in the last 10 years, and even the last half century, we do need to predict and respond differently to non-rational agents, just like we do in real life.

But the community is aware of this, and 2-player zero-sum games have never been the end goal. That's why we've started to cast our eye to broader, more generally applicable settings, trying to walk and hoping one day that we can run. In Diplomacy, we need to know how to integrate natural language with actions, and negotiate complex alliances and agreements. We need to know how to talk in a non-robotic manner to not piss off humans, and to be cordial and friendly so that these exceedingly non-rational agents will take actions favorable to us -- something speech style would have no bearing on if agents were rational. CICERO has made great strides in this direction, and yet it's arguably just the first major initiative from industry research groups, having been published a year ago.

The other major game outside of "Nash-able" settings is Hanabi, a purely cooperative card game which involves signaling information about other players' hands to them, since each player cannot see their own hand. The signals are restricted, and people develop conventions for what signals mean, alongside their own common sense. How do you play with people who don't know your conventions, use different conventions, or think your conventions are stupid? We've been iterating on how to solve this problem for a few years now, with promising progress. I believe advances here would help us in exactly the messy, irrational real-world settings that you say game-theoretic approaches cannot handle.

To address the last point on having easily understood and unchanging goals in games, I believe there's no reason this has to be the case. Games can change just like real life, and they do (patches); we've just removed this complexity so far to focus on other challenges. Even if we were only concerned about being good at playing Dota, we would need to understand how to process changes in rules and environment dynamics (even if they're hidden from us) by transferring knowledge or doing meta-learning. It's an important problem and well within the scope of what we can learn from studying computational gameplay. Similarly, open-world games like Minecraft have complex and poorly defined goals, and I don't think it's a stretch to say that we can do well by trying to formally characterize and handle these goals game-theoretically, rather than shying away from them.

With all of that said, I don't think this runs counter to your point on being careful about what we can and cannot do (yet) with our mathematics. In GTRL, we've been much too aggressive with our claims, and deluded other people and perhaps ourselves. I'd advocate for better communication within the community, and to the media and public, reflecting more on the broader plan for impact, alongside chasing our short-term capability advancements.

Reading this blog has been fascinating food for thought. Thanks for the work you do, Ben.

==============

As an unimportant aside for our gamers (lol), I'd argue the characterization of post-solver poker could be fairer. Memorizing tables and folding frequently applies to the first street (pre-flop), but the other three streets are, at best, exceedingly difficult to memorize and are still played on principles and intuition, just as in every other non-trivial game (if anyone is curious, see 2 Card Confidence on YouTube for a principled, understanding-based approach to theoretical play, rather than a memorization-based one). Furthermore, exploitative/non-Nash play in poker is still alive and well even in high-stakes and professional games, even from players like Linus Loeliger, who is widely regarded as the most "GTO bot"-like. The degenerate gamblers in poker who don't care for the game's theory are the same population as the chess players who play ultrabullet only (15 seconds each for the full game).

when 281C is open to enroll😉