8 Comments
Dec 2, 2023 · Liked by Ben Recht

Hi Ben, I've enjoyed your blog posts a lot! As you said, these last few blog posts have been controversial indeed, but I believe the RL (and especially game-theoretic RL) community benefits greatly from these. You provide a healthy dose of skepticism and reflection; we should never stop asking whether the work we are doing can truly make an impact. As someone from this community (though obviously I don't speak for anyone else), I wanted to offer some responses as well, and ask your opinions on them. At this point I should probably just write a separate article myself given the length, but here we are.

The summary is that I believe we can learn much more from games and game theory, but the existing work is in an exceedingly narrow subset of settings (finding Nash equilibria in 2-player zero-sum games) and the community has overemphasized this existing narrow work because it's been hard. Recently, attention has turned more towards 3-player+, general-sum settings that are not purely competitive, which does require understanding and responding to non-rational agents in a manner that's applicable to complex, messy, poorly defined real life situations, in a way that existing narrow game-theoretic RL work is not. We’ve made promising initial progress here and I argue this could be the very key to addressing previous deficiencies in the applicability of game theory.

=================

You argue that computer gameplay has yielded little insight for building superintelligence or more broadly applicable systems, and thus the main interesting aspect is what computers taught us about individual games. I'd dispute this point. We crawl before we can walk and run, and despite huge advances in game-theoretic RL, we are only now beginning to walk instead of crawl. Your critiques of the gap between the existing utility and the claimed benefits of classical game theory, RL, and their combination, game-theoretic RL, are legitimate. The research community and the media have greatly overstated how broadly useful these approaches are. We're claiming to be Olympic sprinters when we've just gotten to our feet. We can't outrun a toddler, not to mention Usain Bolt.

It's because of these overstated claims by the community that I believe you say games are fixed, have easily understood goals, and don't require understanding and predicting other agents. Concretely, what I mean is that all the games you discuss (and what most people have been touting progress on within GT-RL) are two-player zero-sum games, or otherwise behave similarly, e.g. poker or Dota. In particular, Nash equilibria have been the solution concept that everyone's been chasing, and the following statement is true in exactly these settings: "When we build a policy for playing the game, we optimize against the best possible adversary. If we can find a policy that always wins against the perfect adversary, we can’t lose to anyone else."
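To make the quoted guarantee concrete, here is a minimal sketch using rock-paper-scissors, a toy two-player zero-sum game that isn't discussed above. The function names and the "rock-heavy" opponent are my own illustrative assumptions, not anything from the post. It shows that the equilibrium (uniform) strategy can't be beaten in expectation by anyone, and that a strategy tuned to a specific non-rational opponent earns more against that opponent but gives up the guarantee.

```python
from fractions import Fraction

# Row player's payoff in rock-paper-scissors (win = 1, tie = 0, loss = -1).
# Rows are our action, columns are the opponent's; order: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def expected_payoff(ours, theirs):
    """Our expected payoff when both players use mixed strategies."""
    return sum(
        ours[i] * theirs[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )

def worst_case(ours):
    """Our payoff against a best-responding opponent.

    A best response can always be taken to be a pure action,
    so it is enough to check the three columns.
    """
    return min(
        sum(ours[i] * PAYOFF[i][j] for i in range(3))
        for j in range(3)
    )

third = Fraction(1, 3)
uniform = [third, third, third]          # the Nash (maximin) strategy
always_paper = [Fraction(0), Fraction(1), Fraction(0)]  # tuned to a rock-heavy opponent
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # hypothetical non-rational opponent

print(worst_case(uniform))                        # guarantee of the Nash strategy
print(expected_payoff(uniform, rock_heavy))       # Nash vs. the biased opponent
print(expected_payoff(always_paper, rock_heavy))  # exploitative play vs. the same opponent
print(worst_case(always_paper))                   # what exploitative play risks
```

Note that in this toy game the guarantee is "never lose in expectation" rather than "always win", and that the guarantee is specific to the two-player zero-sum structure.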

Restricting our focus to games where Nash is a fine solution is crawling. It's been a great challenge and a great success to find equilibria in high-dimensional, imperfect information games like Stratego or Starcraft, but we haven't gotten anywhere just yet by doing these. Existing solutions in these settings suffer from exactly the same issues that you talk about -- we don't have to deal with changing settings and rules (e.g. patches) since we use fixed game versions, and we don't need to model other agents, who indeed often act neither rationally nor predictably. But a classical issue in games that are not two-player zero-sum is that, once other agents start deviating from "rational" behavior (Nash), playing Nash is no longer a viable solution, and the performance is simply not good. To succeed in games, past the narrow subset that we've been largely talking about in the last 10 years and even the last half century, we do need to predict and respond differently to non-rational agents, just like we do in real life.

But the community is aware of this, and 2-player zero-sum games have never been the end goal. That's why we've started to cast our eye to broader, more generally applicable settings, trying to walk and hoping one day that we can run. In Diplomacy, we need to know how to integrate natural language with actions, and negotiate complex alliances and agreements. We need to know how to talk in a non-robotic manner to not piss off humans, and to be cordial and friendly so that these exceedingly non-rational agents will take actions favorable to us -- something that speech style should have no bearing on if assuming rationality. CICERO has made great strides in this direction, and yet it's arguably just the first major initiative that industry research groups have made, having been published a year ago.

The other major game outside of "Nash-able" settings is Hanabi, a purely cooperative card game which involves signaling information about other players' hands to them, since each player cannot see their own hand. The signals are restricted, and people develop conventions for what signals mean, alongside their own common sense. How do you play with people who don't know your conventions, use different conventions, or think your conventions are stupid? We've been iterating on how to solve this problem for a few years now, with promising progress. I believe advances here would help us in exactly these messy, irrational real-world settings that you say game-theoretic approaches cannot.

To address the last point on having easily understood and unchanging goals in games, I believe there's no reason this has to be the case. Games can change just like real life, and they do (patches); we've just removed this complexity so far to focus on other challenges. Even if we were just concerned about being good at playing Dota, we need to understand how to process changes in rules and environment dynamics (even if they're hidden from us) by transferring knowledge or doing meta-learning. It's an important problem and well within the scope of what we can learn from studying computational gameplay. Similarly, open-world games like Minecraft have complex and poorly-defined goals, and I don't think it's a stretch to say that we can do well with trying to formally characterize and handle these goals game-theoretically, rather than shying away from them.

With all of that said, I don't think this runs counter to your point on being careful about what we can and cannot do (yet) with our mathematics. In GTRL, we've been much too aggressive with our claims, and deluded other people and perhaps ourselves. I'd advocate for better communication within the community, and to the media and public, reflecting more on the broader plan for impact, alongside chasing our short-term capability advancements.

Reading this blog has been fascinating food for thought. Thanks for the work you do, Ben.

==============

As an unimportant aside for our gamers (lol), I'd argue the characterization of post-solver poker could be more fair. Memorizing tables and folding frequently applies to the first street (pre-flop), but the other three streets are, at best, exceedingly difficult to memorize, and are still played on principles and intuition, just as in every other non-trivial game (if anyone is curious, see 2 Card Confidence on YouTube for a principled, understanding-based approach to theoretical play, rather than a memorization-based approach). Furthermore, exploitative/non-Nash play in poker is still alive and well even in high-stakes and professional games, even from players like Linus Loeliger who are widely regarded to be the most "GTO bot"-like. The degenerate gamblers in poker who don't care for the game's theory represent the same population as chess players who play ultrabullet only (15 seconds each for the full game).

author

Thanks for the thoughtful and detailed comment. You make many excellent points here. I do think that part of what makes people interesting in games of strategy is their ability to manipulate the fine line between rules and play. But I’m still not convinced that most of our existence can be reduced to any sort of formulaic games of strategy. This doesn’t mean that people shouldn’t do more research here! Games are an important part of culture, and understanding their intricacies enriches our understanding of society.

I do think the overclaiming is what makes me bristle in RL and AI more generally. People have been making audacious claims about science fiction becoming reality since the 1950s, and I find it more interesting to think about why this narrative persists than to argue about whether it’s true. Why is it that AI is the field so obsessed with overclaiming and overhyping their results, no matter how impressive the results are?

One clarification: I did not mean to suggest that it was easy to play GTO poker. I agree that on the rare occasion when you don’t immediately fold, the subgame becomes intricate and quite difficult. I know people who have been committed to it, and it requires tons of skill and practice. It’s just that for the vast majority of people, GTO poker isn’t fun. This is worth thinking about. Poker and Chess are very different games.


Thanks for the reply! It's an honor :)

With regards to framing real life as games of strategy, I agree that it's hard and may turn out to be ultimately unrealistic. We're shooting for the stars by hoping to leverage game-theoretic RL for it. At the minimum, the present formalism and approaches are very poorly applicable, but my personal hope is that in the next few decades we can gain some true insight into these practical, irrational, messy systems by being more flexible and experienced with this game-theoretic RL lens.

For why people overhype, I wonder too. I speculate it's a mix of needing to have some over-optimism to motivate one's own research work, and some kind of prestige/financial incentive based on getting the public to "buy in" to your research work to get more funding/prestige, maybe like VC pitches. When there's no one outside the (optimism or prestige/financial) incentive structure who can or will rein in this hype, then it's "to the moon", lmao. That's part of why I think critical perspectives such as this blog post are valuable.

And on poker, fair that perhaps many don't like GTO poker much -- comparing poker degenerate gamblers to ultrabullet chess players might be a stretch. The circle I play with plays for $2 buyins and is always chatting about GTO principles or optimal exploitative play, so I'll admit I'm a bit biased. Bunch of CS/stats/math nerds, lol...

Dec 1, 2023 · Liked by Ben Recht

when 281C is open to enroll😉

author

soon.

Dec 1, 2023 · Liked by Ben Recht

👏👏👏👏👏👏👏👏👏


A while ago I listened to an Ezra Klein interview with C. Thi Nguyen, a philosopher who wrote the book "Games: Agency as Art", which I've been meaning to read. Nguyen made some points that got me thinking about what it means to use games as a testbed for algorithms, related to the points Colin raised below.

> So when you play chess, you get really sucked into this kind of agency where you are thinking ahead and calculating linearly. When you play diplomacy, you get sucked into this agency where you’re constantly thinking about how you can lie to people and misrepresent yourself. And when you play rock climbing, you get sucked into an agency where all your powers are about balance and fine precision and motion.

> The real promise of games, if you take them seriously, is that by playing a ton of them, you can traverse all the different possibilities of agency.

> The biggest danger that I’m worried about for games is if you spend your life playing games, you’ll expect that value systems will be crisp, clear, well-defined, and quantified. [...] I’m worried about getting stuck in the world of maximizing your clicks or Wall Street finance just because you have an expectation that what it is to act in the world is to act for clear externally well-defined points.

https://www.nytimes.com/2022/02/25/podcasts/transcript-ezra-klein-interviews-c-thi-nguyen.html


I've been thinking about this more since the discussion on this blog post, and I've become more convinced of the idea that game-theoretic RL should aim to tackle settings with uncertain rules/environment dynamics, and settings with complex, unclear goals. As you quote, it's a problem if we take this current regime of well-defined rules and well-defined, simple objectives to broader real life settings.

However, I would disagree that the promise of games is that, if you play enough different ones, you'll approximately cover the space and know how to act. Rather, I think the value proposition is studying increasingly broad categories of games (really, multi-agent settings) and developing methods and perspectives which apply broadly (rather than hoping to leverage knowledge from a specific similar game).

Perhaps naively optimistic, I think that if the field allocates more attention to broader settings, we may be able to make reasonable progress on them. Maybe it's a fool's errand to try to characterize these complexities rigorously through some kind of expanded game-theoretic approach, and we should look for alternative approaches instead. But I think it's worth trying. Taking this odd, rigorous perspective might give good insights as a byproduct.

I saw this xkcd comic a number of years ago: https://xkcd.com/1002/ and at the time, thought it was funny but mostly just meant as a joke. “Isn't it silly to call 7 minutes in heaven and Calvinball games in the same category as chess or Starcraft?” But increasingly, as we build up more knowledge in settings closer by (Diplomacy/Hanabi/arbitrary-objective Minecraft/etc.), I do think the game-theoretic RL community should take a proper stab at settings like these, funny as it may seem.
