Discussion about this post

Colin Lu:

Hi Ben, I've enjoyed your blog posts a lot! As you said, these last few posts have indeed been controversial, but I believe the RL (and especially game-theoretic RL) community benefits greatly from them. You provide a healthy dose of skepticism and reflection; we should never stop asking whether the work we are doing can truly make an impact. As someone from this community (though obviously I don't speak for anyone else), I wanted to offer some responses and ask your opinion on them. Perhaps at this point I should just write a separate article myself, given the length, but here we are.

The summary is that I believe we can learn much more from games and game theory, but the existing work covers an exceedingly narrow subset of settings (finding Nash equilibria in two-player zero-sum games), and the community has overemphasized this narrow work because it has been hard. Recently, attention has turned towards settings with three or more players and general-sum payoffs, which are not purely competitive. These settings do require understanding and responding to non-rational agents in ways that apply to complex, messy, poorly defined real-life situations, where the existing narrow game-theoretic RL work does not. We've made promising initial progress here, and I argue this could be the key to addressing the previous deficiencies in the applicability of game theory.

=================

You argue that computer gameplay has yielded little insight for building superintelligence or more broadly applicable systems, and thus that the main interesting aspect is what computers taught us about individual games. I'd dispute this point. We crawl before we can walk and run, and despite huge advances in game-theoretic RL, we have only recently begun to walk instead of crawl. Your critiques of the gap between the existing utility and the claimed benefits of classical game theory, of RL, and of their combination, game-theoretic RL, are legitimate. The research community and the media have greatly overstated how broadly useful these approaches are. We're claiming to be Olympic sprinters when we've just gotten to our feet. We can't outrun a toddler, let alone Usain Bolt.

It's because of these overstated claims by the community that I believe you say games are fixed, have easily understood goals, and don't require understanding and predicting other agents. Concretely, what I mean is that all the games you discuss (and what most people have been touting progress on within GT-RL) are two-player zero-sum games, or otherwise behave similarly, e.g. poker or Dota. In particular, Nash equilibria have been the solution concept that everyone's been chasing, and the following statement is true in exactly these settings: "When we build a policy for playing the game, we optimize against the best possible adversary. If we can find a policy that always wins against the perfect adversary, we can’t lose to anyone else."
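
To make that guarantee concrete, here's a minimal sketch (my own toy example, not from the post): computing a maximin (Nash) strategy for rock-paper-scissors with a linear program, using scipy as the solver.

```python
# Minimal sketch: the maximin (Nash) strategy of a matrix game via a linear
# program. Toy payoff matrix is rock-paper-scissors; scipy solves the LP.
import numpy as np
from scipy.optimize import linprog

# A[i, j] = row player's payoff for playing i against column j.
A = np.array([[ 0, -1,  1],   # rock
              [ 1,  0, -1],   # paper
              [-1,  1,  0]])  # scissors
n_rows, n_cols = A.shape

# Variables: x (row strategy) and v (guaranteed value).
# maximize v  s.t.  (A^T x)_j >= v for every column j,  sum(x) = 1,  x >= 0.
c = np.zeros(n_rows + 1)
c[-1] = -1.0                                     # linprog minimizes, so minimize -v
A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])   # v - (A^T x)_j <= 0
b_ub = np.zeros(n_cols)
A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * n_rows + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:-1], res.x[-1]
print("maximin strategy:", np.round(x, 3), "guaranteed value:", round(v, 3))
# Against ANY opponent mixture y, this x earns at least v -- no opponent
# modeling required. That guarantee is specific to two-player zero-sum.
```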

Restricting our focus to games where Nash is a fine solution is crawling. It has been a great challenge and a great success to find equilibria in high-dimensional, imperfect-information games like Stratego or StarCraft, but doing so hasn't yet gotten us much further. Existing solutions in these settings suffer from exactly the issues you talk about -- we don't have to deal with changing settings and rules (e.g. patches) since we use fixed game versions, and we don't need to model other agents, who indeed often act neither rationally nor predictably. But a classical issue in games that are not two-player zero-sum is that once other agents start deviating from "rational" behavior (Nash), playing Nash is no longer viable, and its performance is simply not good. To succeed in games beyond the narrow subset we've largely been talking about for the last 10 years (and even the last half century), we do need to predict and respond differently to non-rational agents, just as we do in real life.
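
For a toy illustration of what Nash play leaves on the table (numbers of my own invention): even in two-player zero-sum rock-paper-scissors, the Nash strategy is merely safe against a biased opponent, while a best response exploits them. Outside two-player zero-sum, even the safety part of that trade goes away.

```python
# Toy numbers: against a rock-heavy opponent, Nash is safe (earns the game
# value, 0) but a best response earns far more. Nash play is blind to who
# it is actually facing.
import numpy as np

A = np.array([[ 0, -1,  1],   # rock
              [ 1,  0, -1],   # paper
              [-1,  1,  0]])  # scissors

y_biased = np.array([0.6, 0.2, 0.2])  # an opponent who loves rock
x_nash = np.ones(3) / 3

print("Nash vs biased opponent:", x_nash @ A @ y_biased)  # 0.0
payoffs = A @ y_biased                 # expected payoff of each pure action
best = int(np.argmax(payoffs))
print("best response:", ["rock", "paper", "scissors"][best],
      "earns", payoffs[best])          # paper earns 0.4
```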

But the community is aware of this, and 2-player zero-sum games have never been the end goal. That's why we've started to cast our eye to broader, more generally applicable settings, trying to walk and hoping that one day we can run. In Diplomacy, we need to know how to integrate natural language with actions and negotiate complex alliances and agreements. We need to know how to talk in a non-robotic manner so as not to piss off humans, and to be cordial and friendly so that these exceedingly non-rational agents will take actions favorable to us -- something that speech style should have no bearing on if we assume rationality. CICERO has made great strides in this direction, and yet it's arguably just the first major effort from industry research groups, published only a year ago.

The other major game outside of "Nash-able" settings is Hanabi, a purely cooperative card game in which players signal information about each other's hands, since no player can see their own. The signals are restricted, and people develop conventions for what signals mean, alongside their own common sense. How do you play with people who don't know your conventions, use different conventions, or think your conventions are stupid? We've been iterating on this problem for a few years now, with promising progress. I believe advances here would help us in exactly the messy, irrational real-world settings that you say game-theoretic approaches cannot handle.
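
Here is a stripped-down sketch of the cross-play problem (my own toy construction, not Hanabi itself): two signaling conventions that each score perfectly in self-play and fail completely when paired with each other.

```python
# Toy sketch of convention mismatch: a signal in {0, 1} must tell the
# partner which of two cards to play. Each convention is internally
# consistent; they just disagree with each other.
convention_a = {0: "play_left",  1: "play_right"}   # "0 means left"
convention_b = {0: "play_right", 1: "play_left"}    # the mirrored convention

def episode(signaler, listener, correct_card):
    # Signaler encodes the correct card with ITS convention; the listener
    # decodes with its own. Score 1 iff the right card gets played.
    signal = 0 if signaler[0] == correct_card else 1
    return int(listener[signal] == correct_card)

cards = ["play_left", "play_right"]
pairings = {"A with A": (convention_a, convention_a),
            "B with B": (convention_b, convention_b),
            "A with B": (convention_a, convention_b)}
for name, (s, l) in pairings.items():
    score = sum(episode(s, l, c) for c in cards) / len(cards)
    print(name, "->", score)   # self-play: 1.0, cross-play: 0.0
```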

To address the last point, on games having easily understood and unchanging goals: I believe there's no reason this has to be the case. Games can change just like real life, and they do (patches); we've simply removed this complexity so far to focus on other challenges. Even if we were concerned only with being good at Dota, we would need to understand how to handle changes in rules and environment dynamics (even when they're hidden from us) by transferring knowledge or doing meta-learning. It's an important problem, and well within the scope of what we can learn from studying computational gameplay. Similarly, open-world games like Minecraft have complex and poorly defined goals, and I don't think it's a stretch to say that we can make progress by formally characterizing and handling these goals game-theoretically, rather than shying away from them.
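
As a toy sketch of why patches matter (invented numbers, a two-armed bandit standing in for a patched game): a policy frozen under one version keeps pulling the arm that a balance change has since nerfed.

```python
# Toy sketch: the same arm is optimal pre-patch and poor post-patch, so a
# policy frozen at training time degrades unless it re-learns or adapts.
import numpy as np

rng = np.random.default_rng(0)

def pull(arm, patch):
    # A "balance patch" swaps which arm is strong.
    means = [0.8, 0.3] if patch == "v1" else [0.3, 0.8]
    return rng.random() < means[arm]

frozen_arm = 0  # learned under v1, never updated
for patch in ["v1", "v2"]:
    reward = np.mean([pull(frozen_arm, patch) for _ in range(10_000)])
    print(patch, "avg reward of frozen policy:", round(reward, 2))
# v1 -> ~0.8, v2 -> ~0.3: the environment moved and the policy did not.
```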

With all of that said, I don't think this runs counter to your point about being careful with what we can and cannot (yet) do with our mathematics. In GTRL, we've been much too aggressive with our claims, and deluded other people and perhaps ourselves. I'd advocate for better communication within the community and to the media and public, and for reflecting more on the broader plan for impact alongside chasing short-term capability advancements.

Reading this blog has been fascinating food for thought. Thanks for the work you do, Ben.

==============

As an unimportant aside for our gamers (lol), I'd argue the characterization of post-solver poker could be more fair. Memorizing tables and folding frequently applies to the first street (pre-flop), but the other three streets are, at best, exceedingly difficult to memorize and are still played on principles and intuition, just as in every other non-trivial game (if anyone is curious, see 2 Card Confidence on YouTube for a principled, understanding-based approach to theoretical play, rather than a memorization-based one). Furthermore, exploitative/non-Nash play in poker is still alive and well, even in high-stakes and professional games, and even from players like Linus Loeliger who are widely regarded as the most "GTO bot"-like. The degenerate gamblers in poker who don't care for the game's theory are the same population as the chess players who play only ultrabullet (15 seconds per player for the full game).

FourierBot:

When is 281C open to enroll? 😉
