A Strange Game

What we learn by teaching computers to play games.

Nov 29, 2023

We end the course at the beginning. In 1959, Arthur Samuel published the report “Some Studies in Machine Learning Using the Game of Checkers” in the IBM Journal of Research and Development. As far as I know, this was the first time anyone used the term “machine learning” in a publication.

Samuel had been at IBM for a decade at this point, and he was a core engineer on the IBM 701, IBM’s first scientific computer. But Samuel had been working on solving checkers for even longer than that. In 1948, while on the faculty of the University of Illinois Urbana-Champaign, Samuel faced the problem all academics face eventually: he was running out of grant money.

Samuel stumbled upon a brilliant marketing pitch. He saw a talk by Claude Shannon about his work on chess. Samuel left the talk inspired, mistakenly under the impression that Shannon had already built a chess-playing computer. Samuel figured then that Checkers would be a piece of cake. And it would make a perfect demo for him to pitch his lab. He could build a prototype player in less than a year and then enter it in the Checker World Championship in Kankakee, Illinois, 75 miles up the road from the university. Like most AI pioneers, he dramatically underestimated the difficulty of his project.

The Checkers computer, of course, didn’t turn out as planned. He sketched out a prototype algorithm and quickly realized he’d need a much larger computer than he envisioned and wouldn’t have the budget to build it. Disillusioned with his progress and by the funding pressures of academia, Samuel left Illinois for a research scientist position at IBM the following year. (Ah, the more things change, the more they stay the same). At IBM, he would work with the team designing the IBM 701, IBM’s first scientific supercomputer. Samuel committed himself to ensuring the 701 design could assuredly master Checkers.

At IBM, Samuel discovered an appealing aspect of computer games. They were ideal programs to test whether computers had appropriately expansive capabilities. His logic designs for the 701 imagined what would be needed for his hypothetical Checkers program. In his mind, games could “test the proposed instruction set for its completeness and its effectiveness as a tool for expressing the operations that one would want the computer to perform.”

As the 701 came into production, Samuel scheduled time to implement his checker player. At IBM Samuel had worked on his checkers program for a decade, slowly experimenting at odd hours when there were spare cycles on a mainframe. Though the computer played competent matches as early as 1956, he had to wait until 1959 to publish his report. IBM had apparently feared publicizing his work because it showed how humans were losing out to machines. Unlike today, this was considered bad PR for big tech in the 1950s.

Samuel's basic approach mimicked Shannon’s strategy for chess: the agent would build a game tree of all possible moves for a certain number of rounds. For the leaves of this tree, Samuel would assign approximate values for winning based on features of the board’s configuration. He would then backtrack from the leaves to find the best move. The main departure from Shannon was implementing “machine learning” to estimate the values at the leaf nodes.

Samuel’s learning algorithm was visionary. And it shows how slippery technical jargon can be. There’s no pattern recognition or classification in this paper. Instead, Samuel’s Checkers agent was running what we’d now call “self-play” in reinforcement learning.

Samuel’s learning program had two “players” he called Alpha and Beta. These two programs would play against each other. Beta would use the current best strategy available. Alpha, on the other hand, would learn a new strategy over the game. If Alpha won the game, Alpha’s strategy was given to Beta. Alpha would continue to learn against the updated Beta in the subsequent games. If Beta won a game, Alpha was given a strike. After Alpha received three strikes, Samuel would reset parts of Alpha’s program so that it would hopefully stop following such counterproductive directions. Samuel would hone his program for a decade at IBM, using late-night hours when technicians were at home to have Alpha and Beta play each other.

How did Alpha learn? It used what we now call temporal differencing (though this term would not be coined until the 1980s). At any point in the game, Alpha could use its estimates of the values of moves to estimate the values of future moves. It would then take a move, and Beta would respond. This would bring the board to some new position, and Alpha could then again estimate the value of the next move. But because Alpha would now be looking an additional move ahead, its estimate of the value of its current position would differ from its estimate from the previous position. The difference of these estimates is called a “temporal difference.” And this temporal difference could be used to update the function that estimated the value of moves. Updating the values every move, Alpha would end up with a very different function at the end of a game than from the beginning, and if the game was won, we might expect this new function to give a better strategy than the old one.

Samuel’s program turned out to be remarkably competent. It defeated an expert player in 1962. However, in 1966, when playing against master checkers players, Samuel’s program lost all of its matches. Still, it was better at checkers than any computer was at chess in 1966. Though Shannon’s chess algorithm would eventually beat humans, it would need 30 years of Moore’s law to be fast enough.

We should reflect upon why the first application of machine learning system was board games. Samuel considered gameplay an ideal playground for investigating how to build computers that learned. In games, there was a clear and definite goal (i.e., winning the game). The rules of games were definite, known, and never changed. Popular games were accessible by everyone, so the public could appreciate the difficulty of the task be impressed if the computer was a good player. It would be unambiguous if a computer program “worked.” Samuel wasn’t worried about whether the developed algorithms could be applied to learn outside the rigid and clear rules of Checkers. Checkers was just a tiny step on the way.

By now, we’ve mastered building computer programs to play games. What have we learned in the process? And what are these techniques good for outside of the game room? In this last lecture, I want to dive into the history of game theory and computational gameplay. And I want to explore what precisely it is that humans, machines, and society learned about each other.

arg min

Discussion about this post