Reinforcement Renaissance

Communications of the ACM

Based in San Francisco, Marina Krakovsky is the author of The Middleman Economy: How Brokers, Agents, Dealers, and Everyday Matchmakers Create Value and Profit (Palgrave Macmillan, 2015).

Killer Robots? Lost Jobs?


The recent win of AlphaGo over Lee Sedol--one of the world's highest ranked Go players--has resurfaced concerns about artificial intelligence. We have heard about A.I. stealing jobs, killer robots, algorithms that help diagnose and cure cancer, competent self-driving cars, perfect poker players, and more. It seems that for every mention of A.I. as humanity's top existential risk, there is a mention of its power to solve humanity's biggest challenges. Demis Hassabis--founder of Google DeepMind, the company behind AlphaGo--views A.I. as "potentially a meta-solution to any problem," and Eric Horvitz--director of research at Microsoft's Redmond, Washington, lab--claims that "A.I. will be incredibly empowering to humanity." By contrast, Bill Gates has called A.I. "a huge challenge" and something to "worry about," and Stephen Hawking has warned about A.I. ending humanity.

AlphaGo, Deep Learning, and the Future of the Human Microscopist


In March of last year, Google's (Mountain View, California) artificial intelligence (AI) computer program AlphaGo beat the best Go player in the world, 18-time champion Lee Se-dol, in a tournament, winning 4 of 5 games.1 At first glance this news would seem of little interest to a pathologist, or to anyone else for that matter. After all, many will remember that IBM's (Armonk, New York) computer program Deep Blue beat Garry Kasparov--at the time the greatest chess player in the world--and that was 19 years ago. The rules of the several-thousand-year-old game of Go are extremely simple. The board consists of 19 horizontal and 19 vertical black lines.

Scalable Learning in Stochastic Games

AAAI Conferences

Michael Bowling and Manuela Veloso
Computer Science Department, Carnegie Mellon University, Pittsburgh PA, 15213-3891

Abstract

Stochastic games are a general model of interaction between multiple agents. They have recently been the focus of a great deal of research in reinforcement learning as they are both descriptive and have a well-defined Nash equilibrium solution. Most of this recent work, although very general, has only been applied to small games with at most hundreds of states. On the other hand, there are landmark results of learning being successfully applied to specific large and complex games such as Checkers and Backgammon. In this paper we describe a scalable learning algorithm for stochastic games, that combines three separate ideas from reinforcement learning into a single algorithm. These ideas are tile coding for generalization, policy gradient ascent as the basic learning method, and our previous work on the WoLF ("Win or Learn Fast") variable learning rate to encourage convergence. We apply this algorithm to the intractably sized game-theoretic card game Goofspiel, showing preliminary results of learning in self-play. We demonstrate that policy gradient ascent can learn even in this highly non-stationary problem with simultaneous learning. We also show that the WoLF principle continues to have a converging effect even in large problems with approximation and generalization.

Introduction

We are interested in the problem of learning in multiagent environments. One of the main challenges with these environments is that other agents in the environment may be learning and adapting as well.
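The WoLF ("Win or Learn Fast") idea named in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which combines tile coding with policy gradient ascent. It is a simplified tabular, single-state variant in the spirit of the authors' earlier policy hill-climbing work, and the function names, the step sizes (0.01 and 0.04), and the clip-and-renormalize projection are all assumptions chosen for illustration. The agent takes a small step when its current policy looks better than its long-run average policy ("winning") and a larger step otherwise ("losing"). How the value estimates and the average policy are maintained is left out.

```python
from typing import List


def expected_value(policy: List[float], q_values: List[float]) -> float:
    """Expected payoff of a mixed policy under the current value estimates."""
    return sum(p * q for p, q in zip(policy, q_values))


def wolf_step_size(
    policy: List[float],
    avg_policy: List[float],
    q_values: List[float],
    delta_win: float = 0.01,   # assumed value: cautious step when winning
    delta_lose: float = 0.04,  # assumed value: faster step when losing
) -> float:
    """Choose the learning rate: small when winning, large when losing."""
    winning = expected_value(policy, q_values) > expected_value(avg_policy, q_values)
    return delta_win if winning else delta_lose


def hill_climb_step(
    policy: List[float], q_values: List[float], delta: float
) -> List[float]:
    """Shift probability toward the greedy action (needs at least two actions)."""
    n = len(policy)
    best = max(range(n), key=lambda a: q_values[a])
    moved = [
        p + delta if a == best else p - delta / (n - 1)
        for a, p in enumerate(policy)
    ]
    # Simplified projection back to a valid distribution (an assumption here):
    # clip each entry to [0, 1], then renormalize so the entries sum to 1.
    clipped = [min(1.0, max(0.0, p)) for p in moved]
    total = sum(clipped)
    return [p / total for p in clipped]


if __name__ == "__main__":
    q = [1.0, 0.0]            # hypothetical value estimates for two actions
    policy = [0.2, 0.8]       # current mixed policy
    avg_policy = [0.5, 0.5]   # running average of past policies
    delta = wolf_step_size(policy, avg_policy, q)
    print(delta, hill_climb_step(policy, q, delta))
```

In this example the current policy puts most of its weight on the lower-valued action, so it scores worse than the average policy and the larger "losing" step is chosen.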

AlphaGo takes AI to a new level


At the end of the fifth and final match, Lee Sedol sat back quietly in his chair in a conference room at the Four Seasons Hotel in Seoul as the collected computer scientists celebrated around him. Lee, second only to fellow South Korean Lee Chang-Ho in international titles in the ancient Chinese board game of Go, put up a valiant fight against the machine, AlphaGo, created by Google's DeepMind division. AlphaGo had erred early on, but recovered to overpower the human and win the series four to one. Board games have been used since the early days of artificial intelligence research as ways to measure progress -- IBM's Deep Blue famously beat world chess champion Garry Kasparov in New York in 1997 -- and AlphaGo's victory marks another significant milestone in the advancement of the technology. Go presents a far greater challenge to AI than chess.