Reinforcement Renaissance

Communications of the ACM

Based in San Francisco, Marina Krakovsky is the author of The Middleman Economy: How Brokers, Agents, Dealers, and Everyday Matchmakers Create Value and Profit (Palgrave Macmillan, 2015).

Killer Robots? Lost Jobs?


The recent win of AlphaGo over Lee Sedol--one of the world's highest-ranked Go players--has revived concerns about artificial intelligence. We have heard about A.I. stealing jobs, killer robots, algorithms that help diagnose and cure cancer, competent self-driving cars, perfect poker players, and more. It seems that for every mention of A.I. as humanity's top existential risk, there is a mention of its power to solve humanity's biggest challenges. Demis Hassabis--founder of Google DeepMind, the company behind AlphaGo--views A.I. as "potentially a meta-solution to any problem," and Eric Horvitz--director of research at Microsoft's Redmond, Washington, lab--claims that "A.I. will be incredibly empowering to humanity." By contrast, Bill Gates has called A.I. "a huge challenge" and something to "worry about," and Stephen Hawking has warned about A.I. ending humanity.

Deep Learning Demystified - The New Stack


This year has been a good one for robots in the epic battle of Man vs. Machine. It's been decades since the first computer beat a chess champion, but the ancient Chinese game of Go -- which supposedly has more possible board positions than there are atoms in the universe -- had always escaped the robot's grasp. At least until Google's AlphaGo took four out of five games against the reigning human world champion. How? Basically, it taught itself. Google's DeepMind artificial intelligence subsidiary spent the last two years building a database of 100,000 human-played games of Go, which it fed into AlphaGo. AlphaGo then played against itself millions of times, using machine learning and neural networks to improve until it finally emerged the victor.

AlphaGo, Deep Learning, and the Future of the Human Microscopist


In March of last year, Google's (Mountain View, California) artificial intelligence (AI) computer program AlphaGo beat the best Go player in the world, 18-time champion Lee Se-dol, in a tournament, winning 4 of 5 games.[1] At first glance this news would seem of little interest to a pathologist, or to anyone else for that matter. After all, many will remember that IBM's (Armonk, New York) computer program Deep Blue beat Garry Kasparov--at the time the greatest chess player in the world--and that was 19 years ago. The rules of the several-thousand-year-old game of Go are extremely simple. The board consists of 19 horizontal and 19 vertical black lines.
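To give a sense of scale for the board just described, here is a minimal Python sketch, not taken from the article. It counts the intersections on a 19-by-19 grid and computes a crude upper bound on board arrangements by letting each intersection be empty, black, or white. The function name `go_position_upper_bound` is hypothetical, and the figure of 10^80 atoms is a commonly quoted rough estimate used here as an assumption. The bound also counts arrangements that the rules of Go forbid, so the number of legal positions is smaller.

```python
def go_position_upper_bound(lines: int = 19) -> int:
    """Crude upper bound: each intersection is empty, black, or white.

    Hypothetical helper for illustration; it overcounts because it
    includes arrangements that are illegal under the rules of Go.
    """
    points = lines * lines  # 19 x 19 lines give 361 intersections
    return 3 ** points


# Assumption: a commonly quoted rough estimate of atoms in the observable universe.
ATOMS_ESTIMATE = 10 ** 80

if __name__ == "__main__":
    bound = go_position_upper_bound()
    print("intersections:", 19 * 19)
    print("digits in upper bound:", len(str(bound)))
    print("exceeds atom estimate:", bound > ATOMS_ESTIMATE)
```

Python integers have arbitrary precision, so the comparison is exact rather than a floating-point approximation.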

Scalable Learning in Stochastic Games

AAAI Conferences

Michael Bowling and Manuela Veloso
Computer Science Department, Carnegie Mellon University, Pittsburgh PA, 15213-3891

Abstract

Stochastic games are a general model of interaction between multiple agents. They have recently been the focus of a great deal of research in reinforcement learning as they are both descriptive and have a well-defined Nash equilibrium solution. Most of this recent work, although very general, has only been applied to small games with at most hundreds of states. On the other hand, there are landmark results of learning being successfully applied to specific large and complex games such as Checkers and Backgammon. In this paper we describe a scalable learning algorithm for stochastic games, that combines three separate ideas from reinforcement learning into a single algorithm. These ideas are tile coding for generalization, policy gradient ascent as the basic learning method, and our previous work on the WoLF ("Win or Learn Fast") variable learning rate to encourage convergence. We apply this algorithm to the intractably sized game-theoretic card game Goofspiel, showing preliminary results of learning in self-play. We demonstrate that policy gradient ascent can learn even in this highly non-stationary problem with simultaneous learning. We also show that the WoLF principle continues to have a converging effect even in large problems with approximation and generalization.

Introduction

We are interested in the problem of learning in multiagent environments. One of the main challenges with these environments is that other agents in the environment may be learning and adapting as well.
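The WoLF ("Win or Learn Fast") variable learning rate named in the abstract can be illustrated with a small Python sketch. This is not the paper's algorithm: it swaps in a simple tabular policy hill-climbing step for a single state in place of tile coding and policy gradient ascent. It assumes the agent counts as "winning" when its current policy has a higher expected value than its long-run average policy under its own action-value estimates. The function names and the step sizes `delta_win` and `delta_lose` are hypothetical choices made for illustration.

```python
from typing import List


def wolf_step_size(policy: List[float], avg_policy: List[float],
                   q_values: List[float],
                   delta_win: float = 0.01, delta_lose: float = 0.04) -> float:
    """Pick a small step when winning and a larger step when losing.

    Assumption: "winning" means the current policy's expected value,
    under the agent's own Q estimates, beats the average policy's.
    """
    current_value = sum(p * q for p, q in zip(policy, q_values))
    average_value = sum(p * q for p, q in zip(avg_policy, q_values))
    return delta_win if current_value > average_value else delta_lose


def hill_climb(policy: List[float], q_values: List[float],
               delta: float) -> List[float]:
    """Shift up to `delta` total probability toward the greedy action."""
    n = len(policy)
    best = max(range(n), key=lambda a: q_values[a])
    new_policy = list(policy)
    for a in range(n):
        if a != best:
            # Never take more mass than the action currently holds,
            # so the result stays a valid probability distribution.
            moved = min(new_policy[a], delta / (n - 1))
            new_policy[a] -= moved
            new_policy[best] += moved
    return new_policy


if __name__ == "__main__":
    q = [1.0, 0.0]
    policy, avg_policy = [0.2, 0.8], [0.5, 0.5]
    delta = wolf_step_size(policy, avg_policy, q)  # losing, so the larger step
    print(delta, hill_climb(policy, q, delta))
```

The intuition is that an agent that is doing well should adapt cautiously, since the other learners are likely to change, while an agent that is doing poorly should move quickly away from its current policy.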