Reinforcement Renaissance

Communications of the ACM

Based in San Francisco, Marina Krakovsky is the author of The Middleman Economy: How Brokers, Agents, Dealers, and Everyday Matchmakers Create Value and Profit (Palgrave Macmillan, 2015).


Killer Robots? Lost Jobs?

Slate

AlphaGo's recent win over Lee Sedol--one of the world's highest-ranked Go players--has revived concerns about artificial intelligence. We have heard about A.I. stealing jobs, killer robots, algorithms that help diagnose and cure cancer, competent self-driving cars, perfect poker players, and more. It seems that for every mention of A.I. as humanity's top existential risk, there is a mention of its power to solve humanity's biggest challenges. Demis Hassabis--founder of Google DeepMind, the company behind AlphaGo--views A.I. as "potentially a meta-solution to any problem," and Eric Horvitz--director of research at Microsoft's Redmond, Washington, lab--claims that "A.I. will be incredibly empowering to humanity." By contrast, Bill Gates has called A.I. "a huge challenge" and something to "worry about," and Stephen Hawking has warned about A.I. ending humanity.


AlphaGo, Deep Learning, and the Future of the Human Microscopist

#artificialintelligence

In March of last year, Google's (Mountain View, California) artificial intelligence (AI) computer program AlphaGo beat the best Go player in the world, 18-time champion Lee Se-dol, in a tournament, winning 4 of 5 games.1 At first glance this news would seem of little interest to a pathologist, or to anyone else for that matter. After all, many will remember that IBM's (Armonk, New York) computer program Deep Blue beat Garry Kasparov--at the time the greatest chess player in the world--and that was 19 years ago. The rules of the several-thousand-year-old game of Go are extremely simple. The board consists of 19 horizontal and 19 vertical black lines.


Scalable Learning in Stochastic Games

AAAI Conferences

Michael Bowling and Manuela Veloso
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213-3891

Abstract

Stochastic games are a general model of interaction between multiple agents. They have recently been the focus of a great deal of research in reinforcement learning as they are both descriptive and have a well-defined Nash equilibrium solution. Most of this recent work, although very general, has only been applied to small games with at most hundreds of states. On the other hand, there are landmark results of learning being successfully applied to specific large and complex games such as Checkers and Backgammon. In this paper we describe a scalable learning algorithm for stochastic games that combines three separate ideas from reinforcement learning into a single algorithm. These ideas are tile coding for generalization, policy gradient ascent as the basic learning method, and our previous work on the WoLF ("Win or Learn Fast") variable learning rate to encourage convergence. We apply this algorithm to the intractably sized game-theoretic card game Goofspiel, showing preliminary results of learning in self-play. We demonstrate that policy gradient ascent can learn even in this highly non-stationary problem with simultaneous learning. We also show that the WoLF principle continues to have a converging effect even in large problems with approximation and generalization.

Introduction

We are interested in the problem of learning in multiagent environments. One of the main challenges with these environments is that other agents in the environment may be learning and adapting as well.
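
The WoLF idea named in the abstract above (a variable learning rate that is small when the learner is "winning" and large when it is "losing") can be illustrated with a small sketch. This is not the paper's algorithm: it swaps in a tabular policy hill-climbing update for policy gradient ascent with tile coding, and it plays two-action matching pennies instead of Goofspiel. The "winning" test used here (the current policy's estimated value exceeds that of the learner's own running-average policy) is assumed from the authors' earlier WoLF work. All function names, the payoff matrix, starting policies, and step sizes are hypothetical choices made for illustration.

```python
import random


def wolf_step_size(q, pi, avg_pi, delta_win, delta_lose):
    """Pick the policy step size: small when 'winning', large when 'losing'.

    Assumption: 'winning' means the current policy's estimated value
    is higher than that of the running-average policy.
    """
    value_current = sum(p * v for p, v in zip(pi, q))
    value_average = sum(p * v for p, v in zip(avg_pi, q))
    return delta_win if value_current > value_average else delta_lose


def hill_climb(pi, q, delta):
    """Shift up to `delta` probability mass toward the greedy action."""
    best = max(range(len(q)), key=lambda a: q[a])
    new_pi = list(pi)
    for a in range(len(pi)):
        if a != best:
            # Never take more mass than the action currently holds.
            moved = min(new_pi[a], delta / (len(pi) - 1))
            new_pi[a] -= moved
            new_pi[best] += moved
    return new_pi


def self_play(steps=20000, alpha=0.1, delta_win=0.001, delta_lose=0.004, seed=0):
    """Two learners adapt simultaneously in matching pennies (illustrative)."""
    rng = random.Random(seed)
    # Row player's reward; the column player receives the negative.
    payoff = [[1.0, -1.0], [-1.0, 1.0]]
    policies = [[0.8, 0.2], [0.3, 0.7]]   # arbitrary starting policies
    averages = [[0.5, 0.5], [0.5, 0.5]]   # running-average policies
    q_values = [[0.0, 0.0], [0.0, 0.0]]   # per-action value estimates
    for t in range(1, steps + 1):
        actions = [rng.choices([0, 1], weights=pi)[0] for pi in policies]
        reward = payoff[actions[0]][actions[1]]
        rewards = [reward, -reward]
        for i in range(2):
            a = actions[i]
            # Stateless value update toward the observed reward.
            q_values[i][a] += alpha * (rewards[i] - q_values[i][a])
            # Incremental mean of the policies this player has used so far.
            averages[i] = [avg + (p - avg) / t
                           for avg, p in zip(averages[i], policies[i])]
            delta = wolf_step_size(q_values[i], policies[i], averages[i],
                                   delta_win, delta_lose)
            policies[i] = hill_climb(policies[i], q_values[i], delta)
    return policies


if __name__ == "__main__":
    print(self_play())
```

Both players update at the same time, so each faces a non-stationary opponent, which is the setting the abstract describes. The sketch omits exploration and function approximation, so it says nothing about how the method behaves at Goofspiel's scale.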


Time to Fold, Humans: Poker-Playing AI Beats Pros at Texas Hold'em

#artificialintelligence

It is no mystery why poker is such a popular pastime: the dynamic card game produces drama in spades as players are locked in a complicated tango of acting and reacting that becomes increasingly tense with each escalating bet. The same elements that make poker so entertaining have also created a complex problem for artificial intelligence (AI). A study published today in Science describes an AI system called DeepStack that recently defeated professional human players in heads-up, no-limit Texas hold'em poker, an achievement that represents a leap forward in the types of problems AI systems can solve. DeepStack, developed by researchers at the University of Alberta, relies on the use of artificial neural networks that researchers trained ahead of time to develop poker intuition. During play, DeepStack uses its poker smarts to break down a complicated game into smaller, more manageable pieces that it can then work through on the fly.