Goto

Collaborating Authors

 td-gammon


On-line Policy Improvement using Monte-Carlo Search

Tesauro, Gerald, Galperin, Gregory R.

arXiv.org Artificial Intelligence

We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP1 and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algorithm to the domain of backgammon. Results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network. In each case, the Monte-Carlo algorithm gives a substantial reduction, by as much as a factor of 5 or more, in the error rate of the base players. The algorithm is also potentially useful in many other adaptive control applications in which it is possible to simulate the environment.


Temporal difference learning and TD-Gammon

AITopics Original Links

Complex board games are a natural testing ground for machine learning and artificial intelligence. They are based on experience; they are attractive; and they do not have the safety requirements that sometimes block the use of heuristic methods. Despite recent advances, computer chess seems not to be a success of machine learning as such, because of its reliance on brute force search rather than "intelligent" approaches. This paper presents an interesting example of an opposite situation, the game-learning program TD-Gammon. TD-Gammon is a neural network that trains itself to play backgammon by playing against itself and learning from the outcome.


Success Stories of Reinforcement Learning

#artificialintelligence

In September 2018, I got the opportunity to attend the Deep Learning Indaba conference that was held in Stellenbosch University, South Africa. Deep Learning Indaba was formed with an aim to strengthen African Machine Learning as well as to increase African participation and contribution to the advances in artificial intelligence and machine learning, and address issues of diversity in these fields of science. One of the lectures that I really enjoyed was on Success Stories of Reinforcement Learning where we got introduced to reinforcement learning as well as how it was used to build some pretty awesome computer programs. This lecture was presented by David Silver. Professor David Silver Leads the reinforcement learning research group at DeepMind which is an AI company based in London that was acquired by Google in 2014.


Standing on the shoulders of giants

#artificialintelligence

When you think of AI or machine learning you may draw up images of AlphaZero or even some science fiction reference such as HAL-9000 from 2001: A Space Odyssey. However, the true forefather, who set the stage for all of this, was the great Arthur Samuel. Samuel was a computer scientist, visionary, and pioneer, who wrote the first checkers program for the IBM 701 in the early 1950s. His program, "Samuel's Checkers Program", was first shown to the general public on TV on February 24th, 1956, and the impact was so powerful that IBM stock went up 15 points overnight (a huge jump at that time). This program also helped set the stage for all the modern chess programs we have come to know so well, with features like look-ahead, an evaluation function, and a mini-max search that he would later develop into alpha-beta pruning.


Reinforcement Learning: Connections, Surprises, and Challenge

Barto, Andrew G. (University of Massachusetts Amherst)

AI Magazine

The idea of implementing reinforcement learning in a computer was one of the earliest ideas about the possibility of AI, but reinforcement learning remained on the margin of AI until relatively recently. Today we see reinforcement learning playing essential roles in some of the most impressive AI applications. This article presents observations from the author’s personal experience with reinforcement learning over the most recent 40 years of its history in AI, focusing on striking connections that emerged between largely separate disciplines and on some of the findings that surprised him along the way. These connections and surprises place reinforcement learning in a historical context, and they help explain the success it is finding in modern AI. The article concludes by discussing some of the challenges that need to be faced as reinforcement learning moves out into real world.


A 'Brief' History of Game AI Up To AlphaGo, Part 2

#artificialintelligence

This is the second part of'A Brief History of Game AI Up to AlphaGo'. Part 1 is here and part 3 is here. In this part, we shall cover just about four decades of progress, from the first victories of computers against people at Checkers and Chess all the way up to DeepBlue's victory against humanity's then-best living Chess player. By the late 1950s, the industrious engineers at IBM were far from the only ones working on AI -- excitement for the new field filled research groups in universities from the US to the Soviet Union. One such group was made up of Allen Newell and Herbert Simon (both attendants of the Dartmouth Conference) from Carnegie Mellon University, and Cliff Shaw from RAND Corporation. They collaborated on Chess AI from 1955 to 1958, culminating in "Chess Playing Programs and the Problem of Complexity"1 which both summarized existing Chess AI research and contributed new ideas that they tested with the NSS (Newell, Shaw, and Simon) Chess program. Just as Shannon noted that master players use intuition to think selectively about moves, Newell, Shaw and Simon considered heuristics to be an important aspect of human Chess-playing.


Before AlphaGo there was TD-Gammon -- Jim Fleming

#artificialintelligence

Check out the Github repo for an implementation of TD-Gammon with TensorFlow. A few weeks ago AlphaGo won a historic tournament playing the game of Go against Lee Sedol, one of the top Go players in the world. Many people have compared AlphaGo to DeepBlue, which won a series of famous chess matches against Gary Kasparov, but a different comparison may be made for the game of backgammon. Before DeepMind tackled playing Atari games or built AlphaGo there was TD-Gammon, the first algorithm to reach an expert level of play in backgammon. Gerald Tesauro published his paper in 1992 describing TD-Gammon as a neural network trained with reinforcement learning.


TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search

Baxter, Jonathan, Tridgell, Andrew, Weaver, Lex

arXiv.org Artificial Intelligence

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(lambda) and another less radical variant, TD-directed(lambda). In particular, our chess program, ``KnightCap,'' used TDLeaf(lambda) to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). It improved from a 1650 rating to a 2100 rating in just 308 games. We discuss some of the reasons for this success and the relationship between our results and Tesauro's results in backgammon.