Why did TD-Gammon Work?
Pollack, Jordan B., Blair, Alan D.
–Neural Information Processing Systems
Although TD-Gammon is one of the major successes in machine learning, it has not led to similar impressive breakthroughs in temporal difference learning for other applications or even other games. We were able to replicate some of the success of TD-Gammon, developing a competitive evaluation function on a 4000 parameter feed-forward neural network, without using back-propagation, reinforcement or temporal difference learning methods. Instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest that the surprising success of Tesauro's program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself. 1 INTRODUCTION It took great chutzpah for Gerald Tesauro to start wasting computer cycles on temporal difference learning in the game of Backgammon (Tesauro, 1992). After all, the dream of computers mastering a domain by self-play or "introspection" had been around since the early days of AI, forming part of Samuel's checker player (Samuel, 1959) and used in Donald Michie's MENACE tictac-toe learner (Michie, 1961).
Neural Information Processing Systems
Dec-31-1997
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Asia > Middle East
- Industry:
- Leisure & Entertainment > Games > Backgammon (1.00)
- Technology: