Temporal Difference Learning of Position Evaluation in the Game of Go

Apr-6-2023, 19:01:06 GMT–Neural Information Processing Systems

The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training networks to evaluate Go positions via temporal difference (TD) learning. Our approach is based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play. These techniques yield far better performance than undifferentiated networks trained by self-play alone.
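
To make the idea of learning a position evaluator by temporal difference concrete, here is a minimal sketch of a TD(0) update over the positions of one game. It is not the architecture described in the abstract: a sigmoid over a linear score stands in for the network, each position is a plain feature list, and the only reinforcement is a single scalar outcome at the end of the game. All names (`predict`, `td0_update`, `train_on_game`) and the learning rate are assumptions made for illustration.

```python
import math


def predict(weights, features):
    """Sigmoid of a linear score: an evaluation in (0, 1) for one position."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def td0_update(weights, features, target, lr=0.1):
    """Take one gradient step moving the evaluation of `features` toward `target`."""
    value = predict(weights, features)
    error = target - value               # temporal difference error
    slope = value * (1.0 - value)        # derivative of the sigmoid
    return [w + lr * error * slope * x for w, x in zip(weights, features)]


def train_on_game(weights, positions, outcome, lr=0.1):
    """TD(0) over one game (hypothetical toy setup).

    The target for each position is the current evaluation of the next
    position; the target for the final position is the game outcome.
    """
    for t, features in enumerate(positions):
        if t + 1 < len(positions):
            target = predict(weights, positions[t + 1])
        else:
            target = outcome
        weights = td0_update(weights, features, target, lr)
    return weights


if __name__ == "__main__":
    # Toy "game": three positions with two made-up features each, ending in a win.
    game = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w = [0.0, 0.0]
    for _ in range(200):
        w = train_on_game(w, game, outcome=1.0)
    print([round(predict(w, p), 3) for p in game])
```

No move labels are needed in this sketch: the evaluator learns only from successive positions and the final result, which is why recorded games from competent but unlabelled play can serve as training data.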

position evaluation, temporal difference learning

Industry:
- Leisure & Entertainment > Games
  - Go (0.66)
  - Chess (0.65)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)