Generalized Weighted Path Consistency for Mastering Atari Games

Jan-19-2025, 17:13:42 GMT–Neural Information Processing Systems

Reinforcement learning with neural-guided search consumes huge computational resources to achieve remarkable performance. Path consistency (PC), i.e., the requirement that f values on one optimal path be identical, was previously imposed on MCTS by PCZero to improve the learning efficiency of AlphaZero. However, PCZero still lacks theoretical support and considers only board games. In this paper, PCZero is generalized into GW-PCZero for real applications with non-zero immediate reward. A weighting mechanism is introduced to reduce the variance in the f-value estimate caused by the uncertainty of scouting.
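To make the path-consistency idea concrete, here is a minimal sketch and not the paper's formulation. The abstract defines neither f nor the weights, so the sketch assumes undiscounted rewards, takes f at a state to be the reward accumulated before reaching it plus a value estimate for it, and penalizes the weighted variance of the f values along one path. The names `f_values` and `weighted_pc_loss` are hypothetical.

```python
from typing import List, Sequence


def f_values(rewards: Sequence[float], values: Sequence[float]) -> List[float]:
    """Assumed definition: f_t = reward accumulated before state t + value estimate at t.

    rewards[t] is the immediate reward for the step from state t to state t+1,
    so a path with n states has n - 1 rewards. No discounting (assumption).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("expected exactly one more value than rewards")
    fs: List[float] = []
    accumulated = 0.0
    for t, value in enumerate(values):
        fs.append(accumulated + value)
        if t < len(rewards):
            accumulated += rewards[t]
    return fs


def weighted_pc_loss(fs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted variance of f values along a path; zero when all f values agree.

    The weights are a hypothetical stand-in for a confidence score per state:
    less reliable estimates get smaller weights.
    """
    if len(fs) != len(weights):
        raise ValueError("fs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * f for w, f in zip(weights, fs)) / total
    return sum(w * (f - mean) ** 2 for w, f in zip(weights, fs)) / total


rewards = [1.0, 0.0, 2.0]

# Consistent value estimates: every f value on the path equals 3.0.
consistent = f_values(rewards, [3.0, 2.0, 2.0, 0.0])

# One noisy estimate at the second state breaks consistency.
noisy = f_values(rewards, [3.0, 2.5, 2.0, 0.0])

uniform_loss = weighted_pc_loss(noisy, [1.0, 1.0, 1.0, 1.0])
# Giving the noisy state zero weight removes its contribution entirely.
downweighted_loss = weighted_pc_loss(noisy, [1.0, 0.0, 1.0, 1.0])
```

In this toy path the consistent estimates give a loss of zero, while the single noisy estimate produces a positive loss that shrinks as its weight is reduced. In the zero-reward board-game case, the accumulated term vanishes and f reduces to the value estimate alone.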

computational resource, generalized weighted path consistency, mastering atari game, (2 more...)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.85)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.44)