The Update Equivalence Framework for Decision-Time Planning
Sokota, Samuel, Farina, Gabriele, Wu, David J., Hu, Hengyuan, Wang, Kevin A., Kolter, J. Zico, Brown, Noam
arXiv.org Artificial Intelligence
The process of revising (or constructing) a policy immediately prior to execution -- known as decision-time planning -- is key to achieving superhuman performance in perfect-information settings like chess and Go. A recent line of work has extended decision-time planning to more general imperfect-information settings, leading to superhuman performance in poker. However, these methods require considering subgames whose sizes grow quickly in the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on subgames but rather on the notion of update equivalence. In this framework, decision-time planning algorithms simulate updates of synchronous learning algorithms. This framework enables us to introduce a new family of principled decision-time planning algorithms that do not rely on public information, opening the door to sound and effective decision-time planning in settings with large amounts of non-public information. In experiments, members of this family perform comparably to or better than state-of-the-art approaches in Hanabi and improve performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.
Apr-25-2023
- Country:
- North America > United States (0.46)
- Genre:
- Overview > Innovation (0.34)
- Research Report (1.00)
- Industry:
- Leisure & Entertainment > Games > Chess (0.34)
- Technology: