Super-Exponential Regret for UCT, AlphaGo and Variants

May-17-2024–arXiv.org Artificial Intelligence

We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$-chain environment, and that a `polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment -- the original proofs contain an oversight for rewards bounded in $[0, 1]$, which we fix in the present draft. We also adapt the proofs to AlphaGo's MCTS and its descendants (e.g., AlphaZero, Leela Zero) to also show $\exp_2(\exp_2(D - O(\log D)))$ regret.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

May-17-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States > Virginia (0.14)

Genre:
- Research Report (0.40)

Industry:
- Information Technology > Software (0.62)
- Leisure & Entertainment > Games
  - Go (0.72)

Technology:
- Information Technology > Artificial Intelligence
  - Games > Go (0.62)
  - Machine Learning > Neural Networks (0.47)
  - Representation & Reasoning > Planning & Scheduling (0.49)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found