Reviews: Monte-Carlo Tree Search by Best Arm Identification
–Neural Information Processing Systems
This work uses best arm identification (BAI) techniques applies to the monte-carlo tree search problem with two-players (in a turn-based setting). The goal is to find the best action for player A to take by carefully considering all the next actions that player B and A can take in the following rounds. Access to a stochastic oracle to evaluate the values of leaves is supposed, hence the goal is to find approximately (with precision \epsilon) the best action at the root with high confidence (at least 1-\delta). Algorithms based on confidence intervals and upwards propagation (from leaf to the root) of the upper (for the MAX nodes, action by player A) and lower (for the MIN nodes, action by player B the opponent) confidence bounds are proposed. The algorithms are intuitive and well described, and well rooted in the fixed confidence BAI literature.
Neural Information Processing Systems
Oct-8-2024, 05:42:45 GMT
- Technology: