Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction

Luo, Baiting, Pettet, Ava, Laszka, Aron, Dubey, Abhishek, Mukhopadhyay, Ayan

arXiv.org Artificial Intelligence 

If we were to apply MCTS directly to this abstracted space, we would encounter two main issues: inefficient utilization of our pre-built search space, with the search potentially diverging prematurely into unexplored regions, and difficulty in building sufficiently deep trees for high-quality long-term decision-making, particularly in areas of high stochasticity or uncertainty (Couëtoux et al., 2011). Therefore, we use progressive widening to extend MCTS so that it expands the search tree incrementally. Progressive widening balances the exploration of new states with the exploitation of already visited states based on two hyperparameters: α ∈ [0, 1] and ϵ ∈ R+. Let |C(s, z)| denote the number of children of the state-action pair (s, z). The key idea is to alternate between adding new child nodes and selecting among existing child nodes, depending on how often the state-action pair (s, z) has been visited. A new state is added to the tree if |C(s, z)| < ϵN(s, z)^α, where N(s, z) is the number of times the state-action pair has been visited. The hyperparameter α controls the propensity to select among existing children: α = 0 leads to always selecting among existing children, while α = 1 recovers vanilla MCTS behavior (always adding a new child). In this way, we enhance our approach by efficiently utilizing the pre-built search space, prioritizing the exploration of promising macro actions while allowing incremental expansion of the search tree. This technique enables our method to make quick decisions in an anytime manner, leveraging the cached information, and to further refine the planning tree if additional time is available.
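The widening rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Node` class, the `sample_next_state` callback, and the uniform re-selection among children are assumptions made for brevity (a full MCTS would re-select with a UCB-style rule).

```python
import random

def should_expand(num_children: int, visits: int,
                  alpha: float, epsilon: float) -> bool:
    """Progressive-widening test: add a new child iff
    |C(s, z)| < epsilon * N(s, z) ** alpha."""
    return num_children < epsilon * visits ** alpha

class Node:
    """Statistics for one state-action pair (s, z)."""
    def __init__(self):
        self.children = []   # sampled successor states C(s, z)
        self.visits = 0      # visit count N(s, z)

def select_or_expand(node: Node, sample_next_state,
                     alpha: float = 0.5, epsilon: float = 1.0):
    """One progressive-widening step for a state-action pair."""
    node.visits += 1
    if should_expand(len(node.children), node.visits, alpha, epsilon):
        # Widen: draw a fresh successor state from the generative model.
        child = sample_next_state()
        node.children.append(child)
        return child
    # Otherwise re-select among existing children (uniform here for brevity).
    return random.choice(node.children)
```

Note the limiting behavior: with α = 0 the condition |C| < ϵ caps the branching factor at a constant, so after the first expansion the search keeps revisiting the same successors, while α = 1 (with ϵ = 1) satisfies the condition on every visit and therefore always adds a new child, matching vanilla MCTS in a continuous state space.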
