s ← env.RESET()
solution ← [ ]  (list of actions)
for 1 … L_p do
actions ← PLANNER(s)
W(leaf, i) ← r + γ V(s′)
N(leaf, i) ← 1
Q(leaf, i) ← W(leaf, i)
function UPDATE(path, leaf)

Apr-24-2026, 11:50:34 GMT–Neural Information Processing Systems

A.1 MCTS-kSubS algorithm

In Algorithm 4 we present a general MCTS solver based on AlphaZero. The solver repeatedly queries the planner for a list of actions and executes them one by one. The baseline planner returns only a single action at a time, whereas MCTS-kSubS returns around k actions, enough to reach the desired subgoal (the number of actions depends on the subgoal distance, which in practice does not always equal k). MCTS-kSubS operates on a high-level subgoal graph: nodes are subgoals proposed by the generator (see Algorithm 3), and edges are lists of actions describing how to move from one subgoal to another (computed by the low-level conditional policy in Algorithm 2). The graph structure is represented by the tree variable. For every subgoal, it keeps up to C3 best nearby subgoals (according to generator scores), along with the corresponding list of actions and the sum of rewards obtained while moving from the parent to the child subgoal. Most of the MCTS implementation is shared between MCTS-kSubS and the AlphaZero baseline, as we can treat the behavioral-cloning policy as a subgoal generator with k = 1. The differences between MCTS-kSubS and the baseline are encapsulated in the GEN_CHILDREN function (Algorithms 5 and 6).
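To make the subgoal graph concrete, here is a minimal Python sketch of one way such a tree could be stored. It is an illustration under stated assumptions, not the paper's implementation: the names Edge, SubgoalTree, add_children and max_children are hypothetical, with max_children playing the role of C3, and the scores in the usage lines are made-up values standing in for generator outputs.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List


@dataclass
class Edge:
    """Hypothetical edge from a parent subgoal to a child subgoal."""
    child: Hashable      # the child subgoal (a state)
    actions: List[int]   # low-level actions leading from parent to child
    reward: float        # sum of rewards collected along `actions`
    score: float         # generator score of the child subgoal


@dataclass
class SubgoalTree:
    """Hypothetical container: maps each subgoal to its retained children."""
    max_children: int    # assumed to correspond to C3 in the text
    edges: Dict[Hashable, List[Edge]] = field(default_factory=dict)

    def add_children(self, parent: Hashable, candidates: List[Edge]) -> List[Edge]:
        # Keep only the `max_children` candidates with the highest generator score.
        best = sorted(candidates, key=lambda e: e.score, reverse=True)
        self.edges[parent] = best[: self.max_children]
        return self.edges[parent]


# Usage with made-up candidates: three proposals, only the best two are kept.
tree = SubgoalTree(max_children=2)
kept = tree.add_children(
    "s0",
    [
        Edge(child="a", actions=[0, 1], reward=0.0, score=0.2),
        Edge(child="b", actions=[1], reward=0.0, score=0.7),
        Edge(child="c", actions=[2, 2, 0], reward=1.0, score=0.5),
    ],
)
print([e.child for e in kept])  # ['b', 'c']
```

Under this layout the AlphaZero baseline would be the special case in which every edge holds a single action, matching the k = 1 view described above.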

artificial intelligence, machine learning, subgoal



for a in actions do
s′, r ← env.STEP(a)
solution.APPEND(a)
s ← s′
if solution.LENGTH() > L_a then return None
if env.SOLVED() then return solution
return None
function PLANNER(state)
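The solver loop described in A.1 can be sketched in Python as follows. This is a rough illustration, not the authors' code: ChainEnv is a hypothetical toy environment, the planner is a stand-in callable, and max_planner_calls and max_actions are assumed names for the limits L_p and L_a. The indentation of the pseudocode did not survive extraction, so checking the length limit and the solved condition after every action is an assumption.

```python
from typing import Callable, List, Optional, Tuple


class ChainEnv:
    """Hypothetical toy environment: start at 0 and reach `goal` by adding actions."""

    def __init__(self, goal: int = 5) -> None:
        self.goal = goal
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float]:
        self.state += action
        reward = 1.0 if self.state == self.goal else 0.0
        return self.state, reward

    def solved(self) -> bool:
        return self.state == self.goal


def solve(
    env: ChainEnv,
    planner: Callable[[int], List[int]],
    max_planner_calls: int,  # assumed to correspond to L_p
    max_actions: int,        # assumed to correspond to L_a
) -> Optional[List[int]]:
    s = env.reset()
    solution: List[int] = []
    for _ in range(max_planner_calls):
        # The planner may return one action (baseline) or several (subgoal search).
        actions = planner(s)
        for a in actions:
            s, _reward = env.step(a)
            solution.append(a)
            # Assumption: both checks run after each executed action.
            if len(solution) > max_actions:
                return None
            if env.solved():
                return solution
    return None


# A stand-in planner that always proposes two unit steps.
print(solve(ChainEnv(goal=5), lambda s: [1, 1], max_planner_calls=10, max_actions=10))
# [1, 1, 1, 1, 1]
```

A planner that returns a single action per call reproduces the baseline behaviour described in A.1, while one that returns several actions per call mirrors the subgoal-based planner.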
