Task            Reward Threshold    #episodes needed by LA-MCTS to reach threshold
Swimmer-v1      325                 126
Hopper-v1       3120                2913
HalfCheetah-v1  3430                3967
Walker2d-v1     4390                N/A (r_best = 3523)
Ant-v1          3580                N/A (r

Neural Information Processing Systems

Table 1: Averaged number of samples to reach the reward threshold on Mujoco-V1 (Table 2 in the main paper uses Mujoco-V2). We sincerely thank reviewers R1, R2, and R3 for their constructive feedback. We redid the experiment on Mujoco-V1 in Table 1. LA-MCTS shows […] this is when a plateau of regret happens. We will clarify this in the paper.


Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search

Wang, Linnan, Fonseca, Rodrigo, Tian, Yuandong

arXiv.org Artificial Intelligence

High-dimensional black-box optimization has broad applications but remains a challenging problem to solve. Given a set of samples $\{\mathbf{x}_i, y_i\}$, building a global model (as in Bayesian Optimization (BO)) suffers from the curse of dimensionality in the high-dimensional search space, while a greedy search may lead to sub-optimality. By recursively splitting the search space into regions with high/low function values, recent works like LaNAS show good performance in Neural Architecture Search (NAS), empirically reducing the sample complexity. In this paper, we propose LA-MCTS, which extends LaNAS to other domains. Unlike previous approaches, LA-MCTS learns the partition of the search space from a few samples and their function values in an online fashion. While LaNAS uses a linear partition and performs uniform sampling in each region, LA-MCTS adopts a nonlinear decision boundary and learns a local model to pick good candidates. If the nonlinear partition function and the local model fit the ground-truth black-box function well, then good partitions and candidates can be reached with far fewer samples. LA-MCTS serves as a \emph{meta-algorithm} by using existing black-box optimizers (e.g., BO, TuRBO) as its local models, achieving strong performance on general black-box optimization and reinforcement learning benchmarks, in particular for high-dimensional problems.
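To make the partitioning idea concrete, the sketch below splits a region of sampled points into a "good" and a "bad" side by clustering on the joint (x, y) features and keeping the cluster with the higher mean value. This is a minimal NumPy illustration, not the paper's implementation: LA-MCTS fits an SVM as the learned nonlinear boundary, whereas here a simple nearest-centroid rule (and the toy function `split_region`, a name introduced for this example) stands in for that classifier.

```python
import numpy as np

def split_region(X, y, rng, n_iters=10):
    """Partition samples into a 'good' and a 'bad' region (illustrative sketch).

    2-means on the joint (x, y) features labels each sample; the cluster
    with the higher mean value becomes the 'good' side. LA-MCTS trains an
    SVM on these labels; here a nearest-centroid rule over x stands in.
    """
    feats = np.hstack([X, y[:, None]])            # cluster on (x, y) jointly
    centers = feats[rng.choice(len(feats), 2, replace=False)]
    for _ in range(n_iters):                      # a few Lloyd iterations
        d = np.linalg.norm(feats[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = feats[labels == k].mean(axis=0)
    if (labels == 0).all() or (labels == 1).all():
        labels = (y > np.median(y)).astype(int)   # fallback: median split
    good_k = 0 if y[labels == 0].mean() >= y[labels == 1].mean() else 1
    good = labels == good_k
    # Stand-in boundary classifier over x only: nearer cluster centroid wins.
    cx = np.stack([X[good].mean(axis=0), X[~good].mean(axis=0)])
    def in_good_region(x):
        return np.linalg.norm(x - cx[0]) <= np.linalg.norm(x - cx[1])
    return good, in_good_region

# Demo on a toy linear objective (assumed purely for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (100, 3))
y = X.sum(axis=1)
good, in_good = split_region(X, y, rng)
```

In the full algorithm this split is applied recursively, yielding a tree of regions; MCTS then uses the node statistics to decide which region to descend into, and the local optimizer (e.g., BO or TuRBO) proposes candidates restricted to the selected region.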