 rebuttal-fig


Task            Reward threshold   #episodes needed by LA-MCTS to reach the threshold
Swimmer-v1      325                126
Hopper-v1       3120               2913
HalfCheetah-v1  3430               3967
Walker2d-v1     4390               N/A (r_best = 3523)
Ant-v1          3580               N/A(r

Neural Information Processing Systems

Table 1: Average number of samples needed to reach the reward threshold on Mujoco-V1. Table 2 in the main paper uses Mujoco-V2.

We sincerely thank reviewers R1, R2, and R3 for their constructive feedback. We reran the experiment on Mujoco-V1 and report the results in Table 1. LA-MCTS shows ... This is when a plateau of regret happens. We will clarify this in the paper.
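
The metric in Table 1 can be illustrated with a minimal sketch. It assumes that each run is logged as a list of per-episode rewards, that a run "reaches" the threshold at the first episode whose reward meets or exceeds it, and that N/A is reported together with the best observed reward when any run falls short. The function names and the toy reward lists are hypothetical, and the averaging convention is an assumption, not necessarily the one used for the reported numbers.

```python
from typing import List, Optional, Sequence, Tuple


def episodes_to_threshold(rewards: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based index of the first episode whose reward
    meets or exceeds the threshold, or None if it is never reached."""
    for episode, reward in enumerate(rewards, start=1):
        if reward >= threshold:
            return episode
    return None


def summarize_runs(
    runs: List[Sequence[float]], threshold: float
) -> Tuple[Optional[float], Optional[float]]:
    """Return (mean episodes to threshold, None) when every run reaches
    the threshold, otherwise (None, best reward seen in any run).

    Assumption: a task counts as N/A as soon as one run fails to reach
    the threshold; other conventions are possible."""
    hits = [episodes_to_threshold(run, threshold) for run in runs]
    if any(hit is None for hit in hits):
        best_reward = max(max(run) for run in runs)
        return None, best_reward
    return sum(hits) / len(hits), None


# Toy reward curves for illustration only (not experimental data).
toy_runs = [[100.0, 200.0, 330.0], [50.0, 326.0, 400.0]]
mean_episodes, best = summarize_runs(toy_runs, threshold=325.0)
print(mean_episodes, best)
```

Under this convention, a row such as "N/A (r_best = 3523)" corresponds to the second return value: the threshold was not reached, so the best reward is reported instead of an episode count.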