Task	Reward threshold	#episodes needed by LA-MCTS to reach threshold
Swimmer-v1	325	126
Hopper-v1	3120	2913
HalfCheetah-v1	3430	3967
Walker2d-v1	4390	N/A (r_best = 3523)
Ant-v1	3580	N/A (r...

Aug-16-2025, 23:39:00 GMT–Neural Information Processing Systems

Table 1: Average number of samples needed to reach the reward threshold on Mujoco-V1. Table 2 in the main paper uses Mujoco-V2.

We sincerely thank reviewers R1, R2, and R3 for their constructive feedback. We redid the experiment on Mujoco-V1; the results are in Table 1. LA-MCTS shows [...] This is when a plateau of regret happens. We will clarify it in the paper.
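As a rough illustration of the metric reported in Table 1, the sketch below counts how many episodes a run needs before its reward first meets a threshold, then averages that count over runs. The function names, the per-episode reward traces, and the rule of reporting "N/A" with the best reward when any run never reaches the threshold are all assumptions made for illustration; the rebuttal does not specify how the authors computed these numbers, and the toy reward values are invented.

```python
from typing import List, Optional, Sequence


def episodes_to_threshold(rewards: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based index of the first episode whose reward
    meets the threshold, or None if the threshold is never reached."""
    for episode, reward in enumerate(rewards, start=1):
        if reward >= threshold:
            return episode
    return None


def summarize(runs: List[Sequence[float]], threshold: float) -> str:
    """Average episodes-to-threshold over runs (hypothetical convention).

    If any run never reaches the threshold, report N/A together with
    the best reward seen across all runs.
    """
    counts = [episodes_to_threshold(run, threshold) for run in runs]
    if any(count is None for count in counts):
        best = max(max(run) for run in runs)
        return f"N/A (r_best = {best:g})"
    return f"{sum(counts) / len(counts):g}"


if __name__ == "__main__":
    # Toy reward traces, not data from the paper.
    toy_runs = [[100, 330], [326]]
    print(summarize(toy_runs, threshold=325))
```

Under this convention, a task is reported as N/A as soon as one run falls short of the threshold. Averaging only over the successful runs would be an equally plausible reading.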



Rebuttal Fig. 2: LA-MCTS on Walker2d
