Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
Neural Information Processing Systems
Here, we focus our attention on the second case, which leads to score improvement without any retraining.
Feb-8-2026, 00:55:03 GMT