CO-PILOT: COllaborative Planning and reInforcement Learning On sub-Task curriculum

Dec-24-2025, 03:42:17 GMT–Neural Information Processing Systems

Goal-conditioned reinforcement learning (RL) usually suffers from sparse reward and inefficient exploration in long-horizon tasks. Planning can find the shortest path to a distant goal that provides dense reward/guidance but is inaccurate without a precise environment model. We show that RL and planning can collaboratively learn from each other to overcome their own drawbacks. In ''CO-PILOT'', a learnable path-planner and an RL agent produce dense feedback to train each other on a curriculum of tree-structured sub-tasks. Firstly, the planner recursively decomposes a long-horizon task to a tree of sub-tasks in a top-down manner, whose layers construct coarse-to-fine sub-task sequences as plans to complete the original task.

co-pilot, collaborative planning and reinforcement learning, curriculum, (8 more...)

Neural Information Processing Systems

Dec-24-2025, 03:42:17 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)