Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

Neural Information Processing Systems 

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce Q-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of Q-functions. Motivated by an analysis of Q-function over-generalization, which impairs stable stitching, QCS adaptively integrates the Q-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms both RCSL and value-based methods, consistently achieving or surpassing the maximum trajectory returns across diverse offline RL benchmarks. By uniting the supervised and value-based paradigms, QCS pushes the attainable performance of offline RL and opens a direction for further work on adaptive value aid.
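The abstract does not specify the exact form of the adaptive combination, so the following is only a minimal sketch of the idea: an RCSL loss augmented with a Q-maximization term whose weight depends on the trajectory's return. The function name `qcs_loss`, its arguments, and the linear weighting rule are illustrative assumptions, not the paper's definition.

```python
import torch

def qcs_loss(rcsl_loss: torch.Tensor,
             q_values: torch.Tensor,
             traj_return: float,
             max_dataset_return: float,
             alpha: float = 1.0) -> torch.Tensor:
    """Sketch of a return-adaptive Q-aided loss (hypothetical form).

    rcsl_loss:          scalar RCSL (conditional behavior-cloning) loss
    q_values:           Q(s, a) evaluated at the policy's actions
    traj_return:        return of the trajectory the batch was drawn from
    max_dataset_return: best trajectory return observed in the dataset
    """
    # Normalized return in [0, 1]; 1.0 for the best trajectory in the data.
    rel_return = traj_return / max_dataset_return
    # Hypothetical adaptive weight: low-return trajectories benefit most
    # from Q-guided stitching, so they receive more Q-aid; near-optimal
    # trajectories fall back to plain RCSL.
    w = alpha * (1.0 - rel_return)
    # Subtracting the mean Q pushes the policy toward high-value actions.
    return rcsl_loss - w * q_values.mean()
```

Under this sketch, batches from near-optimal trajectories are trained almost purely with RCSL, while low-return batches lean on the Q-function for stitching; the specific schedule in the paper may differ.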