Yes, Q-learning Helps Offline In-Context RL

Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Andrei Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Igor Kiselev, Vladislav Kurenkov

arXiv.org Artificial Intelligence 

In this work, we explore the integration of Reinforcement Learning (RL) approaches within a scalable offline In-Context RL (ICRL) framework. Through experiments on more than 150 datasets derived from GridWorld and MuJoCo environments, we demonstrate that optimizing RL objectives improves performance by approximately 40% on average over the well-established Algorithm Distillation (AD) baseline, across a range of dataset coverages, structures, expertise levels, and environment complexities. Our results also reveal that offline RL-based methods outperform online approaches, which are not specifically designed for the offline setting. These findings underscore the importance of aligning the learning objective with RL's reward-maximization goal and demonstrate that offline RL is a promising direction for ICRL settings.
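To make the contrast between the two training objectives concrete, the sketch below illustrates, under stated assumptions and not as the paper's actual implementation, how the same causal sequence model used for in-context RL can be trained either with AD's supervised next-action prediction or with an offline Q-learning (one-step TD) objective. The names `SeqQModel`, `ad_loss`, and `q_learning_loss`, the discrete action space, and the tiny transformer backbone are all hypothetical choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch (assumed setup, not the paper's code): a causal sequence
# model over per-timestep features, with one head that can serve either as
# action logits (AD) or as Q-values (offline Q-learning).
class SeqQModel(nn.Module):  # hypothetical name
    def __init__(self, feat_dim, num_actions, hidden=128):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, num_actions)  # logits or Q(s, .) per step

    def forward(self, tokens, causal_mask=None):
        h = self.backbone(self.embed(tokens), mask=causal_mask)
        return self.head(h)  # (batch, seq, num_actions)

def ad_loss(logits, actions):
    # Algorithm Distillation-style objective: imitate the dataset actions
    # with cross-entropy; no explicit reward maximization.
    return F.cross_entropy(logits.flatten(0, 1), actions.flatten())

def q_learning_loss(q_values, actions, rewards, dones, gamma=0.99):
    # Offline Q-learning-style objective: regress Q(s_t, a_t) toward a
    # one-step TD target bootstrapped from max_a Q(s_{t+1}, a).
    q_sa = q_values[:, :-1].gather(-1, actions[:, :-1, None]).squeeze(-1)
    with torch.no_grad():
        next_v = q_values[:, 1:].max(-1).values
        target = rewards[:, :-1] + gamma * (1.0 - dones[:, :-1]) * next_v
    return F.mse_loss(q_sa, target)
```

The point of the sketch is only that the architecture and data pipeline can stay fixed while the loss changes from supervised distillation to a value-based RL objective; practical offline RL methods typically add further components (e.g., target networks or conservatism) that are omitted here.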