Yes, Q-learning Helps Offline In-Context RL

Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Andrei Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Igor Kiselev, Vladislav Kurenkov

arXiv.org Artificial Intelligence 

In this work, we explore the integration of Reinforcement Learning (RL) approaches within a scalable offline In-Context RL (ICRL) framework. Through experiments on more than 150 datasets derived from GridWorld and MuJoCo environments, we demonstrate that optimizing RL objectives improves performance by approximately 40% on average over the well-established Algorithm Distillation (AD) baseline, across a range of dataset coverages, structures, expertise levels, and environment complexities. Our results also reveal that offline RL-based methods outperform online approaches, which are not specifically designed for the offline setting. These findings underscore the importance of aligning the learning objective with RL's reward-maximization goal and demonstrate that offline RL is a promising direction for ICRL settings.
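To make the contrast between the two training objectives concrete, the sketch below illustrates, under stated assumptions and not as the paper's actual implementation, how the same causal sequence model used for in-context RL can be trained either with AD's supervised next-action prediction or with an offline Q-learning (one-step TD) objective. The names `SeqQModel`, `ad_loss`, and `q_learning_loss`, the discrete action space, and the tiny transformer backbone are all hypothetical choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch (assumed setup, not the paper's code): a causal sequence
# model over per-timestep features, with one head that can serve either as
# action logits (AD) or as Q-values (offline Q-learning).
class SeqQModel(nn.Module):  # hypothetical name
    def __init__(self, feat_dim, num_actions, hidden=128):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, num_actions)  # logits or Q(s, .) per step

    def forward(self, tokens, causal_mask=None):
        h = self.backbone(self.embed(tokens), mask=causal_mask)
        return self.head(h)  # (batch, seq, num_actions)

def ad_loss(logits, actions):
    # Algorithm Distillation-style objective: imitate the dataset actions
    # with cross-entropy; no explicit reward maximization.
    return F.cross_entropy(logits.flatten(0, 1), actions.flatten())

def q_learning_loss(q_values, actions, rewards, dones, gamma=0.99):
    # Offline Q-learning-style objective: regress Q(s_t, a_t) toward a
    # one-step TD target bootstrapped from max_a Q(s_{t+1}, a).
    q_sa = q_values[:, :-1].gather(-1, actions[:, :-1, None]).squeeze(-1)
    with torch.no_grad():
        next_v = q_values[:, 1:].max(-1).values
        target = rewards[:, :-1] + gamma * (1.0 - dones[:, :-1]) * next_v
    return F.mse_loss(q_sa, target)
```

The point of the sketch is only that the architecture and data pipeline can stay fixed while the loss changes from supervised distillation to a value-based RL objective; practical offline RL methods typically add further components (e.g., target networks or conservatism) that are omitted here.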