Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers

May-26-2025, 23:06:03 GMT–Neural Information Processing Systems

Decision Transformers have recently emerged as a new and compelling paradigm for offline Reinforcement Learning (RL), completing a trajectory in an autoregressive way. While improvements have been made to overcome initial shortcomings, online finetuning of decision transformers has been surprisingly under-explored. The widely adopted state-of-the-art Online Decision Transformer (ODT) still struggles when pretrained with low-reward offline data. In this paper, we theoretically analyze the online-finetuning of the decision transformer, showing that the commonly used Return-To-Go (RTG) that's far from the expected return hampers the online fine-tuning process. This problem, however, is well-addressed by the value function and advantage of standard RL algorithms.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Neural Information Processing Systems

May-26-2025, 23:06:03 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)