Value-Guided Decision Transformer: A Unified Reinforcement Learning Framework for Online and Offline Settings
–Neural Information Processing Systems
The Conditional Sequence Modeling (CSM) paradigm, which benefits from the transformer's powerful distribution modeling capabilities, has shown considerable promise in Reinforcement Learning (RL) tasks. However, most prior work applies CSM to either the online or the offline setting alone, and a general architecture covering both has rarely been explored. In addition, existing methods focus primarily on deterministic trajectory modeling, overlooking the randomness of state transitions and the diversity of future trajectory distributions. Value-based methods offer a viable remedy for CSM and help bridge the potential gap between offline and online RL. In this paper, we propose the Value-Guided Decision Transformer (VDT), which uses value functions to apply advantage weighting and behavior regularization to the Decision Transformer (DT), guiding the policy toward upper-bound optimal decisions during the offline training phase.
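To make the idea of advantage weighting concrete, the following is a minimal sketch of a generic advantage-weighted action regression loss. It is not the objective used by VDT, which the abstract does not specify. The function names, the temperature `beta`, the clipping bound `max_weight`, and the squared-error action loss are all assumptions chosen for illustration, and the behavior-regularization term is not shown.

```python
import math


def advantage_weights(q_values, v_values, beta=1.0, max_weight=100.0):
    """Return exp((Q - V) / beta) per sample, capped at max_weight.

    beta and max_weight are hypothetical hyperparameters for this sketch.
    The exponent is clipped before exponentiation to avoid overflow.
    """
    log_cap = math.log(max_weight)
    return [
        math.exp(min((q - v) / beta, log_cap))
        for q, v in zip(q_values, v_values)
    ]


def weighted_action_loss(pred_actions, data_actions, weights):
    """Mean of weight * squared error between predicted and dataset actions.

    Samples with a higher estimated advantage contribute more to the loss,
    so the policy imitates the better actions in the dataset more strongly.
    """
    total = 0.0
    for pred, data, w in zip(pred_actions, data_actions, weights):
        sq_err = sum((p - d) ** 2 for p, d in zip(pred, data))
        total += w * sq_err
    return total / len(weights)


# Two samples: the first has advantage 1, the second has advantage 0.
weights = advantage_weights(q_values=[1.0, 0.0], v_values=[0.0, 0.0])
loss = weighted_action_loss([[0.0], [0.0]], [[1.0], [1.0]], weights)
```

In this toy case both samples have the same squared error, but the first one counts roughly 2.7 times as much because its estimated advantage is higher. A smaller `beta` would sharpen that preference.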
Jun-18-2026, 00:46:23 GMT
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Technology:
- Information Technology > Artificial Intelligence
- Robots (1.00)
- Representation & Reasoning (1.00)
- Natural Language (1.00)
- Machine Learning > Reinforcement Learning (1.00)