AdaCred: Adaptive Causal Decision Transformers with Feature Crediting
Kumawat, Hemant; Mukhopadhyay, Saibal
arXiv.org Artificial Intelligence
Reinforcement learning (RL) can be formulated as a sequence modeling problem, in which a model predicts future actions from historical state-action-reward sequences. Current approaches typically require long trajectory sequences to model the environment in offline RL settings. However, these models tend to over-rely on memorizing long-term representations, which impairs their ability to attribute importance to trajectories and learned representations according to task-specific relevance. In this work, we introduce AdaCred, a novel approach that represents trajectories as causal graphs built from short-term action-reward-state sequences. Our model adaptively learns a control policy by assigning credit to learned representations and pruning those of low importance, retaining only the ones most relevant to the downstream task. Our experiments demonstrate that AdaCred-based policies require shorter trajectory sequences and consistently outperform conventional methods in both offline reinforcement learning and imitation learning environments.
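To make the sequence-modeling view concrete, the following is a minimal sketch of how an offline trajectory could be cut into short context windows for action prediction. It assumes the return-to-go tokenization used by Decision Transformer-style models, not AdaCred's own causal-graph construction, which the abstract does not specify. The function names `returns_to_go` and `context_windows` and the window length `k` are hypothetical and chosen only for illustration.

```python
from typing import Any, List, Optional, Tuple

# One token triple per timestep: (return-to-go, state, action or None).
Step = Tuple[float, Any, Optional[Any]]


def returns_to_go(rewards: List[float]) -> List[float]:
    """Suffix sums of the rewards: rtg[t] = rewards[t] + ... + rewards[-1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def context_windows(
    states: List[Any], actions: List[Any], rewards: List[float], k: int
) -> List[Tuple[List[Step], Any]]:
    """Build (context, target_action) pairs that see at most k recent steps.

    Assumption: the action at the current step is masked (None) in the
    context, since it is the quantity the model is asked to predict.
    """
    rtg = returns_to_go(rewards)
    samples = []
    for t in range(len(actions)):
        start = max(0, t - k + 1)  # a smaller k gives a shorter history
        context = [
            (rtg[i], states[i], actions[i] if i < t else None)
            for i in range(start, t + 1)
        ]
        samples.append((context, actions[t]))
    return samples


if __name__ == "__main__":
    pairs = context_windows(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0], k=2)
    for context, target in pairs:
        print(context, "->", target)
```

Here a small `k` corresponds to the short trajectory sequences the abstract refers to. Crediting and pruning of representations would operate on whatever a model learns from such windows and is not shown.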
Dec-19-2024