Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling
Neural Information Processing Systems
Recent work has shown that transformer models perform remarkably well in reinforcement learning (RL) when decision-making is formulated as sequence generation. Given task context such as multiple trajectories, transformer-based agents can exhibit self-improvement in online environments, a capability called in-context RL. However, because attention in transformers has quadratic computational complexity, current in-context RL methods incur high computational costs as the task horizon increases. In contrast, the Mamba model is known for processing long-term dependencies efficiently, which gives in-context RL an opportunity to solve tasks that require long-term memory. To this end, we first implement Decision Mamba (DM) by replacing the transformer backbone of Decision Transformer (DT) with Mamba.
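To make the scaling argument concrete, the following is a rough back-of-the-envelope sketch, not the paper's cost model. It assumes a Decision Transformer style context with three tokens per environment step (return-to-go, state, action) and several trajectories concatenated into one context. The helper names `context_length`, `attention_pair_count`, and `recurrent_step_count` are hypothetical and introduced only for illustration.

```python
def context_length(horizon: int, num_trajectories: int, tokens_per_step: int = 3) -> int:
    """Tokens in the context when whole trajectories are concatenated.

    tokens_per_step=3 is an assumption (return-to-go, state, action per step).
    """
    return horizon * num_trajectories * tokens_per_step


def attention_pair_count(seq_len: int) -> int:
    """Query-key pairs scored by full self-attention: grows quadratically."""
    return seq_len * seq_len


def recurrent_step_count(seq_len: int) -> int:
    """State updates performed by a recurrent (state-space style) scan: grows linearly."""
    return seq_len


if __name__ == "__main__":
    for horizon in (100, 1000):
        n = context_length(horizon, num_trajectories=4)
        print(
            f"horizon={horizon} tokens={n} "
            f"attention_pairs={attention_pair_count(n)} "
            f"recurrent_steps={recurrent_step_count(n)}"
        )
```

Under these assumptions, a tenfold longer horizon multiplies the attention pair count by one hundred but the recurrent step count by only ten, which is the gap the abstract refers to.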
Mar-23-2025, 07:35:25 GMT
- Country:
- Asia (0.28)
- North America > United States
- Pennsylvania (0.14)
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Education (0.46)
- Information Technology (0.46)