StARformer: Transformer with State-Action-Reward Representations
Jinghuan Shang, Michael S. Ryoo
–arXiv.org Artificial Intelligence
Reinforcement Learning (RL) can be considered a sequence modeling task: given a sequence of past state-action-reward experiences, a model autoregressively predicts a sequence of future actions. Recently, Transformers have been successfully adopted to model this problem. In this work, we propose the State-Action-Reward Transformer (StARformer), which explicitly models local causal relations to help improve action prediction in long sequences. A sequence of such local representations, combined with state representations, is then used to make action predictions over a long time span. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on Atari (image) and Gym (state vector) benchmarks, in both offline-RL and imitation learning settings. StARformer also handles longer input sequences better than the baseline. Our code is available at https://github.com/

Reinforcement Learning (RL) naturally comes with sequential data: an agent observes a state from the environment, takes an action, observes the next state, and receives a reward from the environment.
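The RL-as-sequence-modeling framing described in the abstract can be sketched as interleaving per-timestep state, action, and reward embeddings into one token sequence and running causal self-attention over it, so each position only sees the past. The sketch below is a minimal, illustrative toy (NumPy, no learned weights); the dimensions, the single-head attention, and the interleaving order are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 8  # timesteps and embedding dimension (illustrative values)

# Stand-ins for learned state/action/reward encoders.
states = rng.normal(size=(T, d))
actions = rng.normal(size=(T, d))
rewards = rng.normal(size=(T, d))

# Interleave into one token sequence: (s_1, a_1, r_1, s_2, a_2, r_2, ...).
tokens = np.stack([states, actions, rewards], axis=1).reshape(3 * T, d)

def causal_self_attention(x):
    """Single-head self-attention with a causal mask (toy, no projections)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    mask = np.tril(np.ones((len(x), len(x)), dtype=bool))
    scores = np.where(mask, scores, -np.inf)  # block attention to the future
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

out = causal_self_attention(tokens)
# The action at step t would be predicted from the representation at the
# state token s_t, which by the causal mask attends only to tokens up to s_t.
action_preds = out[0::3]   # state-token positions in the interleaved sequence
print(action_preds.shape)  # (4, 8)
```

In this framing, autoregressive action prediction falls out of the causal mask: the first token can attend only to itself, and every later token only to its prefix.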
Oct-12-2021
- Country:
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- Genre:
- Research Report (0.64)
- Workflow (0.46)