Pieter Abbeel Team's Decision Transformer Abstracts RL as Sequence Modelling
The team's proposed Decision Transformer outputs optimal actions by leveraging a causally masked transformer, and by conditioning on a desired return it can generate future actions that aim to achieve that return. Despite its relative simplicity, the framework matches or outperforms state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks. Transformer architectures model sequential data efficiently, and their self-attention mechanism allows a layer to assign "credit" by implicitly forming state-return associations through the similarity (dot product) of the query and key vectors. Transformers can thus function effectively in the presence of sparse or distracting rewards. Previous studies have also shown that transformers can model a wide distribution of behaviours, enabling better generalization and transfer.
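To make the sequence-modelling view concrete, here is a minimal sketch in plain Python of two ingredients described above: turning a trajectory into return-conditioned tokens, and computing causally masked attention weights from query-key dot products. The function names (`returns_to_go`, `interleave`, `causal_attention_weights`) and the toy trajectory are illustrative assumptions, not the authors' code, and the sketch omits learned embeddings, multiple heads, and training.

```python
import math

def returns_to_go(rewards):
    """Undiscounted sum of rewards from each timestep to the end of the trajectory."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def interleave(rtg, states, actions):
    """Flatten a trajectory into (return-to-go, state, action) tokens, one triple per step."""
    tokens = []
    for g, s, a in zip(rtg, states, actions):
        tokens += [("rtg", g), ("state", s), ("action", a)]
    return tokens

def causal_attention_weights(queries, keys):
    """Softmax over scaled query-key dot products; position i only sees positions <= i."""
    d = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        scores = [sum(qx * kx for qx, kx in zip(q, k)) / math.sqrt(d)
                  for k in keys[: i + 1]]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        # Future positions receive exactly zero weight (the causal mask).
        weights.append([e / total for e in exps] + [0.0] * (len(keys) - i - 1))
    return weights

# Hypothetical sparse-reward trajectory: the only reward arrives at the last step.
rewards = [0.0, 0.0, 1.0]
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]

rtg = returns_to_go(rewards)
tokens = interleave(rtg, states, actions)

# Toy 2-dimensional query/key vectors, chosen by hand for illustration only.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = causal_attention_weights(vectors, vectors)
```

Because every return-to-go token already carries the final reward, early states are paired with the eventual outcome even though their immediate rewards are zero, which is one way to read the "state-return associations" mentioned above. At evaluation time, the first return-to-go token would be set to the desired return and reduced by each reward actually received.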
Jun-9-2021, 21:50:13 GMT