Pieter Abbeel Team's Decision Transformer Abstracts RL as Sequence Modelling

#artificialintelligence 

Their proposed Decision Transformer outputs optimal actions by leveraging a causally masked transformer and can generate future actions with desired returns. Moreover, despite Decision Transformer's relative simplicity, the proposed framework matches or outperforms the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks. Transformer architectures are able to efficiently model sequential data, and their self-attention mechanism allows the layer to assign "credit" by implicitly forming state-return associations via maximizing the dot product of the query and key vectors. Transformers can thus function effectively in the presence of sparse or distracting rewards. Previous studies have also shown that transformers can model a wide distribution of behaviours, enabling better generalization and transfer abilities.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found