Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

arXiv.org Machine Learning 

Transformers have emerged as a powerful class of sequence models with remarkable expressive capabilities. Originally popularized in natural language processing, they use self-attention mechanisms to learn new tasks in context, without any parameter updates (Vaswani, 2017; Liu et al., 2021; Dosovitskiy, 2020; Yun et al., 2019; Dong et al., 2018). In other words, a large transformer model can be given a prompt consisting of example input-output pairs for an unseen task and then produce correct outputs for new queries of that task, purely by processing the sequence of examples and queries (Lee et al., 2022; Laskin et al., 2022; Yang et al., 2023; Lin et al., 2024). This ability to adapt dynamically through context rather than through gradient-based fine-tuning has spurred extensive interest in the theoretical expressivity of transformers and in how they might learn algorithms internally during training. Recent theoretical work has begun to analyze these aspects of transformers.
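
To make the idea of adapting through context concrete, the following is a minimal sketch, not the construction used in the paper. It assumes a single attention-style read-out with no learned parameters: the in-context inputs act as keys, the in-context outputs act as values, and the score is a negative squared distance rather than a learned dot product. The names `attention_predict`, `prompt`, and `temperature` are hypothetical and chosen for illustration only.

```python
import math


def attention_predict(examples, query, temperature=0.1):
    """Predict an output for `query` using only in-context (x, y) pairs.

    Assumption: scores are negative squared distances between the query
    and each example input, standing in for learned query-key products.
    """
    scores = [-((x - query) ** 2) / temperature for x, _ in examples]
    # Softmax over the scores, shifted by the maximum for numerical stability.
    max_score = max(scores)
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    # The prediction is the attention-weighted average of the example outputs.
    return sum(w * y for w, (_, y) in zip(weights, examples)) / total


# A "prompt" of input-output pairs drawn from the task y = 2x + 1.
prompt = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]

# No parameters are updated; changing the prompt changes the prediction.
prediction = attention_predict(prompt, query=0.5)
print(prediction)
```

Swapping `prompt` for pairs drawn from a different function changes the output for the same query, which is the sense in which the context, rather than a weight update, carries the task.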
