Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning
Baiyuan Chen, Shinji Ito, Masaaki Imaizumi
Transformers have emerged as a powerful class of sequence models with remarkable expressive capabilities. Originally popularized in natural language processing, they use self-attention mechanisms to learn new tasks in context, without any parameter updates (Vaswani, 2017; Liu et al., 2021; Dosovitskiy, 2020; Yun et al., 2019; Dong et al., 2018). In other words, a large transformer can be given a prompt consisting of example input-output pairs for an unseen task and then produce correct outputs for new queries of that task, purely by processing the sequence of examples and queries (Lee et al., 2022; Laskin et al., 2022; Yang et al., 2023; Lin et al., 2024). This ability to adapt dynamically through context, rather than through gradient-based fine-tuning, has spurred extensive interest in the theoretical expressivity of transformers and in how they might learn algorithms internally during training. Recent theoretical work has begun to analyze these questions.
Aug-25-2025