Neural Information Processing Systems
Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear whether these limitations play a role in large-scale pretrained LLMs, or whether LLMs effectively overcome these constraints in practice thanks to the scale of both the models and their pretraining data. We explore how these architectural constraints manifest after pretraining by studying a family of retrieval and copying tasks inspired by Liu et al. [2024a]. We use a recently proposed framework for studying length generalization [Huang et al., 2025] to provide guarantees for each of our settings.
Jun-19-2026, 11:31:29 GMT
- Country:
- Europe > Austria (0.28)
- North America > United States (0.27)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Industry:
- Health & Medicine (0.46)
- Technology: