A unified framework for establishing the universal approximation of transformer-type architectures
Jingpu Cheng, Ting Lin, Zuowei Shen, Qianxiao Li
arXiv.org Artificial Intelligence
We investigate the universal approximation property (UAP) of transformer-type architectures and provide a unified theoretical framework that extends prior results on residual networks to models incorporating attention mechanisms. We identify token distinguishability as a fundamental requirement for UAP and introduce a general sufficient condition that applies to a broad class of architectures. Under an analyticity assumption on the attention layer, verifying this condition becomes significantly simpler, which yields a non-constructive approach to establishing UAP for such architectures. We demonstrate the applicability of the framework by proving UAP for transformers with various attention mechanisms, including kernel-based and sparse attention. The resulting corollaries either generalize prior work or establish UAP for architectures not previously covered. Furthermore, the framework offers a principled foundation for designing novel transformer architectures with inherent UAP guarantees, including those with specific functional symmetries. We give examples to illustrate these insights.
Oct-22-2025