Scaling White-Box Transformers for Vision
Jinrui Yang, Xianhang Li, Yuyin Zhou
Neural Information Processing Systems
Over the past several years, the Transformer architecture [42] has dominated deep representation learning for natural language processing (NLP), image processing, and visual computing [8, 2, 9, 5, 12]. However, the design of the Transformer architecture and its many variants remains largely empirical and lacks a rigorous mathematical interpretation. This has significantly hindered the development of new Transformer variants with improved efficiency or interpretability.