Scaling White-Box Transformers for Vision

Jinrui Yang¹  Xianhang Li¹  Yuyin Zhou

Neural Information Processing Systems

Over the past several years, the Transformer architecture [42] has dominated deep representation learning for natural language processing (NLP), image processing, and visual computing [8, 2, 9, 5, 12]. However, the design of the Transformer architecture and its many variants remains largely empirical and lacks a rigorous mathematical interpretation. This absence of a principled foundation has hindered the development of new Transformer variants with improved efficiency or interpretability.