On the Convergence of Encoder-only Shallow Transformers

Feb-16-2026, 07:01:55 GMT–Neural Information Processing Systems

A neural tangent kernel (NTK) based analysis is also given, which facilitates a comprehensive comparison. Our theory demonstrates a separation in the importance of different scaling schemes and initialization.

artificial intelligence, machine learning, natural language

Country:
- North America > United States
  - Wisconsin > Dane County
    - Madison (0.04)
  - Oregon > Multnomah County
    - Portland (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report > New Finding (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
