Limits to Depth-Efficiencies of Self-Attention

Neural Information Processing Systems 

Self-attention architectures, which are rapidly pushing the frontier in natural language processing, demonstrate a surprising depth-inefficient behavior: previous works indicate that increasing the internal representation dimension (network width) is just as useful as increasing the number of self-attention layers (network depth).
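To make the depth-versus-width trade-off concrete, here is a minimal sketch (not from the paper) comparing two self-attention stacks with matched parameter budgets, one deep and narrow, one shallow and wide. It assumes the standard approximation of 12·d_model² parameters per transformer layer (4·d² for the attention projections plus 8·d² for the feed-forward block); the `transformer_params` helper and the specific configurations are hypothetical illustrations:

```python
def transformer_params(depth: int, d_model: int) -> int:
    """Approximate parameter count of a self-attention stack.

    Per layer: 4*d^2 for the Q/K/V/output projections plus 8*d^2
    for a feed-forward block with a 4x hidden expansion.
    """
    per_layer = 12 * d_model ** 2
    return depth * per_layer

# Hypothetical configurations chosen so the budgets match exactly:
# halving the depth while doubling the width leaves depth*d^2 unchanged
# only if width grows by sqrt(depth ratio); here 6 x 1536 vs 24 x 768.
deep_narrow = transformer_params(depth=24, d_model=768)
shallow_wide = transformer_params(depth=6, d_model=1536)

print(f"deep-narrow  (24 x  768): {deep_narrow:>13,d} params")
print(f"shallow-wide ( 6 x 1536): {shallow_wide:>13,d} params")
# Both come to ~170M parameters, so any performance gap between the two
# isolates the effect of depth versus width at a fixed parameter count,
# which is the comparison the depth-(in)efficiency claim is about.
```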

