The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

Oct-9-2025, 04:16:35 GMT–Neural Information Processing Systems

Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.

covariance, neural network, transformer

Country:
- North America > Canada
  - Ontario > Toronto (0.14)
- Europe
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
    - Cambridgeshire > Cambridge (0.04)
  - Switzerland > Zürich
    - Zürich (0.04)

Genre:
- Research Report (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Representation & Reasoning (0.67)
  - Machine Learning
    - Statistical Learning (0.95)
    - Neural Networks > Deep Learning (0.68)
