The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Neural Information Processing Systems
Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.
Oct-9-2025, 04:16:35 GMT
- Country:
  - Europe
    - Switzerland > Zürich > Zürich (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
      - Oxfordshire > Oxford (0.04)
  - North America > Canada
- Genre:
  - Research Report (0.46)
- Technology: