Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi, Francesca Mignacco
Neural Information Processing Systems
Second, generalization: what specific aspects of the transformer architecture are responsible for transformers' effective learning?
Feb-16-2026, 07:50:01 GMT
- Country:
  - Africa > Rwanda
  - Asia > Middle East
    - Israel > Jerusalem District > Jerusalem (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - France (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Russia > Central Federal District
      - Moscow Oblast > Moscow (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
  - North America
    - Canada (0.04)
    - United States
      - California
        - Los Angeles County > Long Beach (0.04)
        - San Diego County > San Diego (0.04)
      - District of Columbia > Washington (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Maryland > Baltimore (0.14)
      - Massachusetts > Middlesex County
        - Cambridge (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Texas > Travis County
        - Austin (0.04)
- Genre:
  - Research Report > Experimental Study (0.92)
- Technology: