Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Neural Information Processing Systems
Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability of outer-product-based Fast Weight Programmers (FWPs) from the '90s. However, the original FWP formulation is more general than that of linear Transformers: a slow neural network (NN) continually reprograms the weights of a fast NN of arbitrary architecture. In existing linear Transformers, both NNs are feedforward and consist of a single layer.
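To make the FWP view of linear attention concrete, here is a minimal NumPy sketch of the single-layer case the abstract describes: a "slow" net (fixed projection matrices here, for simplicity) emits keys, values, and queries, and at each step writes an outer product into the weight matrix of a "fast" net, which is then applied to the query. The ELU+1 feature map and all dimensions are illustrative assumptions, not the paper's exact configuration, and the usual attention normalisation is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_key, d_val = 8, 16, 8

# "Slow" net: projections producing key, value, and query (single layer, fixed here).
Wk = rng.normal(size=(d_key, d_in)) / np.sqrt(d_in)
Wv = rng.normal(size=(d_val, d_in)) / np.sqrt(d_in)
Wq = rng.normal(size=(d_key, d_in)) / np.sqrt(d_in)

def phi(x):
    # ELU+1 feature map, a common choice in linearised attention (an assumption here).
    return np.where(x > 0, x + 1.0, np.exp(x))

# "Fast" net: a single weight matrix, reprogrammed at every time step.
W_fast = np.zeros((d_val, d_key))

xs = rng.normal(size=(5, d_in))  # a toy input sequence
for x in xs:
    k, v, q = Wk @ x, Wv @ x, Wq @ x
    W_fast += np.outer(v, phi(k))  # slow net writes an outer product into the fast weights
    y = W_fast @ phi(q)            # fast net is applied to the current query

print(W_fast.shape, y.shape)
```

In this formulation the fast net is the attention mechanism itself; the paper's point is that nothing forces either net to be a single feedforward layer.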