CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
Preferred Networks
yyamada@preferred.jp
Neural Information Processing Systems
Transformers have driven remarkable breakthroughs in natural language processing and computer vision, yet their standard attention mechanism still imposes O(N^2) complexity, hindering scalability to longer sequences. We introduce Circular-convolutional ATtention (CAT), a Fourier-based approach that efficiently applies circular convolutions to reduce complexity without sacrificing representational power. CAT achieves O(N log N) computations, requires fewer learnable parameters by streamlining fully connected layers, and introduces no additional heavy operations, resulting in consistent accuracy improvements and about a 10% speedup in naive PyTorch implementations. Based on the Engineering-Isomorphic Transformers (EITs) framework, CAT's design not only offers practical efficiency and ease of implementation, but also provides insights to guide the development of
Jun-22-2026, 22:32:29 GMT
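
The complexity claim in the abstract rests on a standard fact: a circular convolution of two length-N sequences can be computed either directly in O(N^2) or through the Fourier domain in O(N log N). The following is a minimal sketch of that fact only, not the CAT layer itself. The function names, the toy radix-2 FFT, and the example vectors are illustrative assumptions; a practical implementation would presumably call a library FFT (for example `torch.fft` or `numpy.fft`) rather than the hand-written one here.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor for the k-th butterfly.
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circular_conv_direct(w, v):
    """O(N^2) reference: out[i] = sum_j w[(i - j) mod N] * v[j]."""
    n = len(w)
    return [sum(w[(i - j) % n] * v[j] for j in range(n)) for i in range(n)]


def circular_conv_fft(w, v):
    """O(N log N): multiply the spectra pointwise, then invert."""
    n = len(w)
    prod = [a * b for a, b in zip(fft(w), fft(v))]
    # The inverse transform above is unnormalized, so divide by n here.
    return [(c / n).real for c in fft(prod, inverse=True)]


if __name__ == "__main__":
    w = [0.5, 0.25, 0.25, 0.0]   # hypothetical mixing weights (sum to 1)
    v = [1.0, 2.0, 3.0, 4.0]     # hypothetical values to be mixed
    print(circular_conv_direct(w, v))
    print(circular_conv_fft(w, v))
```

Both functions return the same result up to floating-point error. The direct version is equivalent to multiplying `v` by an N-by-N circulant matrix built from `w`, which is where the quadratic cost comes from; the FFT version never materializes that matrix.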