MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention

Jun-14-2026, 10:12:19 GMT–Neural Information Processing Systems

Transformers have achieved state-of-the-art performance across various tasks but suffer from quadratic complexity in sequence length due to the attention mechanism. In this work, we propose MonarchAttention, a novel approach to sub-quadratic attention approximation via Monarch matrices, an expressive class of structured matrices. Based on the variational form of softmax, we describe an efficient optimization-based algorithm that computes an approximate projection of softmax attention onto the class of Monarch matrices with Θ(N√N d) computational complexity and Θ(Nd) memory/IO complexity, where N is the sequence length and d the head dimension.
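As a rough illustration of why Monarch matrices admit sub-quadratic products, here is a minimal pure-Python sketch. It assumes one common parameterization, M = Pᵀ L P R with N = b², where L and R are block-diagonal with b dense b×b blocks and P is the permutation that transposes a b×b grid. The helper names (block_diag_apply, transpose_perm, monarch_matvec) are hypothetical. The sketch shows only the structured matrix-vector product, checked against a dense reference; it is not the MonarchAttention algorithm, which additionally fits the factors to softmax attention.

```python
import random

# Hypothetical helper: apply a block-diagonal matrix given as b dense b x b blocks
# to a vector of length N = b * b. Cost: b blocks * b^2 = b^3 = N * sqrt(N) multiplies.
def block_diag_apply(blocks, x):
    b = len(blocks)
    out = [0.0] * (b * b)
    for i in range(b):          # which block
        for j in range(b):      # output row inside block i
            acc = 0.0
            for k in range(b):  # input column inside block i
                acc += blocks[i][j][k] * x[i * b + k]
            out[i * b + j] = acc
    return out

# Hypothetical helper: the permutation P that views x as a b x b row-major grid and
# transposes it. Transposing twice restores x, so P is its own inverse (P^T == P).
def transpose_perm(x, b):
    return [x[i * b + j] for j in range(b) for i in range(b)]

# Fast product M x for the assumed parameterization M = P^T L P R.
def monarch_matvec(L, R, x):
    b = len(R)
    y = block_diag_apply(R, x)   # R x
    y = transpose_perm(y, b)     # P R x
    y = block_diag_apply(L, y)   # L P R x
    return transpose_perm(y, b)  # P^T L P R x, using P^T == P

# Dense reference pieces, used only to check the fast path (cost N^2 per product).
def block_diag_dense(blocks):
    b = len(blocks)
    n = b * b
    D = [[0.0] * n for _ in range(n)]
    for i in range(b):
        for j in range(b):
            for k in range(b):
                D[i * b + j][i * b + k] = blocks[i][j][k]
    return D

def perm_dense(b):
    n = b * b
    P = [[0.0] * n for _ in range(n)]
    for i in range(b):
        for j in range(b):
            P[j * b + i][i * b + j] = 1.0  # (P x)[j*b + i] = x[i*b + j]
    return P

def matmul(A, B):
    return [[sum(A[r][t] * B[t][c] for t in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

random.seed(0)
b = 4
N = b * b
L = [[[random.gauss(0, 1) for _ in range(b)] for _ in range(b)] for _ in range(b)]
R = [[[random.gauss(0, 1) for _ in range(b)] for _ in range(b)] for _ in range(b)]
x = [random.gauss(0, 1) for _ in range(N)]

P = perm_dense(b)
M = matmul(matmul(matmul(P, block_diag_dense(L)), P), block_diag_dense(R))  # P^T L P R
fast = monarch_matvec(L, R, x)
slow = matvec(M, x)
assert all(abs(f - s) < 1e-9 for f, s in zip(fast, slow))
print(f"N={N}: Monarch product uses {2 * b**3} multiplies vs {N**2} for dense")
# prints: N=16: Monarch product uses 128 multiplies vs 256 for dense
```

Each block-diagonal factor costs b³ = N√N multiplications, so one product with a Monarch matrix costs about 2N√N instead of N². Repeating it for each of the d columns of a value matrix gives a cost that grows like N√N d, in line with the sub-quadratic complexity stated in the abstract. For N = 4096 (b = 64), that is 524,288 versus 16,777,216 multiplications per column.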

artificial intelligence, machine learning, natural language
