SystolicAttention: Fusing FlashAttention within a Single Systolic Array
Lin, Jiawei; Li, Yuanlong; Chen, Guokai; Bourgeat, Thomas
arXiv.org Artificial Intelligence
Transformer models rely heavily on the scaled dot-product attention (SDPA) operation, which is typically implemented with FlashAttention. Because FlashAttention frequently interleaves matrix multiplications with softmax operations, it fails to fully utilize the compute resources of modern systolic-array-based accelerators, which are designed for large, back-to-back matrix multiplications. To unlock the full performance potential of systolic arrays for FlashAttention, we propose FSA, an enhanced systolic array architecture that runs the entire FlashAttention algorithm on the array without external vector units. Combined with SystolicAttention, an optimized kernel for FSA that achieves fine-grained, element-wise overlapping of FlashAttention operations, FSA maximizes array utilization while preserving FlashAttention's original floating-point operation order. We implement FSA in synthesizable RTL and evaluate it against state-of-the-art systolic-array-based accelerators. Our results show that FSA achieves 1.77x and 4.83x higher attention FLOPs/s utilization than AWS Neuron-v2 and Google TPUv5e, respectively. We synthesize FSA in a 16 nm technology at 1.5 GHz; the results indicate only a 12% area overhead compared with a standard weight-stationary systolic array.
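For reference, the interleaving described above can be seen in a minimal sketch of the standard tiled FlashAttention forward pass with an online softmax, written in plain Python for a single query row, a single head, and no masking (all of these are simplifying assumptions). This is not FSA or the SystolicAttention kernel; it only illustrates the per-tile sequence of matmul, element-wise softmax update, and second matmul, plus the running-max rescaling whose floating-point operation order the abstract says FSA preserves. The names `naive_attention`, `flash_attention_row`, and `block` are hypothetical.

```python
import math

# Hypothetical helpers for illustration only; not the paper's FSA/SystolicAttention code.

def naive_attention(q, K, V):
    """Reference SDPA for one query row q against keys K and values V."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))]

def flash_attention_row(q, K, V, block=2):
    """Tiled attention with an online (running) softmax, FlashAttention-style.

    Each tile runs: matmul (q . K_tile) -> element-wise softmax update ->
    matmul (P_tile . V_tile). This short matmul/softmax alternation is the
    pattern that keeps a plain systolic array from staying busy.
    """
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf              # running max of scores seen so far
    l = 0.0                    # running softmax denominator
    acc = [0.0] * len(V[0])    # running unnormalized output row
    for start in range(0, len(K), block):
        K_tile = K[start:start + block]
        V_tile = V[start:start + block]
        # Matmul 1: scores for this tile.
        s = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K_tile]
        # Softmax step: new running max, and a factor to rescale old state.
        m_new = max(m, max(s))
        correction = math.exp(m - m_new)  # exp(-inf) == 0.0 on the first tile
        p = [math.exp(si - m_new) for si in s]
        l = l * correction + sum(p)
        # Matmul 2: rescale the old accumulator and add P_tile . V_tile.
        acc = [a * correction + sum(pi * v[j] for pi, v in zip(p, V_tile))
               for j, a in enumerate(acc)]
        m = m_new
    # Final normalization by the accumulated denominator.
    return [a / l for a in acc]

if __name__ == "__main__":
    q = [1.0, 0.5]
    K = [[0.2, 1.0], [1.5, -0.3], [0.0, 0.7], [-1.0, 2.0], [0.9, 0.9]]
    V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, -0.5]]
    print(naive_attention(q, K, V))
    print(flash_attention_row(q, K, V, block=2))
```

Both calls should print the same row up to floating-point rounding. The tiled version never materializes the full score row, which is why it needs the rescaling step between the two matmuls of every tile.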
Dec-9-2025
- Country:
- Asia
- China > Chongqing Province
- Chongqing (0.04)
- Middle East
- Jordan (0.04)
- Saudi Arabia > Asir Province
- Abha (0.04)
- Europe
- Netherlands > South Holland
- Rotterdam (0.04)
- Switzerland > Vaud
- Lausanne (0.76)
- United Kingdom > Scotland
- City of Glasgow > Glasgow (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- United States
- California
- Los Angeles County > Long Beach (0.04)
- San Diego County > La Jolla (0.04)
- San Francisco County > San Francisco (0.14)
- Florida > Orange County
- Orlando (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- New York > New York County
- New York City (0.05)
- Genre:
- Research Report > New Finding (1.00)
- Technology: