ZeroS: Zero-Sum Linear Attention for Efficient Transformers

Lu, Jiecheng; Han, Xu; Sun, Yan; Pati, Viresh; Kim, Yubin; Somani, Siddhartha; Yang, Shihao

Feb-6-2026–arXiv.org Machine Learning

Linear attention methods offer Transformers $O(N)$ complexity but typically underperform standard softmax attention. We identify two fundamental limitations affecting these approaches: the restriction to convex combinations that only permits additive information blending, and uniform accumulated weight bias that dilutes attention in long contexts. We propose Zero-Sum Linear Attention (ZeroS), which addresses these limitations by removing the constant zero-order term $1/t$ and reweighting the remaining zero-sum softmax residuals. This modification creates mathematically stable weights, enabling both positive and negative values and allowing a single attention layer to perform contrastive operations. While maintaining $O(N)$ complexity, ZeroS theoretically expands the set of representable functions compared to convex combinations. Empirically, it matches or exceeds standard softmax attention across various sequence modeling benchmarks.
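The "zero-sum softmax residuals" mentioned in the abstract can be illustrated with a small sketch. Softmax weights over $t$ positions sum to 1, so subtracting the uniform term $1/t$ from each weight leaves residuals that sum to 0 and carry both signs. The code below shows only that identity; it is not the authors' implementation. The reweighting step and the $O(N)$ linear-attention formulation are not described in the abstract and are omitted here, and the function names `softmax`, `zero_sum_residuals`, and `mix` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_sum_residuals(scores):
    """Softmax weights with the uniform zero-order term 1/t removed.

    Assumption: this is only the 'subtract 1/t' step named in the abstract;
    the paper's subsequent reweighting is not reproduced here.
    """
    t = len(scores)
    return [w - 1.0 / t for w in softmax(scores)]


def mix(weights, values):
    """Weighted sum of scalar values."""
    return sum(w * v for w, v in zip(weights, values))


# Toy example with scalar values at four positions (illustrative numbers only).
scores = [2.0, 0.5, -1.0, 0.0]
values = [1.0, 3.0, -2.0, 4.0]

convex = softmax(scores)               # non-negative, sums to 1
residual = zero_sum_residuals(scores)  # mixed signs, sums to 0

convex_out = mix(convex, values)       # stays within [min(values), max(values)]
residual_out = mix(residual, values)   # equals convex_out minus the mean of values

print("convex weights:  ", convex)
print("residual weights:", residual)
print("convex output:   ", convex_out)
print("residual output: ", residual_out)
```

Because the residual weights sum to zero, the output is the softmax-weighted average minus the plain mean of the values, which is a contrastive quantity that a convex combination alone cannot produce.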


Country:
- Europe
  - Germany (0.04)
  - Switzerland (0.04)
- North America > United States
  - Michigan > Washtenaw County > Ann Arbor (0.04)
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Energy (0.46)
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (0.93)
    - Statistical Learning (0.93)
  - Natural Language > Large Language Model (0.93)
