SeerAttention: Self-distilled Attention Gating for Efficient Long-context Prefilling

Neural Information Processing Systems, Jun-12-2026, 04:07:58 GMT

Attention is the cornerstone of modern Large Language Models (LLMs), yet its quadratic complexity in sequence length limits efficiency and scalability, especially for long-context processing. A promising approach is to exploit sparsity in attention. However, existing sparsity-based solutions predominantly rely on predefined patterns or heuristics applied at the attention-head level, and they struggle to adapt efficiently to different contexts. We propose SeerAttention, a simple yet effective attention mechanism that learns block-level attention sparsity directly from the LLM itself.
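The abstract does not spell out how the gate is built. As a minimal sketch of what block-level attention sparsity means during causal prefill, the following pure-Python code scores every (query block, key block) pair from average-pooled Q and K, keeps the top-k key blocks for each query block, and computes attention only over keys inside the kept blocks. The pooling-based scorer is a hypothetical, untrained stand-in for SeerAttention's learned gate; the function names (`avg_pool_blocks`, `gate_block_mask`, `block_sparse_attention`), the square block size, and the rule of always keeping the diagonal block are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def avg_pool_blocks(x: Matrix, block: int) -> Matrix:
    """Average consecutive groups of `block` rows into one summary row each."""
    pooled = []
    for start in range(0, len(x), block):
        rows = x[start:start + block]
        pooled.append([sum(col) / len(rows) for col in zip(*rows)])
    return pooled


def gate_block_mask(q: Matrix, k: Matrix, block: int, top_k: int) -> List[List[bool]]:
    """Hypothetical, untrained gate: rank key blocks for each query block by the
    dot product of their average-pooled summaries and keep the top_k causal ones.
    Assumes self-attention prefill, i.e. len(q) == len(k)."""
    qb, kb = avg_pool_blocks(q, block), avg_pool_blocks(k, block)
    mask = []
    for i, q_sum in enumerate(qb):
        scores = [sum(a * b for a, b in zip(q_sum, k_sum)) for k_sum in kb]
        # Causal: only key blocks 0..i are eligible. The diagonal block is always
        # kept (an assumption) so every query token can attend to itself.
        earlier = sorted(range(i), key=lambda j: scores[j], reverse=True)
        keep = {i, *earlier[:max(top_k - 1, 0)]}
        mask.append([j in keep for j in range(len(kb))])
    return mask


def block_sparse_attention(q: Matrix, k: Matrix, v: Matrix, block: int,
                           top_k: int) -> Tuple[Matrix, List[List[bool]]]:
    """Causal attention that only visits keys inside gate-selected blocks."""
    mask = gate_block_mask(q, k, block, top_k)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for t, q_row in enumerate(q):
        qi = t // block
        # Gather key positions from kept blocks only, clipped at t for causality.
        keys = [s for j, kept in enumerate(mask[qi]) if kept
                for s in range(j * block, min((j + 1) * block, t + 1))]
        logits = [scale * sum(a * b for a, b in zip(q_row, k[s])) for s in keys]
        probs = softmax(logits)
        out.append([sum(p * v[s][c] for p, s in zip(probs, keys))
                    for c in range(len(v[0]))])
    return out, mask


if __name__ == "__main__":
    random.seed(0)
    n, d, block = 8, 4, 2
    q, k, v = ([[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
               for _ in range(3))
    _, mask = block_sparse_attention(q, k, v, block, top_k=2)
    for row in mask:
        print(["x" if kept else "." for kept in row])
```

When `top_k` is at least the number of blocks, every causal key is visited and the output equals dense causal attention; smaller values skip whole key blocks, which is where a block-sparse kernel saves work. Deciding sparsity per block rather than per token keeps the gate's output small (one decision per block pair) and fits tiled attention kernels naturally.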

artificial intelligence, large language model, natural language