FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Neural Information Processing Systems 

Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for large language models and long-context applications.
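To make the bottleneck concrete, here is a minimal NumPy sketch (our illustration, not code from the paper) of standard scaled dot-product attention: the intermediate score matrix Q K^T is N x N, so time and memory grow quadratically with sequence length N, which is exactly what hurts long-context workloads.

```python
# Minimal sketch of standard attention, softmax(Q K^T / sqrt(d)) V.
# The (N, N) score matrix makes naive attention quadratic in sequence length.
import numpy as np

def naive_attention(Q, K, V):
    """Standard scaled dot-product attention over one head."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (N, N): quadratic in N
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (N, d) output

# Example: doubling N quadruples the size of the (N, N) score matrix.
N, d = 1024, 64
Q, K, V = (np.random.randn(N, d).astype(np.float32) for _ in range(3))
print(naive_attention(Q, K, V).shape)  # (1024, 64)
```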