Fast Attention Requires Bounded Entries

Neural Information Processing Systems 

In modern machine learning, inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and ChatGPT. Straightforward methods for this problem explicitly compute the n \times n attention matrix A, and hence require time \Omega(n^2) even when d = n^{o(1)} is small. In this paper, we investigate whether faster algorithms are possible by \emph{implicitly} making use of the matrix A. We present two results, showing that there is a sharp transition at B = \Theta(\sqrt{\log n}), where B bounds the magnitude of the entries of the input matrices.
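The straightforward method referenced above can be sketched as follows. This is a hypothetical illustration, not the paper's algorithm: it assumes the standard softmax-attention formulation D^{-1} A V with A = \exp(QK^\top) and D = \mathrm{diag}(A \mathbf{1}), in which forming A explicitly costs \Theta(n^2 d) time and \Theta(n^2) memory.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Explicitly form the n x n attention matrix A (the Omega(n^2) baseline).

    Computes D^{-1} A V, where A = exp(Q K^T) entrywise and
    D = diag(A 1) normalizes each row -- i.e., softmax(Q K^T) V.
    """
    A = np.exp(Q @ K.T)                          # n x n matrix: Theta(n^2 d) time
    D_inv = 1.0 / A.sum(axis=1, keepdims=True)   # row sums for normalization
    return D_inv * (A @ V)                       # softmax(Q K^T) V

# Toy example with n = 4, d = 2 (names and sizes are illustrative).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 2)) for _ in range(3))
out = naive_attention(Q, K, V)
```

Since each row of softmax(QK^\top) sums to 1, every output row is a convex combination of the rows of V; the fast algorithms the paper investigates avoid materializing A at all.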