Peking University · Song Mei · Salesforce AI Research
Neural Information Processing Systems
Attention layers, which map a sequence of inputs to a sequence of outputs, are core building blocks of the Transformer architecture, which has achieved significant breakthroughs in modern artificial intelligence. This paper presents a rigorous theoretical study of the learning and generalization of a single multi-head attention layer whose activation function is replaced by ReLU, a substitution that has recently shown performance comparable to the original softmax activation. We consider the random-feature setting in which the attention layer has a large number of heads, with randomly sampled frozen query and key matrices and trainable value matrices. We show that such a random-feature attention layer can express a broad class of target functions that are permutation invariant with respect to the key vectors. We further provide quantitative excess risk bounds for learning these target functions from finite samples, using random-feature attention with finitely many heads. Our results have several implications unique to the attention structure compared with existing random-features theory for neural networks, including (1) advantages over standard fully connected random-feature models and (2) concrete and natural classes of functions that can be learned efficiently by a random-feature attention layer. Additionally, we show that the sampling distribution of the query-key matrix (the product of the query and key matrices) matters: a biased Gaussian random matrix yields better sample complexities than the standard zero-mean counterpart for learning certain natural target functions. Experiments on simulated data corroborate our theoretical findings and further illustrate the interplay between the sample size and the complexity of the target function.
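To make the described setup concrete, here is a minimal NumPy sketch of a single multi-head attention layer with ReLU in place of softmax, frozen randomly sampled query-key matrices, and value matrices as the (only) trainable parameters. The function names, the biased-mean parameterization `mu * I` of the query-key matrices, and all dimensions are illustrative assumptions, not specifications taken from the paper.

```python
import numpy as np

def random_feature_attention(X, W_list, V_list):
    """Single random-feature attention layer with ReLU activation.

    X: (N, d) sequence of N token vectors.
    W_list: frozen (d, d) random query-key matrices, one per head.
    V_list: (d_out, d) value matrices, the trainable parameters.
    Output for token i sums, over heads m, the key-averaged
    relu(x_i^T W_m x_j) V_m x_j.
    """
    N, _ = X.shape
    out = np.zeros((N, V_list[0].shape[0]))
    for W, V in zip(W_list, V_list):
        scores = np.maximum(X @ W @ X.T, 0.0)  # (N, N) ReLU attention scores
        out += (scores @ X @ V.T) / N          # average over key positions
    return out

# Hypothetical instantiation: M heads, biased Gaussian query-key matrices.
rng = np.random.default_rng(0)
d, M, N, d_out = 8, 64, 10, 3
mu = 1.0  # assumed bias added to the Gaussian query-key matrices
W_list = [mu * np.eye(d) + rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(M)]
V_list = [rng.normal(size=(d_out, d)) / np.sqrt(M) for _ in range(M)]  # trainable in practice
X = rng.normal(size=(N, d))
print(random_feature_attention(X, W_list, V_list).shape)  # (10, 3)
```

In this sketch only `V_list` would be fit to data (for example by ridge regression on the random features), while `W_list` stays fixed after sampling, mirroring the random-feature regime studied in the abstract.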
May-28-2025, 16:41:29 GMT
- Country:
- North America > United States (0.27)
- Genre:
- Research Report (0.34)
- Industry:
- Information Technology > Software (0.50)
- Technology: