AITopics | single-head attention

single-head attention

Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning (Supplementary Material)

Neural Information Processing Systems, Feb-12-2026, 05:50:44 GMT

We use the open-source mimic-cxr repository to extract the impression and findings sections of each report. Following [9], we keep only sequences of alphanumeric characters, dropping all other characters and symbols, and remove reports that contain fewer than 3 tokens. Following common practice in ViT [5], we split each radiograph into patches of size 16 × 16, which results in 196 visual tokens per image. The instance-level projection layer is a two-layer multilayer perceptron (MLP) with Batch Normalization [10] and a ReLU activation function. Additionally, we use a frozen Batch Normalization layer after the MLP to obtain instance-level embeddings.
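The preprocessing in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the helper names `clean_report` and `patchify` are hypothetical, and the 224 × 224 input resolution is an assumption chosen because it yields the 196 patches the excerpt reports.

```python
import re

import numpy as np

MIN_TOKENS = 3  # reports with fewer tokens are dropped (from the excerpt)
PATCH = 16      # patch side length (from the excerpt)


def clean_report(text):
    """Keep only runs of alphanumeric characters; return None for short reports."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return tokens if len(tokens) >= MIN_TOKENS else None


def patchify(image, patch=PATCH):
    """Split an (H, W) image into non-overlapping, flattened patches."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    grid = image.reshape(h // patch, patch, w // patch, patch)
    # Reorder to (patch_row, patch_col, y, x), then flatten each patch.
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


# Assumed 224 x 224 input: (224 / 16) ** 2 = 196 patches of 256 pixels each.
image = np.zeros((224, 224), dtype=np.float32)
patches = patchify(image)
print(patches.shape)

print(clean_report("No acute cardiopulmonary process."))
print(clean_report("Normal."))
```

Each row of `patches` corresponds to one visual token before any learned embedding is applied; the second report is dropped because it has only one token.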

artificial intelligence, machine learning, supplementary material, (17 more...)

Industry:

Health & Medicine > Nuclear Medicine (0.49)
Health & Medicine > Diagnostic Medicine > Imaging (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

fecc3a370a23d13b1cf91ac3c1e1ca92-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-20-2025, 11:27:20 GMT

attention model, attention step, table 1, (12 more...)

Technology: Information Technology > Artificial Intelligence (0.35)

Superiority of Multi-Head Attention in In-Context Linear Regression

Cui, Yingqian, Ren, Jie, He, Pengfei, Tang, Jiliang, Xing, Yue

arXiv.org Artificial Intelligence, Jan-30-2024

We present a theoretical analysis of the performance of transformers with softmax attention in in-context learning with linear regression tasks. While the existing literature predominantly focuses on the convergence of transformers with single-/multi-head attention, our research centers on comparing their performance. We conduct an exact theoretical analysis to demonstrate that multi-head attention with a substantial embedding dimension performs better than single-head attention. As the number of in-context examples D increases, the prediction loss of both single- and multi-head attention is O(1/D), with a smaller multiplicative constant for multi-head attention. In addition to the simplest data distribution setting, we consider more scenarios, e.g., noisy labels, local examples, correlated features, and prior knowledge. We observe that, in general, multi-head attention is preferred over single-head attention. Our results verify the effectiveness of the design of multi-head attention in the transformer architecture.
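To make the two architectures being compared concrete, here is a minimal NumPy sketch of single-head and multi-head softmax attention. It uses the standard scaled dot-product formulation, which may differ from the exact parameterization analyzed in the paper; the function names, the random weights, and the omission of an output projection are all assumptions for illustration, and the in-context regression setup itself is not reproduced.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_weights(q, k):
    """Softmax attention weights; each row sums to 1."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))


def single_head(x, wq, wk, wv):
    """One attention head over the full embedding dimension."""
    q, k, v = x @ wq, x @ wk, x @ wv
    return attention_weights(q, k) @ v


def multi_head(x, wq, wk, wv, n_heads):
    """Split q, k, v along the feature axis, attend per head, concatenate."""
    q, k, v = x @ wq, x @ wk, x @ wv
    q_heads = np.split(q, n_heads, axis=-1)
    k_heads = np.split(k, n_heads, axis=-1)
    v_heads = np.split(v, n_heads, axis=-1)
    outputs = [
        attention_weights(qh, kh) @ vh
        for qh, kh, vh in zip(q_heads, k_heads, v_heads)
    ]
    return np.concatenate(outputs, axis=-1)


# Hypothetical sizes: 8 tokens with embedding dimension 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))

out_single = single_head(x, wq, wk, wv)
out_multi = multi_head(x, wq, wk, wv, n_heads=2)
print(out_single.shape, out_multi.shape)
```

With `n_heads=1` the multi-head function reduces to the single-head one, which makes the relationship between the two designs easy to check numerically.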

exp, multi-head attention, single-head attention, (12 more...)

2401.17426

Country: North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)
