softmax denominator


47d40767c7e9df50249ebfd9c7cfff77-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their valuable comments! On whether the proposed method is better than only using LSH: thank you for the suggestions. ALSH significantly outperforms both E2LSH and the Reformer LSH scheme for SMYRF-BERT base (see also Table 2).
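The snippet above contrasts LSH variants without saying what an LSH attention scheme does. As a rough, generic illustration (not SMYRF's ALSH and not the authors' exact method; all names, shapes, and values below are invented), the sketch buckets queries and keys with random-hyperplane hashing so each query only attends to keys in its own bucket, which is the basic idea behind Reformer-style LSH attention.

# Generic sketch of LSH-bucketed attention (hypothetical sizes; not SMYRF's ALSH).
import numpy as np

rng = np.random.default_rng(0)
T, d, n_bits = 16, 8, 3                      # tokens, head dimension, hash bits

Q = rng.normal(size=(T, d))                  # toy query vectors
K = rng.normal(size=(T, d))                  # toy key vectors

# Random-hyperplane (SimHash-style) LSH: the sign pattern of projections
# onto random directions gives an integer bucket id per vector.
planes = rng.normal(size=(d, n_bits))

def hash_buckets(X):
    bits = (X @ planes > 0).astype(int)
    return bits @ (2 ** np.arange(n_bits))

q_buckets = hash_buckets(Q)
k_buckets = hash_buckets(K)

# Each query attends only within its bucket, avoiding full O(T^2) attention.
for b in np.unique(q_buckets):
    q_idx = np.where(q_buckets == b)[0]
    k_idx = np.where(k_buckets == b)[0]
    print(f"bucket {b}: queries {q_idx.tolist()} attend to keys {k_idx.tolist()}")

Plain hashing of this kind targets angular or Euclidean similarity; asymmetric LSH (ALSH) instead targets maximum inner product, which is closer to the query-key scoring that attention actually uses.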


Decomposing Attention To Find Context-Sensitive Neurons

Gibson, Alex

arXiv.org Artificial Intelligence

We study transformer language models, analyzing attention heads whose attention patterns are spread out and whose attention scores depend only weakly on content. We argue that the softmax denominators of these heads are stable when the underlying token distribution is fixed. By sampling softmax denominators from a "calibration text", we can combine the outputs of multiple such stable heads in the first layer of GPT2-Small, approximating their combined output by a linear summary of the surrounding text. This approximation enables a procedure in which, from the weights alone and a single calibration text, we can uncover hundreds of first-layer neurons that respond to high-level contextual properties of the surrounding text, including neurons that did not activate on the calibration text.
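To make the "stable softmax denominator" idea concrete, here is a small numerical sketch. It is not the paper's code: all tensors are synthetic, whereas in the paper the queries, keys, and values would come from GPT2-Small's first-layer attention weights applied to token embeddings. When a head's attention scores depend only weakly on content, freezing its denominator at a value sampled from a calibration text (and treating the scores as roughly constant) turns the head's output into a fixed scalar times the sum of the value vectors, i.e. a linear summary of the surrounding text.

# Toy illustration of the frozen-denominator / linear-summary approximation.
import numpy as np

rng = np.random.default_rng(0)
d, T = 64, 128                                   # head dimension, context length

q = rng.normal(size=d)                           # a single query direction
K = rng.normal(scale=0.05, size=(T, d))          # small scale -> weak content dependence
V = rng.normal(size=(T, d))                      # value vectors for the context

scores = K @ q / np.sqrt(d)
weights = np.exp(scores)
exact_out = weights @ V / weights.sum()          # exact softmax attention output

# "Calibration text": an independent sample from the same token distribution.
K_calib = rng.normal(scale=0.05, size=(T, d))
calib_scores = K_calib @ q / np.sqrt(d)
Z_calib = np.exp(calib_scores).sum()             # sampled softmax denominator

# Linear summary: scores replaced by their calibration mean and the denominator
# frozen, so the output is a fixed scalar times the sum of value vectors.
linear_out = (np.exp(calib_scores.mean()) / Z_calib) * V.sum(axis=0)

rel_err = np.linalg.norm(exact_out - linear_out) / np.linalg.norm(exact_out)
print(f"relative error of linear approximation: {rel_err:.3f}")

With the toy scale chosen so the scores are nearly flat, the relative error stays small; it grows as the scores become more content-dependent, which is why the argument is restricted to diffuse, weakly content-dependent heads.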

