The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
–Neural Information Processing Systems
Large language models (LLMs) are known for their exceptional performance in natural language processing, making them highly effective in many human life-related tasks. The attention mechanism in the Transformer architecture is a critical component of LLMs, as it allows the model to selectively focus on specific input parts. The softmax unit, which is a key part of the attention mechanism, normalizes the attention scores. Hence, the performance of LLMs in various NLP tasks depends significantly on the crucial role played by the attention mechanism with the softmax unit. In-context learning is one of the celebrated abilities of recent LLMs. Without further parameter updates, Transformers can learn to predict based on a few in-context examples. However, the reason why Transformers become in-context learners is not well understood. Recently, in-context learning has been studied from a mathematical perspective with simplified linear self-attention without the softmax unit.
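The contrast the abstract draws, between softmax attention that normalizes scores and the simplified linear self-attention studied in prior mathematical work, can be sketched as follows. This is a minimal numpy illustration; the function names, shapes, and scaling choices are our own assumptions, not details from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, use_softmax=True):
    # Raw attention scores: pairwise dot products scaled by sqrt(d).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if use_softmax:
        # Softmax unit: each row becomes a probability distribution
        # over the input positions (non-negative, sums to 1).
        weights = softmax(scores, axis=-1)
    else:
        # Simplified linear self-attention: raw scores used directly,
        # with no normalization (as in the prior work the abstract cites).
        weights = scores
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out_softmax = attention(Q, K, V, use_softmax=True)
out_linear = attention(Q, K, V, use_softmax=False)
```

The only difference between the two variants is the normalization step; this is the gap between the linear-attention analyses mentioned above and the softmax regression setting the paper targets.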
May-25-2025, 05:07:25 GMT
- Country:
- North America > United States > California (0.28)
- Genre:
- Research Report
- Experimental Study (0.93)
- New Finding (0.93)
- Industry:
- Education (0.67)