The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
–Neural Information Processing Systems
Large language models (LLMs) are known for their exceptional performance in natural language processing, making them highly effective in many human life-related tasks. The attention mechanism in the Transformer architecture is a critical component of LLMs, as it allows the model to selectively focus on specific input parts. The softmax unit, which is a key part of the attention mechanism, normalizes the attention scores. Hence, the performance of LLMs in various NLP tasks depends significantly on the crucial role played by the attention mechanism with the softmax unit. In-context learning is one of the celebrated abilities of recent LLMs. Without further parameter updates, Transformers can learn to predict based on a few in-context examples. However, the reason why Transformers become in-context learners is not well understood. Recently, in-context learning has been studied from a mathematical perspective with simplified linear self-attention without the softmax unit.
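The contrast the abstract draws, between softmax attention that normalizes scores and the simplified linear self-attention studied in prior mathematical work, can be sketched as follows. This is a minimal numpy illustration; the function names, shapes, and scaling choices are our own assumptions, not details from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, use_softmax=True):
    # Raw attention scores: pairwise dot products scaled by sqrt(d).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if use_softmax:
        # Softmax unit: each row becomes a probability distribution
        # over the input positions (non-negative, sums to 1).
        weights = softmax(scores, axis=-1)
    else:
        # Simplified linear self-attention: raw scores used directly,
        # with no normalization (as in the prior work the abstract cites).
        weights = scores
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out_softmax = attention(Q, K, V, use_softmax=True)
out_linear = attention(Q, K, V, use_softmax=False)
```

The only difference between the two variants is the normalization step; this is the gap between the linear-attention analyses mentioned above and the softmax regression setting the paper targets.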
May-25-2025, 05:07:25 GMT
- Country:
- North America > United States > California (0.28)
- Genre:
- Research Report
- Experimental Study (0.93)
- New Finding (0.93)
- Industry:
- Education (0.67)