HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Jun-11-2026, 05:32:41 GMT–Neural Information Processing Systems

Transformers have become the de facto architecture for a wide range of machine learning tasks, particularly in large language models (LLMs). Despite their remarkable performance, many challenges remain in training deep transformer networks, especially regarding the placement of layer normalization. While Pre-Norm structures facilitate more stable training owing to their stronger identity path, they often lead to suboptimal performance compared to Post-Norm.
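To make the Pre-Norm versus Post-Norm distinction concrete, here is a minimal sketch of the two standard residual arrangements in plain Python. It is not the paper's HybridNorm method, which this abstract does not describe. The `toy_sublayer` function is a hypothetical stand-in for an attention or feed-forward block, and `layer_norm` omits the learned scale and shift for brevity.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def toy_sublayer(x):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [0.5 * v + 1.0 for v in x]

def pre_norm_block(x, sublayer=toy_sublayer):
    # Pre-Norm: normalize only the sublayer input.
    # The residual path carries x through unchanged (the "identity path").
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_norm_block(x, sublayer=toy_sublayer):
    # Post-Norm: add the residual first, then normalize the sum,
    # so the skip connection itself passes through the normalization.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

x = [1.0, 2.0, 4.0, 8.0]
pre = pre_norm_block(x)    # output keeps the scale of x plus a bounded update
post = post_norm_block(x)  # output is re-normalized at every block
```

In the Pre-Norm block the input reaches the output without passing through any normalization, which is the stronger identity path the abstract refers to. In the Post-Norm block every output is re-normalized, so the skip connection is rescaled at each layer.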

large language model, machine learning, natural language

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.98)
  - Natural Language > Large Language Model (0.60)