HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
