BranchNorm: Robustly Scaling Extremely Deep Transformers
