Understanding the Failure of Batch Normalization for Transformers in NLP

Jiaxi Wang 1, Ji Wu 1,2, Lei Huang 3

1 Department of Electronic Engineering, Tsinghua University
