Stability of Transformers under Layer Normalization
