On the Nonlinearity of Layer Normalization
