Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence

Neural Information Processing Systems 

We find that the prototypical techniques of layer normalization and instance normalization both induce the appearance of failure modes in the neural network's pre-activations: (i) layer normalization induces a collapse towards channel-wise constant functions; (ii) instance normalization induces a lack of variability in instance statistics, symptomatic of an alteration of the expressivity.
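To make point (ii) concrete, the following is a minimal NumPy sketch of the standard definitions of the two normalizations, not the implementation used in this work. It assumes a channels-last layout `(batch, height, width, channels)`, omits the learnable scale and shift, and uses synthetic inputs. The helper names `layer_norm`, `instance_norm`, and `instance_stats` are hypothetical. The sketch only illustrates how instance normalization fixes per-instance, per-channel statistics by construction; it does not reproduce the collapse described in (i), which concerns behavior in deep networks.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each instance over all spatial positions and channels.
    # Assumed layout: (batch, height, width, channels).
    mu = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    # Normalize each instance and each channel over spatial positions only.
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def instance_stats(x):
    # Per-instance, per-channel mean and variance, each of shape (batch, channels).
    return x.mean(axis=(1, 2)), x.var(axis=(1, 2))

# Synthetic pre-activations whose scale and offset differ per instance and channel.
rng = np.random.default_rng(0)
scale = rng.uniform(0.5, 2.0, size=(8, 1, 1, 3))
offset = rng.normal(size=(8, 1, 1, 3))
x = rng.normal(size=(8, 4, 4, 3)) * scale + offset

mean_in, var_in = instance_stats(instance_norm(x))
mean_ln, var_ln = instance_stats(layer_norm(x))

# Spread of the instance means across the batch, per channel.
spread_in = mean_in.std(axis=0)
spread_ln = mean_ln.std(axis=0)
print("instance norm, spread of instance means:", spread_in)
print("layer norm, spread of instance means:", spread_ln)
```

After instance normalization, every instance has approximately zero mean and unit variance in every channel, so the spread of instance statistics across the batch vanishes. After layer normalization, the per-channel instance means still differ from one instance to another.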