




Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence

Neural Information Processing Systems

We find that the prototypical techniques of layer normalization and instance normalization both induce the appearance of failure modes in the neural network's pre-activations: (i) layer normalization induces a collapse towards channel-wise constant functions; (ii) instance normalization induces a lack of variability in instance statistics, symptomatic of an alteration of the expressivity.
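To make the term "instance statistics" concrete, the following is a minimal pure-Python sketch, not the authors' code or experimental setup. It assumes toy pre-activations indexed as x[instance][spatial position][channel], where "instance statistics" means the per-instance, per-channel mean and variance over spatial positions. All function names, the toy data generator, and the value of EPS are hypothetical choices made for illustration. The sketch only shows the by-construction effect that instance normalization fixes these statistics to the same values for every instance, while layer normalization (normalizing over positions and channels jointly) leaves them free to vary. It does not attempt to reproduce the collapse reported for layer normalization.

```python
import math
import random
from statistics import fmean, pvariance

EPS = 1e-5  # small stabilizing constant (assumed value, for illustration only)


def make_preactivations(n=8, s=64, c=4, seed=0):
    """Hypothetical toy data x[n][s][c]: each instance gets its own
    per-channel offset and scale, so instance statistics differ."""
    rng = random.Random(seed)
    x = []
    for _ in range(n):
        offsets = [rng.gauss(0.0, 1.0) for _ in range(c)]
        scales = [rng.uniform(0.5, 2.0) for _ in range(c)]
        x.append([[offsets[k] + scales[k] * rng.gauss(0.0, 1.0) for k in range(c)]
                  for _ in range(s)])
    return x


def instance_stats(x):
    """Per-instance, per-channel mean and variance over spatial positions.
    Both results are indexed [instance][channel]."""
    c = len(x[0][0])
    means = [[fmean([pos[k] for pos in inst]) for k in range(c)] for inst in x]
    variances = [[pvariance([pos[k] for pos in inst]) for k in range(c)] for inst in x]
    return means, variances


def instance_norm(x):
    """Normalize each (instance, channel) slice over spatial positions."""
    means, variances = instance_stats(x)
    return [[[(pos[k] - means[i][k]) / math.sqrt(variances[i][k] + EPS)
              for k in range(len(pos))] for pos in inst]
            for i, inst in enumerate(x)]


def layer_norm(x):
    """Normalize each instance over all spatial positions and channels jointly."""
    out = []
    for inst in x:
        flat = [v for pos in inst for v in pos]
        mu, var = fmean(flat), pvariance(flat)
        out.append([[(v - mu) / math.sqrt(var + EPS) for v in pos] for pos in inst])
    return out


def spread_of_instance_means(x):
    """Variance across instances of the per-channel instance means,
    averaged over channels. Zero means all instances share the same means."""
    means, _ = instance_stats(x)
    c = len(means[0])
    return fmean([pvariance([m[k] for m in means]) for k in range(c)])


x = make_preactivations()
print("raw          :", spread_of_instance_means(x))
print("layer norm   :", spread_of_instance_means(layer_norm(x)))
print("instance norm:", spread_of_instance_means(instance_norm(x)))
```

In this sketch the spread of instance means is essentially zero after instance normalization, because every (instance, channel) slice is forced to mean 0 and variance close to 1, whereas it stays clearly non-zero for the raw and layer-normalized data. This is the sense in which instance statistics lose their variability.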