Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
–Neural Information Processing Systems
This paper provides a simple explanation for why batch normalized deep residual networks are easily trainable.
Neural Information Processing Systems
Nov-15-2025, 13:09:29 GMT