Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
–Neural Information Processing Systems
This paper provides a simple explanation for why batch normalized deep residual networks are easily trainable.
Neural Information Processing Systems
Aug-17-2025, 01:39:55 GMT