Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
