Spreads in Effective Learning Rates: The Perils of Batch Normalization During Early Training

Open in new window