SurprisingInstabilitiesinTrainingDeepNetworksand aTheoreticalAnalysis

Neural Information Processing Systems 

Learning rates and schedules are often chosen heuristically to address this trade-off.