Neural Information Processing Systems 

Nevertheless, the following questions remain relevant: 1. Large learning rates (LRs) are preferred, but how large should they be? 2. What are the key characteristics of models trained with different LRs?