Reviews: Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence
–Neural Information Processing Systems
Theory-wise, the authors overlooked several prior works, some of which propose theories opposite to theirs. For example: "Don't Decay the Learning Rate, Increase the Batch Size" (ICLR 2018) appears to support a constant batch-size/learning-rate ratio empirically.

--- after rebuttal ---

After reading the comments and the authors' rebuttal, I am satisfied with the responses. The paper theoretically verifies that the ratio of batch size to learning rate is positively correlated with the generalization error. Specifically, it confirms some very recent empirical findings, e.g., "Don't Decay the Learning Rate, Increase the Batch Size" (ICLR 2018), which states empirically that increasing the batch size and decaying the learning rate are quantitatively equivalent. I think the theoretical result is novel and timely and would interest many readers in the deep learning community.
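The equivalence the review refers to can be illustrated with a minimal sketch (hypothetical names, not from the reviewed paper): decaying the learning rate by some factor and growing the batch size by the inverse factor produce the same batch-size/learning-rate ratio trajectory, which the paper argues governs generalization.

```python
def equivalent_schedules(base_bs, base_lr, decay_factor, steps):
    """Return two hypothetical training schedules as (batch_size, lr) pairs:
    (a) keep the batch size fixed and decay the learning rate,
    (b) keep the learning rate fixed and grow the batch size.
    Both yield the same batch_size / lr ratio at every step."""
    decay_lr = [(base_bs, base_lr * decay_factor**t) for t in range(steps)]
    grow_bs = [(round(base_bs / decay_factor**t), base_lr) for t in range(steps)]
    return decay_lr, grow_bs

decay_lr, grow_bs = equivalent_schedules(base_bs=256, base_lr=0.1,
                                          decay_factor=0.5, steps=3)
# Both schedules double the batch_size / lr ratio at each step,
# so under the paper's thesis they should generalize comparably.
ratios_a = [round(bs / lr) for bs, lr in decay_lr]
ratios_b = [round(bs / lr) for bs, lr in grow_bs]
print(ratios_a, ratios_b)
```

This is only a schedule-bookkeeping sketch; it does not run any training, and `equivalent_schedules` is an illustrative helper, not an API from either paper.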
Jan-27-2025, 12:52:29 GMT