Neural Information Processing Systems
We thank the reviewers for their thorough reading. We will fix the typos and clarify the unclear points in the next version of our paper.

Batch size has been an important component of past analyses. When the nets are without BN, e.g., with LN or GN, the magnitude [...] However, this analysis doesn't hold for the general case where BN is allowed, and thus we treat batch size as a fixed hyper-parameter [...]

The fast equilibrium conjecture only partially explains the benefits of BN. Besides this conjecture, BN has many other benefits, e.g., BN affects the [...]

If we make the second phase longer, one should expect the ratio to become closer to 10.

2. Figure 10 gives a clearer and [...]

However, this is not observed in any of our settings, so it's not clear to us whether the heavy-tail assumption holds for our setting.