Reviews: First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
–Neural Information Processing Systems
Regarding Reviewer #1's concern about making theory, I tend to be open-minded, since I cannot find solid evidence that the paper is making theory only. Regarding Reviewer #4's comment that the paper over-claims the result it proves, my take is as follows. First, for many problems, the true local minima lie in flat basins. A well-known example is the following paper: McGoff, Kevin A., et al. "The Local Edge Machine: inference of dynamic models of gene regulation." Second, the authors have explained their motivation for using the Lévy process to model the noise.
Jan-26-2025, 08:24:01 GMT