[1609.04836v1] On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima • /r/MachineLearning
Have you seen the work of Friedlander and Schmidt, and my follow-up paper (shameless plug, toot toot)? Though our analysis is restricted to convex functions, there is also a notion of "sharpness" of minima, which appears as the condition number of the problem.
Sep-20-2016, 07:05:21 GMT