When Do Flat Minima Optimizers Work?
Neural Information Processing Systems
Theoretical and empirical studies [21, 77, 9, 55, 49, 5, 12] postulate that such flatter regions generalize better than sharper minima, e.g., due to the flat minimizer's robustness against loss function shifts between train and test data, as illustrated in Fig. 1.
Feb-9-2026, 14:06:50 GMT
- Country:
- Africa > Ethiopia (0.04)
- Asia > South Korea
- Europe
- Austria (0.04)
- France (0.05)
- Sweden > Stockholm
- Stockholm (0.04)
- United Kingdom > England
- North Yorkshire > York (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Quebec > Montreal (0.05)
- United States
- Alaska > Anchorage Municipality
- Anchorage (0.04)
- California
- Los Angeles County > Long Beach (0.14)
- Monterey County > Monterey (0.04)
- San Diego County > San Diego (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Utah > Salt Lake County
- Salt Lake City (0.04)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- Genre:
- Research Report (0.47)
- Technology: