Reviews: Asymmetric Valleys: Beyond Sharp and Flat Local Minima
–Neural Information Processing Systems
Summary: The authors analyse the energy landscape associated with the training of deep neural networks and introduce the concept of Asymmetric Valleys (AV): local minima that cannot be classified as either sharp or flat. AVs are characterized by the presence of asymmetric directions, along which the loss increases abruptly on one side and is almost flat on the other. The presence of AVs in commonly used architectures is demonstrated empirically by showing that asymmetric directions can be found with 'decent probability'. The authors explain why SGD with averaged updates behaves well (in terms of the generalization properties of the trained model) in the proximity of AVs.
Strengths: The study of neural networks' energy landscape is a recent and important topic.
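To make the notion of an asymmetric direction from the summary concrete, here is a minimal one-dimensional sketch. It is an illustration only, not the authors' definition or experimental procedure: the toy loss, the names `asymmetric_loss` and `asymmetry_ratio`, and the curvature values 10.0 and 0.1 are all assumptions chosen for this example.

```python
import numpy as np


def asymmetric_loss(w, sharp=10.0, flat=0.1):
    """Toy 1-D loss (hypothetical): steep for w > 0, nearly flat for w < 0.

    The minimum is at w = 0. The curvatures `sharp` and `flat` are
    arbitrary illustrative values, not taken from the paper.
    """
    w = np.asarray(w, dtype=float)
    return np.where(w > 0, sharp * w**2, flat * w**2)


def asymmetry_ratio(loss_fn, w_star, direction, step):
    """Compare the loss increase on the two sides of w_star along `direction`.

    Returns (larger increase) / (smaller increase). A value close to 1
    means the direction looks symmetric at this step size; a large value
    means one side rises much faster than the other. Assumes the loss
    strictly increases on both sides, so the denominator is nonzero.
    """
    base = loss_fn(w_star)
    up = loss_fn(w_star + step * direction) - base
    down = loss_fn(w_star - step * direction) - base
    return float(max(up, down) / min(up, down))


ratio = asymmetry_ratio(asymmetric_loss, w_star=0.0, direction=1.0, step=0.5)
print(f"asymmetry ratio at step 0.5: {ratio:.1f}")
```

With these assumed curvatures the ratio is about 100, the quotient of the two curvatures. In a real network the same comparison would be made along a direction in parameter space, using the training loss in place of the toy function.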
Jan-21-2025, 04:55:19 GMT