The Implicit Bias of Minima Stability: A View from Function Space

Jan-17-2025, 14:18:49 GMT–Neural Information Processing Systems

The loss terrains of over-parameterized neural networks have multiple global minima. However, it is well known that stochastic gradient descent (SGD) can stably converge only to minima that are sufficiently flat w.r.t. the step size. In this paper we study the effect that this mechanism has on the function implemented by the trained model. First, we extend the existing knowledge on minima stability to non-differentiable minima, which are common in ReLU networks. We then use our stability results to study a single-hidden-layer univariate ReLU network.
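The stability mechanism mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible setting, full-batch gradient descent on a one-dimensional quadratic, and is not the paper's analysis. The function names `run_gd` and `is_linearly_stable` are hypothetical. For f(x) = 0.5 * curvature * x^2, each step multiplies x by (1 - step_size * curvature), so the iterates stay bounded only when the curvature is at most 2 / step_size.

```python
def run_gd(curvature, step_size, x0=1.0, num_steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns |x| after num_steps updates, starting from x0.
    """
    x = x0
    for _ in range(num_steps):
        grad = curvature * x        # derivative of 0.5 * curvature * x**2
        x = x - step_size * grad    # equivalent to x *= (1 - step_size * curvature)
    return abs(x)


def is_linearly_stable(curvature, step_size):
    """Check the condition |1 - step_size * curvature| <= 1 for curvature >= 0."""
    return curvature <= 2.0 / step_size


if __name__ == "__main__":
    step_size = 0.1  # illustrative value; the stability threshold is 2 / 0.1 = 20
    for curvature in (19.0, 21.0):
        print(curvature, is_linearly_stable(curvature, step_size),
              run_gd(curvature, step_size))
```

With a step size of 0.1, a minimum with curvature 19 is flat enough for the iterates to shrink, while curvature 21 makes them grow. SGD adds noise-dependent conditions on top of this, so the sketch shows only the basic step-size dependence.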

artificial intelligence, machine learning, minima stability
