SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures

Jun-9-2026, 20:13:22 GMT–Neural Information Processing Systems

We study gradient flows for loss landscapes of fully connected feedforward neural networks with commonly used continuously differentiable activation functions such as the logistic, hyperbolic tangent, softplus, or GELU function. We prove that the gradient flow either converges to a critical point or diverges to infinity while the loss converges to an asymptotic critical value. Moreover, we prove the existence of a threshold $\varepsilon > 0$ such that the loss value of any gradient flow initialized at most $\varepsilon$ above the optimal level converges to the optimal level. For polynomial target functions and sufficiently large architectures and data sets, we prove that the optimal loss value is zero and can only be realized asymptotically.
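To make the "diverges to infinity while the loss converges" scenario concrete, here is a minimal toy sketch. It is not taken from the paper: it assumes a single logistic unit with one weight `w`, one training pair (input 1, target 1), and an explicit Euler discretization standing in for the continuous gradient flow. The names `sigmoid`, `loss`, `grad`, and `euler_flow` are hypothetical. In this toy setting the infimum of the loss is zero but is attained by no finite `w`, so the iterate keeps growing while the loss shrinks.

```python
import math


def sigmoid(w):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-w))


def loss(w):
    """Squared error of a one-weight logistic unit on the pair (x=1, y=1)."""
    return (sigmoid(w) - 1.0) ** 2


def grad(w):
    """Derivative of loss with respect to w, using sigmoid'(w) = s * (1 - s)."""
    s = sigmoid(w)
    return 2.0 * (s - 1.0) * s * (1.0 - s)


def euler_flow(w0, step=0.1, n_steps=20000):
    """Explicit Euler approximation of the gradient flow w'(t) = -loss'(w(t)).

    This is an assumption made for illustration: the continuous flow is
    replaced by small discrete steps.
    """
    w = w0
    trajectory = [w]
    for _ in range(n_steps):
        w -= step * grad(w)
        trajectory.append(w)
    return trajectory


traj = euler_flow(0.0)
print("final weight:", traj[-1])
print("final loss:", loss(traj[-1]))
```

Running the sketch with more steps lets the weight grow further (roughly logarithmically in the elapsed time) while the loss keeps approaching zero without reaching it. This mirrors, in one dimension, the behavior the abstract describes for loss values that can only be realized asymptotically.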

artificial intelligence, machine learning, proceedings



Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)