Guiding Two-Layer Neural Network Lipschitzness via Gradient Descent Learning Rate Constraints
Kyle Sung, Anastasis Kratsios, Noah Forman
We demonstrate that applying an eventual decay to the learning rate (LR) in empirical risk minimization (ERM), where the mean-squared-error loss is minimized with standard gradient descent (GD) to train a two-layer neural network with Lipschitz activation functions, ensures that the resulting network exhibits a high degree of Lipschitz regularity, that is, a small Lipschitz constant. Moreover, we show that this decay does not hinder the convergence rate of the empirical risk, now measured with the Huber loss, toward a critical point of the non-convex empirical risk. From these findings, we derive generalization bounds for two-layer neural networks trained with GD and a decaying LR that depend only sub-linearly on the number of trainable parameters, suggesting that the statistical behaviour of these networks is independent of overparameterization. We validate our theoretical results with a series of toy numerical experiments, where, surprisingly, we observe that networks trained with constant-step-size GD exhibit similar learning and regularity properties to those trained with a decaying LR. This suggests that neural networks trained with standard GD may already be highly regular learners.
Feb-6-2025
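To make the setting concrete, below is a minimal sketch of the kind of training loop the abstract describes: full-batch GD on the mean-squared-error empirical risk of a two-layer network with a Lipschitz (here ReLU) activation, with the LR held constant until some step and then decayed. The specific schedule (inverse-time decay after step T0), the scalar output, the toy data, and all constants are illustrative assumptions, not taken from the paper; the reported Lipschitz number is the standard product-of-operator-norms upper bound for a two-layer network with a 1-Lipschitz activation, used here only as a convenient proxy for the regularity the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative only).
n, d, m = 200, 5, 64                       # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)

# Two-layer network f(x) = w2 . relu(W1 x + b1) + b2.
W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))
b1 = np.zeros(m)
w2 = rng.normal(scale=1.0 / np.sqrt(m), size=m)
b2 = 0.0

def lipschitz_upper_bound(W1, w2):
    # For a 1-Lipschitz activation such as ReLU: Lip(f) <= ||w2||_2 * ||W1||_op.
    return np.linalg.norm(w2) * np.linalg.norm(W1, ord=2)

eta0, T, T0 = 0.05, 2000, 500              # base LR, total steps, decay onset (assumed)

for t in range(T):
    # Eventual LR decay: constant step size until T0, then inverse-time decay.
    eta = eta0 if t < T0 else eta0 / (1.0 + 0.01 * (t - T0))

    # Forward pass.
    pre = X @ W1.T + b1                    # (n, m) pre-activations
    h = np.maximum(pre, 0.0)               # ReLU hidden layer
    pred = h @ w2 + b2                     # (n,) network outputs

    # Mean-squared-error empirical risk L = mean(resid^2)/2 and its gradients.
    resid = pred - y
    grad_pred = resid / n
    grad_w2 = h.T @ grad_pred
    grad_b2 = grad_pred.sum()
    grad_h = np.outer(grad_pred, w2)
    grad_pre = grad_h * (pre > 0)          # ReLU derivative
    grad_W1 = grad_pre.T @ X
    grad_b1 = grad_pre.sum(axis=0)

    # Full-batch gradient-descent update with the scheduled step size.
    W1 -= eta * grad_W1
    b1 -= eta * grad_b1
    w2 -= eta * grad_w2
    b2 -= eta * grad_b2

    if t % 500 == 0:
        print(f"step {t:4d}  lr {eta:.4f}  mse {np.mean(resid ** 2):.4f}  "
              f"Lip bound {lipschitz_upper_bound(W1, w2):.3f}")
```

Running the same loop with the decay disabled (eta = eta0 throughout) gives the constant-step-size baseline against which the abstract's empirical observation, that both variants end up similarly regular, can be compared.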