Understanding the Implicit Regularization of Gradient Descent in Over-parameterized Models

Ma, Jianhao, Liang, Geyu, Fattahi, Salar

Dec-10-2025–arXiv.org Artificial Intelligence

Implicit regularization refers to the tendency of local search algorithms to converge to low-dimensional solutions, even when such structures are not explicitly enforced. Despite its ubiquity, the mechanism underlying this behavior remains poorly understood, particularly in over-parameterized settings. We analyze gradient descent dynamics and identify three conditions under which it converges to second-order stationary points within an implicit low-dimensional region: (i) suitable initialization, (ii) efficient escape from saddle points, and (iii) sustained proximity to the region. We show that these conditions can be achieved through infinitesimal perturbations and a small deviation rate. Building on this, we introduce Infinitesimally Perturbed Gradient Descent (IPGD), which satisfies these conditions under mild assumptions. We provide theoretical guarantees for IPGD in over-parameterized matrix sensing and empirical evidence of its broader applicability.
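
To make the setting concrete, the following is a minimal sketch of gradient descent with a small initialization and tiny random perturbations on an over-parameterized matrix sensing problem. It is not the authors' IPGD: the abstract does not specify the perturbation scheme or the deviation-rate condition, so the objective scaling, step size, noise scale, problem sizes, and all function names here (`make_problem`, `loss_and_grad`, `perturbed_gd`) are illustrative assumptions.

```python
import numpy as np

def make_problem(n=20, r=2, m=150, seed=0):
    """Symmetric Gaussian sensing matrices and a rank-r PSD ground truth (assumed setup)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal columns
    X_star = Q @ Q.T                                   # rank-r matrix, nonzero eigenvalues 1
    A = rng.standard_normal((m, n, n))
    A = (A + A.transpose(0, 2, 1)) / 2                 # symmetrize each A_k
    A_flat = A.reshape(m, n * n)
    y = A_flat @ X_star.ravel()                        # y_k = <A_k, X_star>
    return A_flat, y, X_star

def loss_and_grad(U, A_flat, y):
    """f(U) = (1/2m) * sum_k (<A_k, U U^T> - y_k)^2 and its gradient in U."""
    m = y.shape[0]
    n = U.shape[0]
    residual = A_flat @ (U @ U.T).ravel() - y
    loss = 0.5 * np.mean(residual ** 2)
    G = (A_flat.T @ residual).reshape(n, n) / m        # (1/m) * sum_k residual_k * A_k
    return loss, 2.0 * G @ U                           # valid because each A_k is symmetric

def perturbed_gd(A_flat, y, n, k, step=0.01, init_scale=1e-3,
                 noise_scale=1e-10, iters=3000, seed=1):
    """Gradient descent on U (n x k, with k possibly much larger than the true rank).

    Small random initialization plus a tiny isotropic perturbation at every
    step; both scales are assumptions chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    U = init_scale * rng.standard_normal((n, k))
    for _ in range(iters):
        _, grad = loss_and_grad(U, A_flat, y)
        U = U - step * grad + noise_scale * rng.standard_normal((n, k))
    return U

A_flat, y, X_star = make_problem()
n = X_star.shape[0]
U = perturbed_gd(A_flat, y, n=n, k=n)                  # k = n: fully over-parameterized factor
final_loss, _ = loss_and_grad(U, A_flat, y)
```

In this sketch the number of measurements (150) is smaller than the number of free entries of a symmetric 20 x 20 matrix (210), so many matrices fit the data. One way to probe whether the iterate stayed near a low-dimensional region would be to inspect the singular values of `U` with `np.linalg.svd(U, compute_uv=False)` and compare `U @ U.T` against `X_star`.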

Country:
- Asia > Middle East
  - Jordan (0.04)
- North America > United States
  - Michigan (0.04)
  - New York (0.04)
  - Pennsylvania (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning
    - Gradient Descent (1.00)
  - Representation & Reasoning > Search (1.00)