Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent

Jan-18-2025, 11:47:27 GMT–Neural Information Processing Systems

Recent studies have shown that gradient descent (GD) can achieve improved generalization when its dynamics exhibit chaotic behavior. However, to obtain the desired effect, the step-size must be chosen sufficiently large, a choice that is problem dependent and can be difficult in practice. In this study, we incorporate a chaotic component into GD in a controlled manner and introduce multiscale perturbed GD (MPGD), a novel optimization framework in which the GD recursion is augmented with chaotic perturbations that evolve via an independent dynamical system. We analyze MPGD from three different angles: (i) Building on recent advances in rough paths theory, we show that, under appropriate assumptions, as the step-size decreases, the MPGD recursion converges weakly to a stochastic differential equation (SDE) driven by a heavy-tailed Lévy-stable process. Empirical results are provided to demonstrate the advantages of MPGD.
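As a rough illustration of the idea described above, a GD recursion whose updates are perturbed by a separately evolving deterministic chaotic system, the following minimal sketch may help. It is not the authors' algorithm: the choice of the logistic map as the chaotic system, the centering by 0.5, the additive coupling, and the names `logistic_map`, `perturbed_gd`, `noise_scale`, and `y0` are all assumptions made purely for illustration. In particular, the logistic map produces bounded perturbations, so this sketch does not reproduce the heavy-tailed limit discussed in the abstract.

```python
import numpy as np


def logistic_map(y):
    # Fully chaotic logistic map on [0, 1]. This is an illustrative choice
    # (an assumption), not the dynamical system specified by the source.
    return 4.0 * y * (1.0 - y)


def perturbed_gd(grad, x0, step_size=0.05, noise_scale=0.01, y0=0.1234, n_steps=500):
    """Gradient descent with additive perturbations from a chaotic map.

    The perturbation state y evolves independently of the iterate x;
    only x depends on y. All parameter names here are hypothetical.
    """
    x = np.asarray(x0, dtype=float).copy()
    # One chaotic state per coordinate, started at slightly different points
    # so the coordinates do not receive identical perturbations.
    y = np.full(x.shape, y0) + 1e-3 * np.arange(x.size).reshape(x.shape)
    for _ in range(n_steps):
        y = logistic_map(y)  # independent dynamical system
        # Standard GD step plus a centered, scaled chaotic perturbation.
        x = x - step_size * grad(x) + noise_scale * (y - 0.5)
    return x


def quadratic_grad(x):
    # Gradient of the toy objective f(x) = 0.5 * ||x||^2.
    return x


x_final = perturbed_gd(quadratic_grad, x0=[2.0, -1.0])
```

On this toy quadratic the iterate settles into a small neighborhood of the minimizer rather than converging to it exactly, because the perturbation never vanishes. The procedure is fully deterministic: running it twice with the same inputs gives the same result.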

artificial intelligence, chaotic regularization and heavy-tailed limit, machine learning


Genre:
- Research Report (0.62)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)