Learning Provably Improves the Convergence of Gradient Descent

Jun-16-2026, 04:40:24 GMT–Neural Information Processing Systems

Learning to Optimize (L2O) lacks rigorous theoretical backing for its own training convergence, as existing analyses often rely on unrealistic assumptions, a gap this work highlights empirically. We bridge this gap by proving the training convergence of L2O models that learn Gradient Descent (GD) hyperparameters for quadratic programming, leveraging Neural Tangent Kernel (NTK) theory. We propose a deterministic initialization strategy that supports our theoretical results and promotes stable training over extended optimization horizons by mitigating gradient explosion. Our L2O framework demonstrates over 50% better optimality than GD and superior robustness over state-of-the-art L2O methods on synthetic datasets.
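To make "learning GD hyperparameters for quadratic programming" concrete, here is a minimal sketch. It is not the paper's model: the abstract describes a neural network analyzed with NTK theory, whereas this toy learns one scalar step size per GD iteration directly. The least-squares instance, the function names (`unrolled_gd`, `final_loss`, `step_size_grad`), the initialization at 1/L, and the loss-decrease safeguard in the training loop are all assumptions made for illustration.

```python
import numpy as np

def unrolled_gd(M, y, steps):
    """Run GD on 0.5*||Mx - y||^2 from x = 0, one step size per iteration."""
    x = np.zeros(M.shape[1])
    xs = [x]
    for p in steps:
        x = x - p * (M.T @ (M @ x - y))
        xs.append(x)
    return xs

def final_loss(M, y, steps):
    """Objective value at the last unrolled iterate."""
    x = unrolled_gd(M, y, steps)[-1]
    return 0.5 * np.sum((M @ x - y) ** 2)

def step_size_grad(M, y, steps):
    """Gradient of the final loss w.r.t. each step size (manual backprop)."""
    xs = unrolled_gd(M, y, steps)
    H = M.T @ M
    lam = M.T @ (M @ xs[-1] - y)          # dLoss/dx_T
    grad = np.zeros(len(steps))
    for t in reversed(range(len(steps))):
        g_t = M.T @ (M @ xs[t] - y)       # GD direction used at step t
        grad[t] = -lam @ g_t              # since x_{t+1} = x_t - p_t * g_t
        lam = lam - steps[t] * (H @ lam)  # propagate through (I - p_t H)
    return grad

# Hypothetical synthetic problem instance.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
y = rng.standard_normal(20)

T = 10
L = np.linalg.eigvalsh(M.T @ M).max()     # smoothness constant of the objective
baseline = np.full(T, 1.0 / L)            # plain GD with the classic 1/L step

# Start training from the baseline step sizes (deterministic, an assumption
# of this sketch) and accept an update only if it lowers the final loss.
steps = baseline.copy()
loss = final_loss(M, y, steps)
lr = 1e-3
for _ in range(200):
    trial = steps - lr * step_size_grad(M, y, steps)
    trial_loss = final_loss(M, y, trial)
    if trial_loss < loss:
        steps, loss = trial, trial_loss
    else:
        lr *= 0.5                          # shrink the meta step and retry

baseline_loss = final_loss(M, y, baseline)
learned_loss = loss
print("GD with 1/L steps:", baseline_loss)
print("learned step sizes:", learned_loss)
```

Because an update is accepted only when it reduces the final loss, the learned schedule can never do worse than the 1/L baseline it started from on this instance. The sketch says nothing about the convergence guarantees or the 50% figure reported in the abstract.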

artificial intelligence, deep learning, machine learning

Country:
- Asia (0.27)
- North America > United States
  - New York (0.27)

Genre:
- Workflow (0.67)
- Instructional Material (0.67)
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.92)
    - Statistical Learning > Gradient Descent (0.70)
