Directional Smoothness and Gradient Methods: Convergence and Adaptivity
Neural Information Processing Systems
We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper bounds on the objective. Minimizing these upper bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general functions, we prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness. Experiments on logistic regression show our convergence guarantees are tighter than the classical theory based on L-smoothness.
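As a concrete illustration of the two adaptive step-size rules named in the abstract, here is a minimal sketch (not the paper's code) that runs the Polyak step-size and normalized GD on a small synthetic logistic-regression problem. The problem instance, the lower bound `f_star = 0.0`, and the normalized-GD scale `gamma` are assumptions chosen for illustration, not the paper's experimental setup.

```python
import numpy as np

# Illustrative sketch: Polyak step-size GD and normalized GD on a small
# synthetic logistic-regression problem. The data, f_star, and gamma below
# are assumptions for demonstration, not the paper's experimental setup.

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = np.sign(X @ w_true)  # labels in {-1, +1}

def loss(w):
    # Average logistic loss: (1/n) sum log(1 + exp(-y_i x_i^T w))
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def grad(w):
    m = X @ w
    s = -y / (1.0 + np.exp(y * m))  # d/dm of log(1 + exp(-y m))
    return X.T @ s / n

def polyak_gd(w0, f_star=0.0, iters=500):
    # Polyak step-size: eta_k = (f(w_k) - f_star) / ||grad f(w_k)||^2,
    # with f_star an assumed lower bound on the optimal value.
    w = w0.copy()
    for _ in range(iters):
        g = grad(w)
        gnorm2 = g @ g
        if gnorm2 < 1e-16:
            break
        eta = (loss(w) - f_star) / gnorm2
        w -= eta * g
    return w

def normalized_gd(w0, gamma=0.5, iters=500):
    # Normalized GD: step of fixed length gamma in the negative-gradient direction.
    w = w0.copy()
    for _ in range(iters):
        g = grad(w)
        gnorm = np.linalg.norm(g)
        if gnorm < 1e-8:
            break
        w -= (gamma / gnorm) * g
    return w

w0 = np.zeros(d)
print("Polyak GD final loss:    ", loss(polyak_gd(w0)))
print("Normalized GD final loss:", loss(normalized_gd(w0)))
```

Both rules set the step-size from quantities observed along the optimization path (current objective gap or gradient norm) rather than from a global smoothness constant, which is the behavior the paper's path-dependent guarantees are meant to capture.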