The Price of Adaptivity in Stochastic Convex Optimization

Yair Carmon, Oliver Hinder

arXiv.org Machine Learning 

Stochastic optimization methods in modern machine learning often require carefully tuning sensitive algorithmic parameters at significant cost in time, computation, and expertise. This reality has led to sustained interest in developing adaptive (or parameter-free) algorithms that require minimal or no tuning [6, 8, 12, 21, 22, 24, 26, 29, 35-39, 43, 45-47]. However, a basic theoretical question remains open: Are existing methods "as adaptive as possible," or is there substantial room for improvement? Put differently, is there a fundamental price to be paid (in terms of rate of convergence) for not knowing the problem parameters in advance?

To address these questions, we must formally define what it means for an adaptive algorithm to be efficient. The standard notion of minimax optimality [1] does not suffice, since it does not constrain the algorithm to be agnostic to the parameters defining the function class; stochastic gradient descent (SGD) is in many cases minimax optimal, but its step size requires problem-specific tuning. To motivate our solution, we observe that guarantees for adaptive algorithms admit the following interpretation: assuming that the input problem satisfies certain assumptions (e.g., Lipschitz continuity, smoothness, etc.), the adaptive algorithm attains performance close to the best performance that it is possible to guarantee given only those assumptions.
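As a standard illustration of this tuning requirement (a sketch in the classic nonsmooth stochastic convex setting; the symbols $G$, $R$, $T$, and $\eta$ are introduced here only for the example and do not appear in the excerpt above): suppose the objective $f$ is convex, the stochastic gradient norms are bounded by $G$, and a minimizer $x^\star$ lies within distance $R$ of the initial point. Running SGD for $T$ steps with the tuned step size
\[
  \eta = \frac{R}{G\sqrt{T}}
  \qquad\text{yields}\qquad
  \mathbb{E}\big[f(\bar{x}_T) - f(x^\star)\big] \;\le\; \frac{GR}{\sqrt{T}},
\]
which matches the minimax rate for this class; however, setting $\eta$ this way requires knowing both $G$ and $R$ in advance, and these are precisely the problem parameters an adaptive method must do without.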
