The Price of Adaptivity in Stochastic Convex Optimization

Yair Carmon, Oliver Hinder

arXiv.org Machine Learning 

Stochastic optimization methods in modern machine learning often require carefully tuning sensitive algorithmic parameters at significant cost in time, computation, and expertise. This reality has led to sustained interest in developing adaptive (or parameter-free) algorithms that require minimal or no tuning [6, 8, 12, 21, 22, 24, 26, 29, 35-39, 43, 45-47]. However, a basic theoretical question remains open: Are existing methods "as adaptive as possible," or is there substantial room for improvement? Put differently, is there a fundamental price to be paid (in terms of rate of convergence) for not knowing the problem parameters in advance?

To address these questions, we must formally define what it means for an adaptive algorithm to be efficient. The standard notion of minimax optimality [1] does not suffice, since it does not constrain the algorithm to be agnostic to the parameters defining the function class; stochastic gradient descent (SGD) is in many cases minimax optimal, but its step size requires problem-specific tuning. To motivate our solution, we observe that guarantees for adaptive algorithms admit the following interpretation: assuming that the input problem satisfies certain assumptions (e.g., Lipschitz continuity, smoothness, etc.), the adaptive algorithm attains performance close to the best performance that it is possible to guarantee given only those assumptions.
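As a standard illustration of this tuning requirement (a sketch in the classic nonsmooth stochastic convex setting; the symbols $G$, $R$, $T$, and $\eta$ are introduced here only for the example and do not appear in the excerpt above): suppose the objective $f$ is convex, the stochastic gradient norms are bounded by $G$, and a minimizer $x^\star$ lies within distance $R$ of the initial point. Running SGD for $T$ steps with the tuned step size
\[
  \eta = \frac{R}{G\sqrt{T}}
  \qquad\text{yields}\qquad
  \mathbb{E}\big[f(\bar{x}_T) - f(x^\star)\big] \;\le\; \frac{GR}{\sqrt{T}},
\]
which matches the minimax rate for this class; however, setting $\eta$ this way requires knowing both $G$ and $R$ in advance, and these are precisely the problem parameters an adaptive method must do without.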
