Random Function Descent
arXiv.org Artificial Intelligence
While gradient-based methods are ubiquitous in machine learning, selecting the right step size often requires "hyperparameter tuning". This is because backtracking procedures like Armijo's rule depend on quality evaluations of the objective in every step, which are not available in a stochastic context. Since optimization schemes can be motivated using Taylor approximations, we replace the Taylor approximation with the conditional expectation (the best $L^2$ estimator) and propose "Random Function Descent" (RFD). Under light assumptions common in Bayesian optimization, we prove that RFD is identical to gradient descent, but with calculable step sizes, even in a stochastic context. RFD beats untuned Adam in synthetic benchmarks. To close the remaining performance gap to tuned Adam, we propose a heuristic extension that is competitive with it.
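To make "calculable step sizes" concrete, here is a minimal sketch under assumptions that the abstract does not state. It assumes the objective $J$ is modelled as an isotropic Gaussian random function with constant mean $\mu$ and squared-exponential covariance with length scale $s$. Under that assumed model, the conditional expectation of $J$ after a step of length $\eta$ along the negative gradient, given $J(w)$ and $\nabla J(w)$, works out to $\mu + e^{-\eta^2/(2s^2)}\big((J(w)-\mu) - \eta\|\nabla J(w)\|\big)$, and its minimizer over $\eta \ge 0$ has a closed form. The function names and the parameters `mu` and `s` are hypothetical and chosen for illustration only.

```python
import math


def conditional_mean(eta, loss, grad_norm, mu, s):
    """Expected loss after a step of length eta along the negative gradient.

    Assumption (not from the abstract): isotropic Gaussian random function
    with constant mean mu and squared-exponential covariance, length scale s.
    """
    return mu + math.exp(-eta**2 / (2.0 * s**2)) * ((loss - mu) - eta * grad_norm)


def rfd_step_size(loss, grad_norm, mu, s):
    """Step length minimizing conditional_mean over eta >= 0.

    Setting the derivative to zero gives the quadratic
    grad_norm * eta^2 - (loss - mu) * eta - s^2 * grad_norm = 0,
    whose positive root is returned.
    """
    if grad_norm == 0.0:
        # The descent direction is undefined at a critical point; take no step.
        return 0.0
    a = (mu - loss) / (2.0 * grad_norm)
    root = math.sqrt(a * a + s**2)
    # Two algebraically equal forms of the positive root; pick the one
    # that avoids subtracting nearly equal numbers.
    if a >= 0.0:
        return s**2 / (root + a)
    return root - a


if __name__ == "__main__":
    eta = rfd_step_size(loss=0.5, grad_norm=2.0, mu=1.0, s=1.0)
    print(f"step length: {eta:.4f}")
```

In this sketch the step length depends only on the current loss, the gradient norm, and the two model parameters, so no line search over fresh loss evaluations is needed. When the loss is far below the assumed mean, the step length approaches $s^2\|\nabla J(w)\|/(\mu - J(w))$, which is a plain gradient descent step with learning rate $s^2/(\mu - J(w))$.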
May-2-2023