Random Function Descent

May-2-2023–arXiv.org Artificial Intelligence

While gradient based methods are ubiquitous in machine learning, selecting the right step size often requires "hyperparameter tuning". This is because backtracking procedures like Armijo's rule depend on quality evaluations in every step, which are not available in a stochastic context. Since optimization schemes can be motivated using Taylor approximations, we replace the Taylor approximation with the conditional expectation (the best $L^2$ estimator) and propose "Random Function Descent" (RFD). Under light assumptions common in Bayesian optimization, we prove that RFD is identical to gradient descent, but with calculable step sizes, even in a stochastic context. We beat untuned Adam in synthetic benchmarks. To close the performance gap to tuned Adam, we propose a heuristic extension competitive with tuned Adam.

artificial intelligence, machine learning, random function, (17 more...)

arXiv.org Artificial Intelligence

May-2-2023

arXiv.org PDF

Add feedback

Country:
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- North America
  - United States
    - New York > New York County
      - New York City (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Canada
    - Alberta (0.14)
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)

Genre:
- Research Report (0.50)

Industry:
- Education > Educational Setting > Online (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Statistical Learning
    - Gradient Descent (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found