High Dimensional Optimization through the Lens of Machine Learning

Dec-31-2021, arXiv.org Machine Learning

This thesis reviews numerical optimization methods with machine learning problems in mind. Because machine learning models are highly parametrized, we focus on methods suited to high-dimensional optimization. We build intuition on quadratic models to determine which methods are suited to non-convex optimization, and develop convergence proofs on convex functions for this selection of methods. With this theoretical foundation for stochastic gradient descent and momentum methods, we try to explain why the methods commonly used in machine learning are so successful. Besides explaining successful heuristics, the last chapter also provides a less extensive review of more theoretical methods that are less popular in practice. In some sense, this work attempts to answer the question: why are the default TensorFlow optimizers the defaults?
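The abstract's idea of building intuition on quadratic models can be illustrated with a small sketch. The code below is not taken from the thesis: the function name `minimize_quadratic`, the two eigenvalues, and the step count are hypothetical choices, and the step size and momentum values are the classical tuned settings for a quadratic with smallest eigenvalue 1 and largest eigenvalue 100. It compares plain gradient descent with heavy-ball momentum on an ill-conditioned diagonal quadratic.

```python
import numpy as np


def minimize_quadratic(eigenvalues, lr, momentum=0.0, steps=200):
    """Minimize f(x) = 0.5 * sum(h_i * x_i**2) and return the final loss.

    With momentum=0.0 this is plain gradient descent; otherwise it is
    the heavy-ball (Polyak) momentum update. Illustrative sketch only.
    """
    h = np.asarray(eigenvalues, dtype=float)
    x = np.ones_like(h)            # arbitrary starting point (assumption)
    velocity = np.zeros_like(h)
    for _ in range(steps):
        grad = h * x               # gradient of the diagonal quadratic
        velocity = momentum * velocity - lr * grad
        x = x + velocity
    return 0.5 * float(np.sum(h * x**2))


# Ill-conditioned example: condition number 100 (assumed values).
mu, L = 1.0, 100.0
eigenvalues = [mu, L]

# Classical tuned step size for gradient descent on a quadratic.
loss_gd = minimize_quadratic(eigenvalues, lr=2.0 / (mu + L))

# Classical tuned heavy-ball parameters for the same quadratic.
sqrt_mu, sqrt_L = np.sqrt(mu), np.sqrt(L)
hb_lr = 4.0 / (sqrt_L + sqrt_mu) ** 2
hb_momentum = ((sqrt_L - sqrt_mu) / (sqrt_L + sqrt_mu)) ** 2
loss_hb = minimize_quadratic(eigenvalues, lr=hb_lr, momentum=hb_momentum)

print(f"gradient descent loss: {loss_gd:.3e}")
print(f"heavy-ball loss:       {loss_hb:.3e}")
```

On this toy problem the per-step contraction factor of gradient descent depends on the condition number, while that of the tuned momentum method depends on its square root, which is one reason momentum tends to help when the eigenvalues are spread widely.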

convergence, eigenvalue, gradient, ...

Country:
- Europe > Russia (0.04)
- North America
  - United States
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
    - New York > New York County
      - New York City (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Asia
  - Russia (0.04)
  - Middle East
    - Jordan (0.04)
    - Israel (0.04)

Genre:
- Research Report (0.50)
- Instructional Material (0.45)
- Summary/Review (0.33)

Industry:
- Education
  - Educational Setting > Online (1.00)
  - Educational Technology > Educational Software
    - Computer Based Training (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Optimization (1.00)
    - Mathematical & Statistical Methods (1.00)
  - Machine Learning
    - Neural Networks (1.00)
    - Statistical Learning > Gradient Descent (0.70)