Dual Averaging is Surprisingly Effective for Deep Learning Optimization
Stochastic first-order optimization methods have been extensively employed for training neural networks. It has been empirically observed that the choice of optimization algorithm is crucial for obtaining good accuracy. For instance, stochastic variance-reduced methods perform poorly in computer vision (CV) (Defazio & Bottou, 2019). On the other hand, SGD with momentum (SGD+M) (Bottou, 1991; LeCun et al., 1998; Bottou & Bousquet, 2008) works particularly well on CV tasks, and Adam (Kingma & Ba, 2014) outperforms other methods on natural language processing (NLP) tasks (Choi et al., 2019). In general, the choice of optimizer, as well as its hyper-parameters, must be included among the set of hyper-parameters searched over when tuning.
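For intuition on the method named in the title, below is a minimal sketch of the classic dual averaging update (Nesterov-style): gradients are accumulated in a running sum, and each iterate is taken from the *initial* point rather than the current one. This is not the paper's modernized variant (which adds momentum and a different step-size schedule); the function names, the toy objective, and the `gamma / sqrt(k+1)` rate here are illustrative assumptions.

```python
# Sketch of plain dual averaging, assuming the classic Nesterov-style update;
# the paper's modernized variant differs (momentum, scheduling).

def dual_averaging(grad, x0, steps=1000, gamma=0.5):
    """Minimize a 1-D objective via dual averaging: accumulate raw gradients
    in z and always step from the initial point x0, not the current iterate."""
    x, z = x0, 0.0
    for k in range(steps):
        z += grad(x)                          # running sum of all gradients
        x = x0 - gamma / (k + 1) ** 0.5 * z   # step from x0 with a decaying rate
    return x

# Toy usage: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3);
# the iterates slowly approach the minimizer x = 3.
print(dual_averaging(lambda x: 2.0 * (x - 3.0), x0=0.0))
```

Contrast this with SGD, which steps from the current iterate using only the latest gradient; in dual averaging the accumulated sum `z` plays the role that the iterate sequence plays in SGD, which is what gives the method its distinct stability properties.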