We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond

Aug-21-2023–arXiv.org Artificial Intelligence

Deep learning has become a pivotal technology across domains including natural language processing, computer vision, speech recognition, and medical diagnostics [1, 2]. Deep neural networks (DNNs), characterized by multiple hidden layers, have shown unparalleled success in learning complex patterns from large-scale data. However, training these models requires tuning millions or even billions of parameters, which presents significant optimisation challenges [3-7]. A large body of research has focused on optimisation techniques that improve the convergence speed, stability, and generalisation capability of deep models. Conventional techniques such as Stochastic Gradient Descent (SGD) [8] and its variants, including Momentum [9], Adagrad [10], RMSprop [11], and Adam [12], have been widely used.

artificial intelligence, learning rate, machine learning

Country:
- North America
  - United States > California
    - San Diego County > San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.14)
- Europe
  - Russia (0.04)
  - France (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
- Asia
  - Russia (0.04)
  - Middle East > Lebanon (0.04)
- Africa > Middle East
  - Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre:
- Research Report (0.64)

Industry:
- Health & Medicine (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (1.00)
  - Statistical Learning > Gradient Descent (0.87)
