$\bar{G}_{mst}$: An Unbiased Stratified Statistic and a Fast Gradient Optimization Algorithm Based on It

Oct-7-2021–arXiv.org Machine Learning

Optimizing a very large model with deep and wide layers is difficult. Like most optimization algorithms, gradient-based training of deep models (SGD-like algorithms) suffers from slow convergence and a tendency to get trapped in local minima or saddle points. Much research has gone into improving the gradient method, and a considerable part of it focuses on refining the search direction while keeping the per-iteration cost as low as possible, so as to accelerate convergence [10, 11, 12, 13, 14, 15, 16]. These improvements to the search direction fall roughly into two categories. One is the momentum method [11], which is based on principles from physics, together with its improved variants [12, 20, 21]. The momentum method avoids excessive swings of the search trajectory by retaining part of the potential energy of the original trajectory, which accelerates convergence.
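The momentum idea mentioned in the abstract can be illustrated with a minimal sketch of the classical heavy-ball update. This is not the $\bar{G}_{mst}$ algorithm; the quadratic test function, the learning rate `lr`, the momentum coefficient `beta`, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def gradient_descent(grad, x0, lr, steps):
    """Plain gradient descent: x <- x - lr * grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def momentum_descent(grad, x0, lr, beta, steps):
    """Heavy-ball momentum: the velocity v keeps a fraction beta of the
    previous update, which smooths the search trajectory."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x = x + v
    return x

# Hypothetical ill-conditioned quadratic f(x) = 0.5 * (x[0]**2 + 100 * x[1]**2),
# whose minimum is at the origin.
CURVATURE = np.array([1.0, 100.0])

def quad_grad(x):
    return CURVATURE * x

x_start = [1.0, 1.0]
x_gd = gradient_descent(quad_grad, x_start, lr=0.01, steps=100)
x_mom = momentum_descent(quad_grad, x_start, lr=0.01, beta=0.9, steps=100)

print("distance to minimum, plain GD:", np.linalg.norm(x_gd))
print("distance to minimum, momentum:", np.linalg.norm(x_mom))
```

With these illustrative settings, plain gradient descent makes slow progress along the low-curvature direction, while the momentum iterate ends up much closer to the minimum after the same number of steps. The behavior on real deep models depends on the loss surface and on the hyperparameters.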

Country:
- North America
  - United States
    - Nevada (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China
  - Guangdong Province > Guangzhou (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.70)
  - Natural Language > Machine Translation (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)