Power-law Dynamic arising from machine learning
Chen, Wei, Du, Weitao, Ma, Zhi-Ming, Meng, Qi
Deep neural networks (DNNs) have been trained successfully and have achieved major breakthroughs in AI tasks such as computer vision [7, 8, 14], speech recognition [21, 23, 24], and natural language processing [5, 26, 27]. Stochastic gradient descent (SGD) is the mainstream optimization algorithm in deep learning. In each iteration, SGD randomly samples a mini-batch of data and updates the model with the stochastic gradient computed on that mini-batch. For large DNN models, the gradient computation over each instance is costly; thus, compared with gradient descent, which updates the model with the gradient over the full batch of data, SGD can train DNNs much more efficiently.
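As a minimal illustration of the mini-batch update described above (a sketch for this listing, not the paper's own code), the following NumPy example contrasts the full-batch gradient with the stochastic gradient computed on a randomly sampled mini-batch; the synthetic least-squares problem and parameters such as `batch_size` and `lr` are assumptions for demonstration only.

```python
import numpy as np

# Synthetic least-squares problem (illustrative assumption, not from the paper).
rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def full_batch_gradient(w):
    # Gradient of the mean squared error over the full data set,
    # as used by (full-batch) gradient descent.
    return 2.0 * X.T @ (X @ w - y) / n

def sgd(batch_size=32, lr=0.05, steps=500):
    w = np.zeros(d)
    for _ in range(steps):
        # Randomly sample a mini-batch and compute the stochastic gradient on it.
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size
        # Update the model with the stochastic gradient.
        w -= lr * grad
    return w

w_sgd = sgd()
print("estimation error:", np.linalg.norm(w_sgd - w_true))
```

Each SGD step touches only `batch_size` instances rather than all `n`, which is why it is much cheaper per iteration than full-batch gradient descent when the model and data set are large.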
arXiv.org Artificial Intelligence
Jun-16-2023