Power-law Dynamic arising from machine learning

Chen, Wei, Du, Weitao, Ma, Zhi-Ming, Meng, Qi

arXiv.org Artificial Intelligence 

Deep neural networks (DNNs) have been successfully trained to achieve major breakthroughs in AI tasks such as computer vision [7, 8, 14], speech recognition [21, 23, 24], and natural language processing [5, 26, 27]. Stochastic gradient descent (SGD) is the mainstream optimization algorithm in deep learning. Specifically, in each iteration, SGD randomly samples a mini-batch of data and updates the model with the stochastic gradient. For large DNN models, computing the gradient over each instance is costly; thus, compared to gradient descent, which updates the model with the gradient over the full batch of data, SGD can train DNNs much more efficiently.
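The mini-batch update described above can be illustrated with a minimal sketch, assuming a simple least-squares model; the synthetic data, parameter names, and hyperparameters below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of mini-batch SGD on a least-squares problem (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X @ w_true + noise  (hypothetical setup)
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)      # model parameters
lr = 0.1             # learning rate (assumed value)
batch_size = 32      # mini-batch size (assumed value)

for step in range(500):
    # Randomly sample a mini-batch of data, as described in the abstract
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]

    # Stochastic gradient of the mean-squared-error loss over the mini-batch
    grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)

    # Update the model with the stochastic gradient
    w -= lr * grad

print("parameter error:", np.linalg.norm(w - w_true))
```

Because each update touches only `batch_size` instances rather than all `n`, the per-iteration cost is a small fraction of a full-batch gradient step, which is the efficiency advantage the abstract refers to.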
