Stochastic Gradient Descent and Anomaly of Variance-flatness Relation in Artificial Neural Networks

Xiong, Xia, Chen, Yong-Cong, Shi, Chunxiao, Ao, Ping

Jun-12-2023–arXiv.org Artificial Intelligence

Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing study of the theoretical principles behind its success. A recent work reports an anomalous (inverse) relation between the variance of neural weights and the landscape flatness of the loss function under SGD [Feng & Tu, PNAS 118, 0027 (2021)]. To investigate this seeming violation of statistical-physics principles, the properties of SGD near fixed points are analysed via a dynamic decomposition method. Our approach recovers the true "energy" function under which the universal Boltzmann distribution holds. This function differs from the cost function in general, and it resolves the paradox raised by the anomaly.
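As a rough illustration of why the variance-flatness relation can look anomalous, the sketch below uses a toy linear model near a fixed point. It is not the dynamic decomposition method of the paper. Each weight direction is assumed to follow w_{t+1} = (1 - lr*h) w_t + lr*g_t, where h is a hypothetical curvature and g_t is zero-mean gradient noise. The function `stationary_variance`, the curvature values, and the noise scaling h**2 in the second case are all assumptions made for illustration.

```python
def stationary_variance(h, noise_var, lr):
    """Stationary variance of w_{t+1} = (1 - lr*h) * w_t + lr * g_t,
    where g_t is zero-mean noise with variance noise_var.

    Solving Var = (1 - lr*h)**2 * Var + lr**2 * noise_var for Var gives
    Var = lr**2 * noise_var / (1 - (1 - lr*h)**2).
    """
    decay = 1.0 - lr * h
    if not abs(decay) < 1.0:
        raise ValueError("need 0 < lr*h < 2 for a stable fixed point")
    return lr ** 2 * noise_var / (1.0 - decay ** 2)


lr = 0.01
# Hypothetical curvatures of the loss along four directions;
# a smaller value means a flatter direction.
curvatures = [0.5, 1.0, 2.0, 4.0]

# Case 1: the same noise strength in every direction. The variance shrinks as
# curvature grows, as a Boltzmann distribution with the loss as energy predicts.
iso = [stationary_variance(h, 1.0, lr) for h in curvatures]

# Case 2 (toy assumption): noise variance grows as h**2. The variance now
# grows with curvature, so flatter directions fluctuate less.
aniso = [stationary_variance(h, h ** 2, lr) for h in curvatures]

for h, v_iso, v_aniso in zip(curvatures, iso, aniso):
    print(f"h={h:4.1f}  isotropic var={v_iso:.6f}  curvature-scaled var={v_aniso:.6f}")
```

In the second case the stationary distribution is still Gaussian, but its exponent is no longer proportional to the loss. That matches the qualitative point of the abstract: the function playing the role of energy can differ from the cost function.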

energy function, loss function, matrix

Country:
- Europe > Netherlands
  - North Holland > Amsterdam (0.04)
- Asia > China
  - Shanghai > Shanghai (0.05)
  - Sichuan Province > Chengdu (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (1.00)
  - Neural Networks (1.00)
