Random Matrix Theory for Stochastic Gradient Descent

Park, Chanju, Favoni, Matteo, Lucini, Biagio, Aarts, Gert

arXiv.org Artificial Intelligence 

Machine learning (ML) and artificial intelligence (AI) can provide powerful tools for the scientific community, as demonstrated by the recent Nobel Prize in Chemistry. Conversely, insights from traditional physics also contribute to a deeper understanding of the mechanisms of learning. Ref. [1] contains a broad overview of the successful cross-fertilisation between ML and the physical sciences, covering a number of domains. One way to mitigate possible scepticism about using ML as a "black box" is to unveil the dynamics of training (or learning) and explain how the relevant information is encoded in the model during the training stage. To further develop this programme, we study here the dynamics of first-order stochastic gradient descent applied to weight matrices, reporting and expanding on the work presented in Ref. [2]. When training ML models, weight matrices are commonly updated by one of the variants of the stochastic gradient descent algorithm. The update can then be decomposed into a drift term and a fluctuating term, and such a system can be described by a discrete Langevin equation. The dynamics of stochastic matrix updates is richer than that of vector or scalar quantities, as captured by Dyson Brownian motion and random matrix theory (RMT), with the appearance of universal features in the eigenvalue statistics [3-9]. Earlier descriptions of the statistical properties of weight matrices in terms of RMT can be found in e.g.
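To make the drift-fluctuation decomposition explicit, a schematic form of the matrix update is sketched below as a discrete Langevin equation; the notation (weight matrix W_t, learning rate \eta, minibatch B_t, losses \mathcal{L} and \mathcal{L}_{B_t}) is illustrative and not taken verbatim from Ref. [2].

% Schematic SGD update for a weight matrix, split into drift and fluctuation.
% All symbols here are illustrative placeholders, not the paper's notation.
\begin{align}
  W_{t+1} &= W_t - \eta \, \nabla_W \mathcal{L}_{B_t}(W_t) \nonumber \\
          &= W_t \;\underbrace{-\, \eta \, \nabla_W \mathcal{L}(W_t)}_{\text{drift (full-batch gradient)}}
                 \;\underbrace{+\, \eta \, \xi_t(W_t)}_{\text{fluctuation (minibatch noise)}},
\end{align}

where \mathcal{L} is the full-batch loss and \xi_t(W_t) \equiv \nabla_W \mathcal{L}(W_t) - \nabla_W \mathcal{L}_{B_t}(W_t) has zero mean when averaged over minibatches, so that the stochastic term plays the role of the noise in a Langevin equation.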
