Boundary between noise and information applied to filtering neural network weight matrices

Max Staats, Matthias Thamm, Bernd Rosenow

arXiv.org Artificial Intelligence 

Institut für Theoretische Physik, Universität Leipzig, Brüderstrasse 16, 04103 Leipzig, Germany
(Dated: June 9, 2022)

Deep neural networks have been successfully applied to a broad range of problems where overparametrization yields weight matrices which are partially random. A comparison of weight matrix singular vectors to the Porter-Thomas distribution suggests that there is a boundary between randomness and learned information in the singular value spectrum. Inspired by this finding, we introduce an algorithm for noise filtering, which both removes small singular values and reduces the magnitude of large singular values to counteract the effect of level repulsion between the noise and the information part of the spectrum. For networks trained in the presence of label noise, we indeed find that the generalization performance improves significantly due to noise filtering.

Introduction: In recent years, deep neural networks (DNNs) have proven to be powerful tools for solving a [...] small singular values agree with the RMT prediction, while vectors which large singular values significantly deviate.
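The two steps of the noise filtering named in the abstract, removing small singular values and reducing the magnitude of the large ones, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `filter_weights` and `quadrature_shrink`, the shrinkage rule, and the choice of cutoff are assumptions made for illustration only. In particular, the cutoff here is the approximate largest singular value of a pure-noise matrix with unit-variance entries, whereas the text locates the boundary by comparing singular vectors to the Porter-Thomas distribution.

```python
import numpy as np


def quadrature_shrink(s, cutoff):
    """Hypothetical shrinkage rule (an assumption, not taken from the paper):
    subtract the cutoff in quadrature, so values near the cutoff shrink most."""
    return np.sqrt(np.maximum(s**2 - cutoff**2, 0.0))


def filter_weights(W, cutoff, shrink=quadrature_shrink):
    """Zero all singular values of W at or below `cutoff`, shrink the rest."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Step 1: remove the part of the spectrum treated as noise.
    # Step 2: reduce the magnitude of the remaining singular values.
    s_filtered = np.where(s > cutoff, shrink(s, cutoff), 0.0)
    # Rebuild the matrix from the modified spectrum.
    return (U * s_filtered) @ Vt


# Toy weight matrix: low-rank "information" plus unit-variance Gaussian noise.
rng = np.random.default_rng(0)
n, m, rank = 200, 100, 5
signal = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, m))
noise = rng.normal(size=(n, m))
W = signal + noise

# For an n x m matrix with i.i.d. unit-variance entries, the largest
# singular value is approximately sqrt(n) + sqrt(m); used here as a
# stand-in for the noise/information boundary.
cutoff = np.sqrt(n) + np.sqrt(m)
W_filtered = filter_weights(W, cutoff)

print("singular values kept:", int(np.sum(np.linalg.svd(W, compute_uv=False) > cutoff)))
```

Passing a different callable as `shrink` swaps in another shrinkage rule without changing the truncation step, which keeps the two parts of the filter separable in this sketch.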