finite size effect
We kindly thank the reviewers (R1
We discuss the main points below. Why is the Hessian in Figure 1 not marginal before the transition? We will add to the supplementary material a version of Fig. 2 for a value of We will add clarifications in the text wherever it could give rise to confusion. Above the threshold, this algorithm can find the solution. Why is the threshold energy higher for more samples?
Power Laws in Deep Learning 2: Universality
Editor's note: You can read the previous post in this series, Power Laws in Deep Learning, here. In a previous post, we saw that the Fully Connected (FC) layers of the most common pre-trained Deep Learning models display power law behavior. Remarkably, the FC matrices all lie within the Universality Class of Fat Tailed Random Matrices! We define a random matrix by fixing its size and drawing the matrix elements from a random distribution. In either case, Random Matrix Theory tells us what the asymptotic form of the Empirical Spectral Density (ESD) should look like. But first, let's see what model works best.
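To make the random matrix construction concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the code used in this series: the helper names `random_matrix` and `esd_eigenvalues`, the matrix shape (400 x 200), the Pareto tail exponent `mu = 1.5`, and the normalization X = WᵀW / N are all choices made for this example. It draws one matrix with Gaussian entries and one with fat-tailed entries, then compares the largest eigenvalue of each against the Marchenko-Pastur bulk edge expected for the Gaussian case.

```python
import numpy as np


def random_matrix(n_rows, n_cols, dist="gaussian", mu=1.5, seed=0):
    """Draw an n_rows x n_cols matrix with i.i.d. entries (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    if dist == "gaussian":
        # Zero mean, unit variance entries.
        return rng.standard_normal((n_rows, n_cols))
    if dist == "fat_tailed":
        # rng.pareto samples a Lomax (Pareto II) variable; adding 1 gives a
        # classical Pareto with tail P(|w| > x) = x**(-mu) for x >= 1.
        magnitudes = rng.pareto(mu, size=(n_rows, n_cols)) + 1.0
        # Random signs make the entry distribution symmetric around zero.
        signs = rng.choice([-1.0, 1.0], size=(n_rows, n_cols))
        return signs * magnitudes
    raise ValueError(f"unknown distribution: {dist}")


def esd_eigenvalues(W):
    """Eigenvalues of X = W^T W / N; their histogram is the ESD."""
    n_rows = W.shape[0]
    X = W.T @ W / n_rows
    # X is symmetric, so eigvalsh applies; eigenvalues come back in ascending order.
    return np.linalg.eigvalsh(X)


N, M = 400, 200  # assumed sizes, chosen only to keep the example fast

gauss_eigs = esd_eigenvalues(random_matrix(N, M, "gaussian"))
fat_eigs = esd_eigenvalues(random_matrix(N, M, "fat_tailed"))

# Upper edge of the Marchenko-Pastur bulk for unit-variance entries.
mp_edge = (1.0 + np.sqrt(M / N)) ** 2

print("Marchenko-Pastur upper edge:", mp_edge)
print("largest Gaussian eigenvalue:", gauss_eigs.max())
print("largest fat-tailed eigenvalue:", fat_eigs.max())
```

Plotting a histogram of `gauss_eigs` should show a compact bulk ending near `mp_edge`, while `fat_eigs` has a long tail extending far beyond it, so a log-log histogram is usually the more informative view for the fat-tailed case.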