Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks: Supplementary Document
Neural Information Processing Systems
Section S4 discusses the relation between compressibility and the tail index. Section S5 presents the proofs of the main results of the paper, and Section S6 proves the technical lemmas. This supplement also gives a more detailed description of our experimental setting, along with the results and discussion omitted from the main paper due to space restrictions. Table 1 lists the number of parameters for each model-dataset combination.
Aug-18-2025, 22:17:01 GMT