Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks: Supplementary Document