Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise
