On the Overlooked Structure of Stochastic Gradients

Jan-19-2025, 22:49:37 GMT–Neural Information Processing Systems

Stochastic gradients are closely related to both the optimization and the generalization of deep neural networks (DNNs). Some works have attempted to explain the success of stochastic optimization in deep learning by the arguably heavy-tailed properties of gradient noise, while others have presented theoretical and empirical evidence against the heavy-tail hypothesis on gradient noise. Unfortunately, formal statistical tests for analyzing the structure and heavy tails of stochastic gradients in deep learning remain under-explored. In this paper, we make two main contributions. First, we conduct formal statistical tests on the distribution of stochastic gradients and gradient noise across both parameters and iterations.
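To make the idea of a formal test on gradient noise concrete, the following is a minimal sketch rather than the paper's procedure. The abstract does not name the tests used, so the choice of a one-sample Kolmogorov-Smirnov (KS) distance against a Gaussian is an assumption, as are the function name `ks_statistic_vs_gaussian` and the synthetic samples, which stand in for real gradient-noise values (mini-batch gradient minus full-batch gradient).

```python
import math
import random


def ks_statistic_vs_gaussian(samples):
    """KS distance between standardized samples and the standard normal CDF.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    z = sorted((x - mean) / std for x in samples)
    d = 0.0
    for i, value in enumerate(z):
        cdf = 0.5 * (1.0 + math.erf(value / math.sqrt(2.0)))
        # The empirical CDF jumps from i/n to (i+1)/n at this sorted point,
        # so both sides of the jump are compared with the normal CDF.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


# Synthetic stand-ins for gradient noise (assumption: no real model is trained).
rng = random.Random(0)
n = 5000
gaussian_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Cauchy samples via the inverse CDF, as an extreme heavy-tailed example.
heavy_tailed_noise = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

d_gauss = ks_statistic_vs_gaussian(gaussian_noise)
d_heavy = ks_statistic_vs_gaussian(heavy_tailed_noise)

# Classical 5% KS threshold; it is conservative here because the mean and
# standard deviation are estimated from the same samples.
threshold = 1.36 / math.sqrt(n)

print(f"Gaussian sample:     D = {d_gauss:.4f}")
print(f"Heavy-tailed sample: D = {d_heavy:.4f}")
print(f"Approximate 5% threshold: {threshold:.4f}")
```

In this sketch a small distance means the sample is compatible with a Gaussian, while a large one is evidence against it. The same function could in principle be applied to noise collected across parameters at one iteration, or across iterations for one parameter, which are the two axes the abstract mentions.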

deep learning, gradient noise, overlooked structure
