Gradient Shaping Beyond Clipping: A Functional Perspective on Update Magnitude Control
–arXiv.org Artificial Intelligence
Gradient clipping is widely used to stabilize deep network training, but its formulation as a hard, fixed threshold limits flexibility and ignores gradient distribution dynamics. We propose SPAMP (Statistical Per-layer Adaptive Modulation and Projection), a unified framework that generalizes clipping into smooth, per-layer gradient shaping. SPAMP tracks local gradient statistics, dynamically estimates thresholds, and applies power-based transformations to modulate update magnitudes in a differentiable manner. This perspective recasts clipping and warmup as dual mechanisms for controlling the effective update scale $η_t \|g_t\|$, offering a principled alternative to rigid heuristics. Extensive experiments across image and language tasks demonstrate that SPAMP improves stability, convergence, and robustness over existing methods.
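The abstract gives no formulas, so the following is only a minimal sketch of what per-layer, statistics-driven, power-based shaping could look like, not the authors' SPAMP algorithm. Everything in it is an assumption made for illustration: the name `LayerShaper`, the use of an exponential moving average of the gradient norm as the threshold $\tau$, and the rescaling rule $g \leftarrow g \cdot (\tau / \|g\|)^{p}$ applied when $\|g\| > \tau$. Under these assumptions, $p = 1$ recovers hard norm clipping at $\tau$ and $p = 0$ leaves the gradient unchanged, so intermediate values of $p$ interpolate between the two.

```python
import math


def l2_norm(grad):
    """Euclidean norm of a flat list of gradient values."""
    return math.sqrt(sum(x * x for x in grad))


class LayerShaper:
    """Hypothetical per-layer shaper (illustrative only, not the paper's method).

    Tracks an exponential moving average (EMA) of the layer's gradient norm
    and uses it as a dynamic threshold tau. Gradients whose norm exceeds tau
    are rescaled by (tau / norm) ** power.
    """

    def __init__(self, beta=0.9, power=0.5, eps=1e-12):
        self.beta = beta        # EMA decay for the running norm (assumed)
        self.power = power      # 1.0 = hard norm clipping, 0.0 = no shaping
        self.eps = eps          # guards against division by zero
        self.ema_norm = None    # running estimate of the gradient norm

    def shape(self, grad):
        norm = l2_norm(grad)
        # Threshold comes from statistics seen before this step; on the
        # first call there is no history, so the gradient passes unchanged.
        tau = norm if self.ema_norm is None else self.ema_norm
        if norm > tau:
            scale = (tau / (norm + self.eps)) ** self.power
        else:
            scale = 1.0
        # Update the running statistic with the raw (unshaped) norm.
        if self.ema_norm is None:
            self.ema_norm = norm
        else:
            self.ema_norm = self.beta * self.ema_norm + (1.0 - self.beta) * norm
        return [scale * x for x in grad]


# One shaper per layer; here a single layer sees a normal step, then a spike.
shaper = LayerShaper(power=0.5)
shaper.shape([3.0, 4.0])              # norm 5, no history yet, unchanged
shaped = shaper.shape([30.0, 40.0])   # norm 50 against tau = 5
print(l2_norm(shaped))                # about sqrt(5 * 50), i.e. roughly 15.8
```

With `power=0.5` the spike is damped to the geometric mean of the threshold and the raw norm rather than being cut to the threshold outright. The scale factor in this sketch is continuous in $\|g\|$ but still has a kink at $\|g\| = \tau$, and updating the EMA with the raw norm rather than the shaped one is a design choice made here for simplicity. Since the step size is $\eta_t \|g_t\|$ times a unit direction, shrinking $\|g_t\|$ this way has the same effect on the update magnitude as shrinking $\eta_t$, which is the sense in which the abstract pairs clipping with warmup.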
Oct-3-2025