Gradient Shaping Beyond Clipping: A Functional Perspective on Update Magnitude Control
arXiv.org Artificial Intelligence
Gradient clipping is widely used to stabilize deep network training, but its formulation as a hard, fixed threshold limits flexibility and ignores gradient distribution dynamics. We propose SPAMP (Statistical Per-layer Adaptive Modulation and Projection), a unified framework that generalizes clipping into smooth, per-layer gradient shaping. SPAMP tracks local gradient statistics, dynamically estimates thresholds, and applies power-based transformations to modulate update magnitudes in a differentiable manner. This perspective recasts clipping and warmup as dual mechanisms for controlling the effective update scale $\eta_t \|g_t\|$, offering a principled alternative to rigid heuristics. Extensive experiments across image and language tasks demonstrate that SPAMP improves stability, convergence, and robustness over existing methods.
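The abstract does not give SPAMP's exact equations, but the ingredients it names (per-layer gradient statistics, dynamically estimated thresholds, and a smooth power-based rescaling in place of a hard clip) can be sketched as follows. All class/parameter names, the EMA-based statistics, and the particular power-law transform here are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

class GradientShaper:
    """Hypothetical sketch of per-layer adaptive gradient shaping.

    Tracks an exponential moving average of each layer's gradient-norm
    statistics, derives a dynamic threshold from them, and attenuates
    large gradients with a smooth power-law transform rather than a
    hard clip. Constants are illustrative, not from the paper.
    """

    def __init__(self, beta=0.99, k=2.0, p=0.5, eps=1e-8):
        self.beta = beta   # EMA decay for the norm statistics
        self.k = k         # threshold = mean + k * std
        self.p = p         # power exponent; p = 1 recovers hard clipping
        self.eps = eps
        self.mean = {}     # per-layer EMA of ||g||
        self.var = {}      # per-layer EMA of squared deviation

    def shape(self, name, grad):
        norm = float(np.linalg.norm(grad))
        m = self.mean.get(name, norm)  # initialize EMA at first observed norm
        v = self.var.get(name, 0.0)
        m = self.beta * m + (1 - self.beta) * norm
        v = self.beta * v + (1 - self.beta) * (norm - m) ** 2
        self.mean[name], self.var[name] = m, v
        tau = m + self.k * np.sqrt(v)  # dynamic per-layer threshold
        if norm <= tau:
            return grad                # typical gradients pass through unchanged
        # Smooth attenuation: scale = (tau / ||g||)^p, so the effective
        # update magnitude eta_t * ||g|| shrinks continuously past tau.
        scale = (tau / (norm + self.eps)) ** self.p
        return grad * scale
```

With p = 1 the transform coincides with norm clipping (the gradient is rescaled to norm tau); with p < 1 the attenuation is gentler and remains a smooth function of the gradient norm, which is the sense in which shaping generalizes clipping.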
Oct 3, 2025