Gradient Shaping Beyond Clipping: A Functional Perspective on Update Magnitude Control

Oct-3-2025–arXiv.org Artificial Intelligence

Gradient clipping is widely used to stabilize deep network training, but its formulation as a hard, fixed threshold limits flexibility and ignores gradient distribution dynamics. We propose SPAMP (Statistical Per-layer Adaptive Modulation and Projection), a unified framework that generalizes clipping into smooth, per-layer gradient shaping. SPAMP tracks local gradient statistics, dynamically estimates thresholds, and applies power-based transformations to modulate update magnitudes in a differentiable manner. This perspective recasts clipping and warmup as dual mechanisms for controlling the effective update scale $η_t \|g_t\|$, offering a principled alternative to rigid heuristics. Extensive experiments across image and language tasks demonstrate that SPAMP improves stability, convergence, and robustness over existing methods.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

Oct-3-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.28)
- Asia > China (0.28)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.47)
  - Natural Language > Large Language Model (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found