Decoupled Weight Decay for Any $p$ Norm
Nadav Joseph Outmezguine, Noam Levi
arXiv.org Artificial Intelligence
With the success of deep neural networks (NNs) in a variety of domains, the computational and storage requirements for training and deploying large NNs have become a bottleneck for further improvements. Sparsification has consequently emerged as a leading approach to tackle these issues. In this work, we consider a simple yet effective approach to sparsification, based on Bridge, or $L_p$, regularization during training. We introduce a novel weight decay scheme, which generalizes the standard $L_2$ weight decay to any $p$ norm. We show that this scheme is compatible with adaptive optimizers, and avoids the gradient divergence associated with $0
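For orientation, the two ingredients the abstract mentions can be sketched in a few lines of plain Python: the standard decoupled $L_2$ weight decay step, and the gradient of a naive $|w|^p$ penalty, whose magnitude grows without bound near zero when $p < 1$. This is only an illustrative sketch and not the authors' scheme, which the abstract does not spell out. The function names `decoupled_l2_step` and `lp_penalty_grad`, the scalar weights, and the SGD-style update are all assumptions made for the example.

```python
import math


def lp_penalty_grad(w: float, p: float) -> float:
    """Gradient of the naive penalty |w|**p with respect to w.

    Hypothetical helper for illustration. For p < 1 the exponent p - 1 is
    negative, so w must be non-zero and the magnitude blows up as w -> 0.
    """
    return p * abs(w) ** (p - 1.0) * math.copysign(1.0, w)


def decoupled_l2_step(w: float, grad: float, lr: float, wd: float) -> float:
    """One SGD-style step with decoupled L2 weight decay (assumed form).

    The decay term shrinks the weight directly and is kept separate from
    the loss gradient, rather than being folded into it.
    """
    return w - lr * grad - lr * wd * w


if __name__ == "__main__":
    # Naive L_p gradient for p = 0.5 at progressively smaller weights:
    # the printed magnitudes increase as w approaches zero.
    for w in (1e-2, 1e-4, 1e-6):
        print(w, lp_penalty_grad(w, 0.5))

    # A decoupled L2 step with zero loss gradient only shrinks the weight.
    print(decoupled_l2_step(1.0, 0.0, lr=0.1, wd=0.5))
```

For $p = 2$ the penalty gradient is proportional to $w$, which is why standard weight decay shrinks every weight by a constant factor per step. For $p < 1$ the same construction produces ever larger updates on the smallest weights.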
Apr-22-2024
- Country:
  - North America
    - United States
      - Pennsylvania > Allegheny County
        - Pittsburgh (0.04)
      - Nevada > Clark County
        - Las Vegas (0.04)
      - Indiana > Marion County
        - Lawrence (0.04)
      - California > Alameda County
        - Berkeley (0.04)
    - Canada > Ontario
      - Toronto (0.14)
  - Europe
    - Monaco (0.04)
    - Switzerland > Vaud
      - Lausanne (0.04)
  - Asia
    - China (0.04)
    - Middle East > Israel
      - Tel Aviv District > Tel Aviv (0.04)
- Genre:
  - Research Report > New Finding (0.93)
- Technology: