On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective

Zeke Xie

Neural Information Processing Systems 

In deep learning, there exist two types of "weight decay": L