On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective
Zeke Xie
Neural Information Processing Systems
Oct-8-2025, 00:31:41 GMT