Weight decay induces low-rank attention layers
Neural Information Processing Systems
The effect of regularizers such as weight decay when training deep neural networks is not well understood.
Oct-9-2025, 17:53:13 GMT
- Country:
  - North America > United States
    - Pennsylvania > Allegheny County > Pittsburgh (0.04)
  - Europe > Switzerland
  - Asia > Middle East
    - Jordan (0.04)
- Genre:
  - Research Report
    - New Finding (0.93)
    - Experimental Study (0.93)
- Technology: