Weight decay induces low-rank attention layers

Neural Information Processing Systems 

The effect of regularizers such as weight decay when training deep neural networks is not well understood.
