Weight decay induces low-rank attention layers
