Weight decay induces low-rank attention layers