Attention layers provably solve single-location regression
