Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
