One-Layer Transformer Provably Learns One-Nearest Neighbor In Context

Cheng Gao
