Parallel Structures in Pre-training Data Yield In-Context Learning
