How Does the Pretraining Distribution Shape In-Context Learning? Task Selection, Generalization, and Robustness
