Data Distributional Properties Drive Emergent In-Context Learning in Transformers
