Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

Tianyu He, Aritra Das

Neural Information Processing Systems 

Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks.