Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data
