How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse
