Instability in Downstream Task Performance During LLM Pretraining
