Checkpoint Merging via Bayesian Optimization in LLM Pretraining
