Checkpoint Merging via Bayesian Optimization in LLM Pretraining