CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models
