Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models
