Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models