Gradient Localization Improves Lifelong Pretraining of Language Models