Gradient Localization Improves Lifelong Pretraining of Language Models

Open in new window