Training Language Models with Memory Augmentation