Pre-training Limited Memory Language Models with Internal and External Knowledge