On the N-gram Approximation of Pre-trained Language Models