On Initializing Transformers with Pre-trained Embeddings