On Initializing Transformers with Pre-trained Embeddings
