XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech