Enhancing Cross-lingual Transfer via Phonemic Transcription Integration
Nguyen, Hoang H., Zhang, Chenwei, Zhang, Tao, Rohrbaugh, Eugene, Yu, Philip S.
arXiv.org Artificial Intelligence
Previous cross-lingual transfer methods are restricted to orthographic representation learning via textual scripts. This limitation hampers cross-lingual transfer and biases it towards languages that share similar, well-known scripts. To narrow the gap between languages with different writing scripts, we propose PhoneXL, a framework incorporating phonemic transcriptions as an additional linguistic modality beyond the traditional orthographic transcriptions for cross-lingual transfer. In particular, we propose unsupervised alignment objectives to capture (1) local one-to-one alignment between the two different modalities, (2) alignment via multi-modality contexts to leverage information from additional modalities, and (3) alignment via multilingual contexts where additional bilingual dictionaries are incorporated. We also release the first phonemic-orthographic alignment dataset on two token-level tasks (Named Entity Recognition and Part-of-Speech Tagging) among the understudied but interconnected Chinese-Japanese-Korean-Vietnamese (CJKV) languages. Our pilot study reveals that phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer and bridge the gap among CJKV languages, leading to consistent improvements on cross-lingual token-level tasks over orthographic-based multilingual PLMs.
Jul-10-2023
- Country:
  - North America > United States
    - Washington > King County
      - Seattle (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Pennsylvania > Dauphin County
      - Harrisburg (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
  - Europe > Ireland
    - Leinster > County Dublin > Dublin (0.04)
  - Asia
    - Vietnam (0.14)
    - Middle East > Jordan (0.04)
- Genre:
- Research Report (0.64)
- Technology: