Zero-shot Cross-lingual Transfer Learning with Multiple Source and Target Languages for Information Extraction: Language Selection and Adversarial Training
Ngo, Nghia Trung, Nguyen, Thien Huu
arXiv.org Artificial Intelligence
The majority of previous research on multi-lingual IE is limited to the zero-shot cross-lingual single-transfer (one-to-one) setting, with high-resource languages predominantly serving as source training data. As a result, these works provide little understanding of, and little benefit for, the realistic goal of developing a multi-lingual IE system that can generalize to as many languages as possible. Our study aims to fill this gap by providing a detailed analysis of Cross-Lingual Multi-Transferability (many-to-many transfer learning) for recent IE corpora that cover a diverse set of languages. Specifically, we first determine the correlation between single-transfer performance and a wide range of linguistic-based distances. From the obtained insights, a combined language distance metric can be developed that is not only highly correlated with transfer performance but also robust across different tasks and model scales. Next, we investigate the more general zero-shot multi-lingual transfer settings, where multiple languages are involved in the training and evaluation processes. Language clustering based on the newly defined distance can provide directions for achieving the optimal cost-performance trade-off in the data (language) selection problem. Finally, a relational-transfer setting is proposed to further incorporate multi-lingual unlabeled data through adversarial training, using the relation induced from the above linguistic distance.
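The correlation step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of Spearman rank correlation, the weighted-average form of `combined_distance`, the distance type names, and the toy numbers, which are placeholders rather than reported results.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def combined_distance(distances, weights):
    """Hypothetical combination: weighted average of several distance types."""
    total = sum(weights.values())
    return sum(weights[name] * distances[name] for name in weights) / total


# Placeholder data: per (source, target) pair, a dict of distances and a
# single-transfer score. These values are invented for the example.
pairs = {
    ("en", "de"): ({"syntactic": 0.2, "phonological": 0.3}, 0.80),
    ("en", "hi"): ({"syntactic": 0.5, "phonological": 0.5}, 0.65),
    ("en", "ja"): ({"syntactic": 0.8, "phonological": 0.6}, 0.40),
}
weights = {"syntactic": 1.0, "phonological": 1.0}  # assumed equal weighting

dists = [combined_distance(d, weights) for d, _ in pairs.values()]
scores = [s for _, s in pairs.values()]
rho = spearman(dists, scores)  # negative when larger distance means worse transfer
print(f"Spearman correlation: {rho:.3f}")
```

Under these assumptions, a strongly negative correlation would indicate that the combined distance tracks single-transfer performance; the weights could then be tuned so that the correlation stays stable across tasks and model scales.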
Nov-13-2024
- Country:
- Oceania > Australia
- North America
- Dominican Republic (0.04)
- United States
- Washington > King County
- Seattle (0.04)
- Texas > Dallas County
- Dallas (0.04)
- Oregon > Lane County
- Eugene (0.14)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Europe
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- Italy > Tuscany
- Florence (0.04)
- Germany > Saxony
- Leipzig (0.04)
- Asia
- Indonesia > Bali (0.04)
- China (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Genre:
- Research Report (0.82)
- Technology: