Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging
Imani, Ayyoob, Severini, Silvia, Sabet, Masoud Jalili, Yvon, François, Schütze, Hinrich
arXiv.org Artificial Intelligence
Part-of-Speech (POS) tagging is an important component of the NLP pipeline, but many low-resource languages lack labeled data for training. An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring labels from high-resource languages. In this paper, we propose a novel method for transferring labels from multiple high-resource source languages to low-resource target languages. We formalize POS tag projection as graph-based label propagation. Given translations of a sentence in multiple languages, we align words for all language pairs and create a graph with words as nodes and alignment links as edges. We then propagate node labels from source to target using a Graph Neural Network augmented with transformer layers. We show that our propagation creates training sets that allow us to train POS taggers for a diverse set of languages. When combined with enhanced contextualized embeddings, our method achieves a new state of the art for unsupervised POS tagging of low-resource languages.
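The graph construction described in the abstract can be illustrated with a small sketch. The code below is not the authors' method: it replaces the Graph Neural Network with transformer layers by a plain majority vote over aligned source words, only to show how words become nodes and alignment links become edges. The toy sentences, the alignment pairs, the language code "xx", and the function names build_graph and project_tags are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one sentence translated into three languages.
# A node is a (language, token_index) pair.
sentences = {
    "en": ["the", "dog", "sleeps"],
    "de": ["der", "Hund", "schläft"],
    "xx": ["w1", "w2", "w3"],  # stand-in for a low-resource target language
}

# Hypothetical word alignments for every language pair, given as
# (index in first language, index in second language).
alignments = {
    ("en", "de"): [(0, 0), (1, 1), (2, 2)],
    ("en", "xx"): [(1, 0), (2, 1)],
    ("de", "xx"): [(0, 2), (1, 0), (2, 1)],
}

# POS tags are assumed to be known only for the source languages.
source_tags = {
    ("en", 0): "DET", ("en", 1): "NOUN", ("en", 2): "VERB",
    ("de", 0): "DET", ("de", 1): "NOUN", ("de", 2): "VERB",
}


def build_graph(alignments):
    """Build an undirected graph: words are nodes, alignment links are edges."""
    graph = defaultdict(set)
    for (lang_a, lang_b), links in alignments.items():
        for i, j in links:
            graph[(lang_a, i)].add((lang_b, j))
            graph[(lang_b, j)].add((lang_a, i))
    return graph


def project_tags(graph, source_tags, target_lang, n_tokens):
    """Assign each target word the most frequent tag among its labeled neighbors.

    This majority vote is a simple baseline standing in for the learned
    propagation model; target words without labeled neighbors stay unlabeled.
    """
    projected = {}
    for idx in range(n_tokens):
        neighbors = graph.get((target_lang, idx), ())
        votes = Counter(source_tags[n] for n in neighbors if n in source_tags)
        if votes:
            projected[idx] = votes.most_common(1)[0][0]
    return projected


graph = build_graph(alignments)
projected = project_tags(graph, source_tags, "xx", len(sentences["xx"]))
print(projected)
```

The projected tags for the target sentence could then serve as a noisy training set for a POS tagger. In this sketch each target word consults only its direct neighbors, whereas a learned propagation model can also draw on the wider graph structure.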
Oct-31-2022
- Country:
- Oceania > Australia
- North America > United States
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Georgia > Fulton County
- Atlanta (0.04)
- Europe
- Spain (0.04)
- Czechia > Prague (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Italy > Tuscany
- Florence (0.04)
- Germany
- Berlin (0.04)
- Bavaria > Upper Bavaria
- Munich (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Indonesia > Bali (0.04)
- China
- Genre:
- Research Report (1.00)
- Technology: