Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings
Iker García-Ferrero, Rodrigo Agerri, German Rigau
arXiv.org Artificial Intelligence
Zero-resource cross-lingual transfer approaches aim to apply supervised models trained on a source language to target languages for which no labelled data is available. In this paper we present an in-depth study of the two main techniques employed so far for cross-lingual zero-resource sequence labelling: data transfer and model transfer. Although previous research has proposed translation and annotation projection (data-based cross-lingual transfer) as an effective technique for cross-lingual sequence labelling, we experimentally demonstrate that high-capacity multilingual language models applied in a zero-shot setting (model-based cross-lingual transfer) consistently outperform data-based cross-lingual transfer approaches. A detailed analysis of our results suggests that this might be due to important differences in language use. More specifically, machine translation often generates a textual signal that differs from what the models are exposed to when using gold-standard data, which affects both the fine-tuning and evaluation processes. Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.
Apr-27-2023