MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness
Shijia Zhou, Huangyan Shan, Barbara Plank, Robert Litschko
arXiv.org Artificial Intelligence
This paper presents our system for SemEval-2024 Task 1: Semantic Textual Relatedness (STR), Track C: Cross-lingual. The task is to predict the semantic relatedness of two sentences in a given target language without access to direct supervision (i.e., zero-shot cross-lingual transfer). To this end, we study source language selection strategies with two pre-trained language models: XLM-R and Furina. We experiment with 1) single-source transfer, selecting source languages based on typological similarity, 2) augmenting English training data with the two nearest-neighbor source languages, and 3) multi-source transfer, where we compare training on all training languages against training only on languages from the same family. We further study machine translation-based data augmentation and the impact of script differences. Our submission achieved first place on the C8 (Kinyarwanda) test set.
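The nearest-neighbor source language selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes each language is represented by a binary typological feature vector and that similarity is measured with cosine similarity; the abstract does not state which features or metric the authors used. The language keys (`tgt`, `src_a`, ...), the feature values, and the function names are hypothetical placeholders, not real typological data or the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def nearest_sources(target, candidates, features, k=2):
    """Return the k candidate source languages most similar to the target."""
    ranked = sorted(
        (c for c in candidates if c != target),
        key=lambda c: cosine(features[target], features[c]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy vectors (assumption): placeholders, not real typology.
TOY_FEATURES = {
    "tgt": [1, 0, 1, 1, 0],
    "src_a": [1, 0, 1, 0, 0],
    "src_b": [0, 1, 0, 0, 1],
    "src_c": [1, 0, 1, 1, 1],
}

selected = nearest_sources("tgt", ["src_a", "src_b", "src_c"], TOY_FEATURES, k=2)
print(selected)
```

With `k=1` this corresponds to single-source transfer; with `k=2` the selected languages could be added to English training data, as in the second strategy described above.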
Apr-3-2024