MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

Zhou, Shijia, Shan, Huangyan, Plank, Barbara, Litschko, Robert

Apr-3-2024–arXiv.org Artificial Intelligence

This paper presents our system developed for SemEval-2024 Task 1: Semantic Textual Relatedness (STR), Track C: Cross-lingual. The task aims to detect the semantic relatedness of two sentences in a given target language without access to direct supervision (i.e., zero-shot cross-lingual transfer). To this end, we focus on different source language selection strategies with two different pre-trained language models: XLM-R and Furina. We experiment with 1) single-source transfer, selecting source languages based on typological similarity, 2) augmenting English training data with the two nearest-neighbor source languages, and 3) multi-source transfer, where we compare selecting all training languages against selecting only languages from the same family. We further study machine translation-based data augmentation and the impact of script differences. Our submission achieved first place on the C8 (Kinyarwanda) test set.
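As an illustration of nearest-neighbor source language selection, the following is a minimal sketch, not the authors' implementation. It assumes each language is described by a typological feature vector and ranks candidate source languages by cosine similarity to the target. The function names, the language labels (`tgt`, `src_a`, ...), the feature values, and the choice of cosine similarity are all hypothetical; the abstract does not specify which features or similarity measure were used.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def nearest_sources(target, candidates, features, k=2):
    """Return the k candidate languages most similar to the target."""
    ranked = sorted(
        candidates,
        key=lambda lang: cosine_similarity(features[target], features[lang]),
        reverse=True,
    )
    return ranked[:k]


# Made-up binary typological features, for illustration only.
TOY_FEATURES = {
    "tgt": [1, 0, 1, 1, 0],
    "src_a": [1, 0, 1, 0, 0],
    "src_b": [0, 1, 0, 0, 1],
    "src_c": [1, 0, 1, 1, 1],
}

if __name__ == "__main__":
    print(nearest_sources("tgt", ["src_a", "src_b", "src_c"], TOY_FEATURES))
```

With `k=1` the same function covers the single-source setting, while `k=2` corresponds to picking two nearest neighbors to add to English training data.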

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning (0.67)
  - Natural Language
    - Text Processing (0.67)
    - Machine Translation (0.66)
