Untangling the Influence of Typology, Data and Model Architecture on Ranking Transfer Languages for Cross-Lingual POS Tagging

Rice, Enora, Marashian, Ali, Haynie, Hannah, von der Wense, Katharina, Palmer, Alexis

Mar-25-2025–arXiv.org Artificial Intelligence

Cross-lingual transfer learning is an invaluable tool for overcoming data scarcity, yet selecting a suitable transfer language remains a challenge. The precise roles of linguistic typology, training data, and model architecture in transfer language choice are not fully understood. We take a holistic approach, examining how both dataset-specific and fine-grained typological features influence transfer language selection for part-of-speech tagging, considering two different sources for morphosyntactic features. While previous work examines these dynamics in the context of bilingual biLSTMS, we extend our analysis to a more modern transfer learning pipeline: zero-shot prediction with pretrained multilingual models. We train a series of transfer language ranking systems and examine how different feature inputs influence ranker performance across architectures. Word overlap, type-token ratio, and genealogical distance emerge as top features across all architectures. Our findings reveal that a combination of typological and dataset-dependent features leads to the best rankings, and that good performance can be obtained with either feature group on its own.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Mar-25-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
  - Colorado > Boulder County
    - Boulder (0.04)
- Europe
  - Spain (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Germany > Rheinland-Pfalz
    - Mainz (0.04)
- Asia
  - India (0.04)
  - China > Hong Kong (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre:
- Research Report > New Finding (0.49)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Representation & Reasoning > Object-Oriented Architecture (0.60)
  - Machine Learning > Statistical Learning (0.48)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found