SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods

Goworek, Roksana; Karlcut, Harpal; Shezad, Muhammad; Darshana, Nijaguna; Mane, Abhishek; Bondada, Syam; Sikka, Raghav; Mammadov, Ulvi; Allahverdiyev, Rauf; Purighella, Sriram; Gupta, Paridhi; Ndegwa, Muhinyia; Dubossarsky, Haim

Jul-23-2025, arXiv.org Artificial Intelligence

This paper addresses the critical need for high-quality evaluation datasets in low-resource languages to advance cross-lingual transfer. Cross-lingual transfer is a key strategy for leveraging multilingual pretraining to extend language technologies to understudied and typologically diverse languages, but its effectiveness depends on the availability of high-quality, suitable benchmarks. We release new sense-annotated datasets of sentences containing polysemous words, spanning ten low-resource languages across diverse language families and scripts. To facilitate dataset creation, the paper presents a semi-automatic annotation method and demonstrates its benefits. The utility of the datasets is shown through experiments in the Word-in-Context (WiC) format that evaluate transfer to these low-resource languages. The results highlight the importance of targeted dataset creation and evaluation for effective polysemy disambiguation in low-resource settings and in transfer studies. The released datasets and code aim to support further research into fair, robust, and truly multilingual NLP.
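In the WiC format, a model sees two sentences containing the same target word and must decide whether the word carries the same sense in both. The sketch below shows one plausible way sense-annotated sentences could be turned into WiC-style pairs. It is an illustrative assumption, not the authors' released code: the `SenseExample` and `WiCPair` records, the `build_wic_pairs` helper, and the sense labels in the demo are all hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class SenseExample(NamedTuple):
    """One sentence where `lemma` occurs with a gold sense label (hypothetical schema)."""
    lemma: str
    sentence: str
    sense_id: str


class WiCPair(NamedTuple):
    """A WiC-style item: two contexts of one lemma plus a same-sense label."""
    lemma: str
    sentence1: str
    sentence2: str
    same_sense: bool


def build_wic_pairs(examples: list[SenseExample]) -> list[WiCPair]:
    """Pair every two sentences that share a lemma; label True when their senses match."""
    by_lemma: dict[str, list[SenseExample]] = {}
    for ex in examples:
        by_lemma.setdefault(ex.lemma, []).append(ex)

    pairs: list[WiCPair] = []
    for lemma, group in by_lemma.items():
        # combinations() yields each unordered pair of sentences exactly once.
        for a, b in combinations(group, 2):
            pairs.append(WiCPair(lemma, a.sentence, b.sentence, a.sense_id == b.sense_id))
    return pairs


# Toy English data with made-up sense labels, for illustration only.
examples = [
    SenseExample("bank", "She sat on the river bank.", "bank%water"),
    SenseExample("bank", "The bank approved the loan.", "bank%finance"),
    SenseExample("bank", "Fish gathered near the muddy bank.", "bank%water"),
]
pairs = build_wic_pairs(examples)
for p in pairs:
    print(p.same_sense, "|", p.sentence1, "|", p.sentence2)
```

Exhaustive pairing is the simplest choice for a sketch. A real pipeline would likely also balance positive and negative pairs and limit how many pairs a single lemma contributes.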

computational linguistics, large language model, machine learning
