TartuNLP @ SIGTYP 2024 Shared Task: Adapting XLM-RoBERTa for Ancient and Historical Languages

Apr-19-2024–arXiv.org Artificial Intelligence

We present our submission to the unconstrained subtask of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages, which covers morphological annotation, POS tagging, lemmatization, and character- and word-level gap-filling. We developed a simple, uniform, and computationally lightweight approach based on the adapters framework and parameter-efficient fine-tuning: for all tasks and all 16 languages, we fine-tune stacked language-specific and task-specific adapters. Our submission placed second overall out of three submissions and first in word-level gap-filling. These results show that language models pre-trained on modern languages can feasibly be adapted to historical and ancient languages via adapter training.
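The abstract describes stacking a language-specific adapter beneath a task-specific adapter while the pre-trained XLM-RoBERTa weights stay frozen. Below is a minimal forward-pass sketch of that idea. It assumes the common bottleneck adapter design (down-projection, ReLU, up-projection, residual connection) and MAD-X-style ordering with the language adapter applied first. The abstract does not specify the adapter architecture, and the names `BottleneckAdapter` and `stacked`, as well as the toy dimensions, are hypothetical. The sketch uses plain Python lists so it runs without dependencies, and it omits training.

```python
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class BottleneckAdapter:
    """Hypothetical bottleneck adapter computing h + up(relu(down(h))).

    In adapter training only these weights are updated; the
    pre-trained transformer weights stay frozen (not modelled here).
    """

    def __init__(self, hidden: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection: hidden -> bottleneck, small random init.
        self.down: Matrix = [[rng.gauss(0.0, 0.02) for _ in range(hidden)]
                             for _ in range(bottleneck)]
        # Up-projection: bottleneck -> hidden, zero init so the adapter
        # starts out as the identity function.
        self.up: Matrix = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, a) for a in matvec(self.down, h)]  # ReLU
        delta = matvec(self.up, z)
        return [x + d for x, d in zip(h, delta)]  # residual connection

    def num_params(self) -> int:
        # Weight count only (biases are omitted in this sketch).
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)


def stacked(h: Vector, adapters: Sequence[BottleneckAdapter]) -> Vector:
    """Apply adapters in order, e.g. language adapter then task adapter."""
    for adapter in adapters:
        h = adapter(h)
    return h


# Toy dimensions, chosen only for illustration.
hidden, bottleneck = 8, 2
lang_adapter = BottleneckAdapter(hidden, bottleneck, seed=1)
task_adapter = BottleneckAdapter(hidden, bottleneck, seed=2)
h = [float(i) for i in range(hidden)]
out = stacked(h, [lang_adapter, task_adapter])
print(out == h)  # True: zero-initialized up-projections leave h unchanged
print(lang_adapter.num_params())  # 32 = 2 * hidden * bottleneck
```

Because each up-projection starts at zero, the stack initially passes hidden states through unchanged, so training begins from the frozen model's behavior. Each adapter adds only 2 × hidden × bottleneck weights in this bias-free sketch, which is what makes the approach parameter-efficient. In practice, AdapterHub's `adapters` library exposes this pattern through its `Stack` composition block.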

Keywords: adapter, computational linguistic, tokenizer, …

Country:
- North America
  - Dominican Republic (0.04)
  - United States > Minnesota
    - Hennepin County > Minneapolis (0.14)
- Europe
  - Spain > Aragón (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Romania > Sud-Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Germany
    - Berlin (0.04)
    - Bavaria > Lower Franconia
      - Würzburg (0.04)
  - Estonia > Tartu County
    - Tartu (0.05)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - China (0.04)
  - Singapore (0.04)
  - Japan > Honshū
    - Tōhoku > Iwate Prefecture > Morioka (0.04)

Genre:
- Research Report > New Finding (0.54)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning (1.00)
