Evaluating Pixel Language Models on Non-Standardized Languages

Muñoz-Ortiz, Alberto; Blaschke, Verena; Plank, Barbara

Dec-12-2024–arXiv.org Artificial Intelligence

We explore the potential of pixel-based models for transfer learning from standard languages to dialects. These models convert text into images that are divided into patches, enabling a continuous vocabulary representation that proves especially useful for the out-of-vocabulary words common in dialectal data. Using German as a case study, we compare the performance of pixel-based models to token-based models across various syntactic and semantic tasks. Our results show that, in zero-shot evaluation on dialects, pixel-based models outperform token-based models on part-of-speech tagging, dependency parsing and intent detection, by up to 26 percentage points in some scenarios. This advantage does not hold for Standard German, and pixel-based models fall short in topic classification. These findings emphasize the potential of pixel-based models for handling dialectal data, though further research is needed to assess their effectiveness in various linguistic contexts.
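The core mechanism described above, rendering text as an image and cutting it into fixed-size patches, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 16-pixel patch size, the 64-patch sequence length, the use of Pillow's built-in default font, and the helper names `render_text` and `to_patches` are all hypothetical choices made for this example. Each patch becomes a vector of 256 pixel values rather than an index into a fixed vocabulary, which is the sense in which the vocabulary is continuous: an unseen dialectal spelling still yields valid inputs instead of an unknown token.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

PATCH = 16        # assumed patch size: 16 x 16 pixels
MAX_PATCHES = 64  # hypothetical sequence length, kept small for illustration


def render_text(text: str, patch: int = PATCH, max_patches: int = MAX_PATCHES) -> np.ndarray:
    """Render `text` in black on a white strip `patch` pixels high (hypothetical helper).

    Returns floats in [0, 1] with shape (patch, patch * max_patches);
    text that does not fit in the strip is clipped.
    """
    width = patch * max_patches
    img = Image.new("L", (width, patch), color=255)  # 8-bit grayscale, white background
    font = ImageFont.load_default()                  # assumption: Pillow's built-in font
    ImageDraw.Draw(img).text((0, 0), text, fill=0, font=font)
    return np.asarray(img, dtype=np.float32) / 255.0


def to_patches(image: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Split a (patch, W) strip into W // patch flattened patch vectors (hypothetical helper)."""
    h, w = image.shape
    assert h == patch and w % patch == 0, "strip must be one patch high and a whole number of patches wide"
    # (h, n, patch): element [r, j, c] is pixel image[r, j * patch + c]
    # transpose to (n, h, patch) so each leading index is one square patch,
    # then flatten each patch into a single vector of h * patch values
    return image.reshape(h, w // patch, patch).transpose(1, 0, 2).reshape(-1, h * patch)


if __name__ == "__main__":
    # A Standard German word and a dialectal spelling of it (illustrative only)
    for word in ["nicht", "ned"]:
        patches = to_patches(render_text(word))
        inked = int((patches.min(axis=1) < 1.0).sum())  # patches containing any dark pixel
        print(word, patches.shape, inked)
```

Using `reshape` and `transpose` instead of a Python loop keeps the patch extraction vectorized; both words produce an array of the same shape, so neither is "out of vocabulary" for the downstream encoder.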

artificial intelligence, dialect, natural language
