Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

Jul-11-2023–arXiv.org Artificial Intelligence

We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics, such as temporal expressions in Swahili and compounding possessives in Finnish. Our findings motivate the development of models that address these blind spots.

Figure 1: Top: Comparison of state-of-the-art models on M2C tests in a selected set of languages. Models perform well on English but poorly on certain tests in other languages. Bottom: Even the largest models fail on tests probing language-specific features, e.g., the distinction ...

artificial intelligence, computational linguistics, natural language


Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - New Mexico > Santa Fe County
      - Santa Fe (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
- Europe
  - Sweden > Uppsala County
    - Uppsala (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Germany > Saxony
    - Leipzig (0.04)
  - Belgium > Flanders
    - Antwerp Province > Antwerp (0.04)
- Asia > Middle East
  - UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence > Natural Language (1.00)
