Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Mayhew, Stephen, Blevins, Terra, Liu, Shuheng, Šuppa, Marek, Gonen, Hila, Imperial, Joseph Marvin, Karlsson, Börje F., Lin, Peiqin, Ljubešić, Nikola, Miranda, LJ, Plank, Barbara, Riabi, Arij, Pinter, Yuval

Nov-15-2023, arXiv.org Artificial Intelligence

We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingually consistent schema across 12 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines on both in-language and cross-lingual learning settings. We release the data, code, and fitted models to the public.

annotation, computational linguistics, dataset

Country:
- Africa (0.04)
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - North Carolina (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Slovenia (0.14)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
  - Iceland > Capital Region
    - Reykjavik (0.04)
  - Slovakia > Bratislava
    - Bratislava (0.04)
  - Italy > Tuscany
    - Pisa Province > Pisa (0.04)
    - Florence (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.05)
  - Spain
    - Valencian Community > Valencia Province
      - Valencia (0.04)
    - Catalonia > Barcelona Province
      - Barcelona (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Bulgaria > Sofia City Province
    - Sofia (0.04)
  - Finland > Southwest Finland
    - Turku (0.04)
  - Sweden > Östergötland County
    - Linköping (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
  - Faroe Islands > Streymoy
    - Tórshavn (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Estonia > Tartu County
    - Tartu (0.04)
- Asia
  - South Korea (0.14)
  - Middle East
    - UAE (0.04)
    - Israel (0.04)
    - Republic of Türkiye > Istanbul Province
      - Istanbul (0.04)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
  - China
    - Hong Kong (0.04)
    - Beijing > Beijing (0.04)

Genre:
- Research Report (0.50)

Industry:
- Government (0.67)
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)