ParaNames: A Massively Multilingual Entity Name Corpus

Jul-12-2022–arXiv.org Artificial Intelligence

We introduce ParaNames, a multilingual parallel name resource consisting of 118 million names spanning 400 languages. Names are provided for 13.6 million entities, which are mapped to standardized entity types (PER/LOC/ORG). Using Wikidata as a source, we create the largest resource of this type to date. We describe our approach to filtering and standardizing the data to provide the best quality possible. ParaNames is useful for multilingual language processing, both in defining tasks for name translation/transliteration and as supplementary data for tasks such as named entity recognition and linking. We demonstrate an application of ParaNames by training a multilingual model for canonical name translation to and from English. Our resource is released under a Creative Commons license (CC BY 4.0) at https://github.com/bltlab/paranames.
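
As a rough illustration of how a parallel name resource of this kind might feed a name translation task, the sketch below groups names by entity and extracts source/target pairs for one language pair. The record layout (`NameRecord`), the entity identifiers (`E1`, `E2`), the helper `build_pairs`, and the sample rows are all hypothetical and chosen for illustration only; they are not the actual ParaNames file format or the authors' pipeline.

```python
from collections import defaultdict
from typing import NamedTuple


class NameRecord(NamedTuple):
    """One name of one entity in one language (hypothetical layout)."""
    entity_id: str    # placeholder identifier, not a real Wikidata ID
    language: str     # language code of the name
    name: str         # the name as written in that language
    entity_type: str  # one of PER, LOC, ORG


# Illustrative sample rows only; not taken from the released resource.
RECORDS = [
    NameRecord("E1", "en", "Germany", "LOC"),
    NameRecord("E1", "de", "Deutschland", "LOC"),
    NameRecord("E1", "fr", "Allemagne", "LOC"),
    NameRecord("E2", "en", "United Nations", "ORG"),
    NameRecord("E2", "fr", "Organisation des Nations unies", "ORG"),
]


def build_pairs(records, source_lang, target_lang):
    """Return (source_name, target_name) pairs for entities named in both languages."""
    # Collect every entity's names, keyed by language.
    by_entity = defaultdict(dict)
    for rec in records:
        by_entity[rec.entity_id][rec.language] = rec.name

    # Keep only entities that have a name in both the source and target language.
    pairs = []
    for names in by_entity.values():
        if source_lang in names and target_lang in names:
            pairs.append((names[source_lang], names[target_lang]))
    return pairs


pairs_en_fr = build_pairs(RECORDS, "en", "fr")
pairs_en_de = build_pairs(RECORDS, "en", "de")
```

Entities lacking a name in either language are skipped, so the number of pairs varies by language pair; in this toy sample the second entity has no German name and contributes nothing to `pairs_en_de`.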

Country:
- South America > Brazil
  - Bahia (0.04)
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - Canada (0.04)
  - United States
    - New York (0.04)
    - North Carolina (0.04)
    - California (0.04)
    - New Mexico > Santa Fe County
      - Santa Fe (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Colorado > Denver County
      - Denver (0.04)
- Europe
  - United Kingdom (0.04)
  - Slovenia (0.04)
  - Bulgaria (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
- Asia
  - China (0.04)
  - Middle East > Republic of Türkiye (0.04)
  - Kazakhstan (0.04)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)

Genre:
- Research Report
  - New Finding (0.46)
  - Experimental Study (0.46)

Industry:
- Government > Regional Government > North America Government > United States Government (0.67)

Technology:
- Information Technology > Artificial Intelligence > Natural Language
  - Text Processing (0.55)
  - Information Retrieval (0.35)
