Multilingual Event Linking to Wikidata
Adithya Pratapa, Rishubh Gupta, Teruko Mitamura
arXiv.org Artificial Intelligence
We present the task of multilingual linking of events to a knowledge base. We automatically compile a large-scale dataset for this task, comprising 1.8M mentions across 44 languages that refer to over 10.9K events from Wikidata. We propose two variants of the event linking task: 1) multilingual, where event descriptions are in the same language as the mention, and 2) crosslingual, where all event descriptions are in English. On the two proposed tasks, we compare multiple event linking systems, including BM25+ (Lv and Zhai, 2011) and multilingual adaptations of the biencoder and crossencoder architectures from BLINK (Wu et al., 2020). In our experiments on the two task variants, we find that both the biencoder and crossencoder models significantly outperform the BM25+ baseline. Our results also indicate that the crosslingual task is in general more challenging than the multilingual task. To test the out-of-domain generalization of the proposed linking systems, we additionally create a Wikinews-based evaluation set. We present a qualitative analysis highlighting various aspects captured by the proposed dataset, including the need for temporal reasoning over context and the challenge of handling diverse event descriptions across languages.
Jul-16-2022
- Country:
- Africa
- Middle East
- Egypt (0.04)
- Morocco > Casablanca-Settat Region
- Casablanca (0.04)
- Tunisia > Tunis Governorate
- Tunis (0.04)
- North Africa (0.04)
- South Africa (0.04)
- Asia
- Japan > Honshū
- Tōhoku (0.04)
- Indonesia > Bali (0.04)
- Middle East > Israel (0.04)
- Russia > Far Eastern Federal District
- Primorsky Krai > Vladivostok (0.04)
- Kazakhstan > Almaty Region
- Almaty (0.04)
- Philippines (0.04)
- China
- Beijing > Beijing (0.04)
- Shandong Province > Qingdao (0.04)
- South Korea
- Gangwon-do > Pyeongchang (0.04)
- Ulsan > Ulsan (0.04)
- India (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Europe
- Netherlands > North Brabant
- Eindhoven (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- United Kingdom (0.14)
- Italy
- Trentino-Alto Adige/Südtirol > Trentino Province
- Trento (0.04)
- Tuscany > Florence (0.04)
- France (0.14)
- Slovenia (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Germany (0.04)
- Hungary > Budapest
- Budapest (0.05)
- Austria
- North America
- Canada > British Columbia
- Dominican Republic (0.04)
- United States
- California
- Los Angeles County > Los Angeles (0.14)
- Santa Clara County > San Jose (0.04)
- Illinois (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- New York > New York County
- New York City (0.04)
- Oregon (0.04)
- Pennsylvania
- Allegheny County > Pittsburgh (0.04)
- Philadelphia County > Philadelphia (0.14)
- Oceania
- Australia
- New South Wales > Sydney (0.04)
- Tasmania > Hobart (0.04)
- New Zealand > South Island
- Canterbury Region > Christchurch (0.04)
- Pacific Ocean > North Pacific Ocean
- East China Sea > Yellow Sea (0.04)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Government
- Leisure & Entertainment > Sports
- Olympic Games (1.00)
- Media > Film (1.00)
- Technology: