InfoSync: Information Synchronization across Multilingual Semi-structured Tables
Khincha, Siddharth, Jain, Chelsi, Gupta, Vivek, Kataria, Tushar, Zhang, Shuo
arXiv.org Artificial Intelligence
Synchronizing semi-structured data across languages is challenging. For instance, a Wikipedia table in one language should stay consistent with its counterparts in other languages. To address this problem, we introduce a new dataset, InfoSync, and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (3.5K pairs) is manually annotated. The proposed method includes 1) Information Alignment, which maps rows between tables, and 2) Information Update, which fills in missing or outdated information in the aligned multilingual tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method.
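The two steps named in the abstract can be illustrated with a toy sketch. This is not the authors' method: it assumes, purely for illustration, that row keys have already been translated into a common language, and it swaps in plain string similarity (Python's `difflib`) for the alignment step. The function names, the 0.8 threshold, and the example infoboxes are all hypothetical.

```python
from difflib import SequenceMatcher


def align_rows(src, tgt, threshold=0.8):
    """Greedily pair row keys of two tables whose keys look alike.

    Toy stand-in for "Information Alignment"; string similarity is an
    assumption of this sketch, not the technique used in the paper.
    """
    pairs, used = [], set()
    for s_key in src:
        best_key, best_score = None, threshold
        for t_key in tgt:
            if t_key in used:
                continue
            score = SequenceMatcher(None, s_key.lower(), t_key.lower()).ratio()
            if score >= best_score:
                best_key, best_score = t_key, score
        if best_key is not None:
            used.add(best_key)
            pairs.append((s_key, best_key))
    return pairs


def update_missing(src, tgt, pairs):
    """Copy source rows that have no aligned counterpart into the target.

    Toy stand-in for "Information Update"; it only handles missing rows,
    not outdated values.
    """
    aligned_src = {s_key for s_key, _ in pairs}
    updated = dict(tgt)
    for key, value in src.items():
        if key not in aligned_src:
            updated[key] = value
    return updated


def alignment_f1(predicted, gold):
    """F1 score of predicted row pairs against gold row pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical infoboxes; the second is assumed to be pre-translated.
english = {"Born": "1 May 1950", "Occupation": "Writer", "Spouse": "Jane Doe"}
other = {"born": "1 May 1950", "occupation": "Writer"}

pairs = align_rows(english, other)
synced = update_missing(english, other, pairs)
```

Here the "Spouse" row has no counterpart in the second table, so the update step copies it over. A real system would also need to translate the copied value and detect outdated rows, which this sketch does not attempt.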
Jul-6-2023
- Country:
  - North America
    - Dominican Republic (0.04)
    - United States
      - Utah (0.04)
      - Colorado (0.04)
      - Washington > King County
        - Seattle (0.14)
      - New York > New York County
        - New York City (0.05)
    - Canada > British Columbia
  - Europe
    - Latvia > Riga Municipality
      - Riga (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Bulgaria > Sofia City Province
      - Sofia (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
- Genre:
  - Research Report (0.63)
- Industry:
  - Education (0.46)
- Technology: