InfoSync: Information Synchronization across Multilingual Semi-structured Tables
Siddharth Khincha, Chelsi Jain, Vivek Gupta, Tushar Kataria, Shuo Zhang
arXiv.org Artificial Intelligence
Information synchronization of semi-structured data across languages is challenging. For instance, a Wikipedia table in one language should be kept consistent with its counterparts in other languages. To address this problem, we introduce a new dataset, InfoSync, and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (3.5K pairs) is manually annotated. The proposed method consists of 1) Information Alignment, which maps rows across multilingual tables, and 2) Information Update, which fills in missing or outdated information in the aligned tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method.
Jul-6-2023
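The abstract names the two steps (alignment, then update) without saying how either is computed. The following is a minimal Python sketch of such a pipeline under our own assumptions, not the authors' method: an infobox is a plain key-value dict, rows are aligned greedily by character similarity of keys and values (`difflib.SequenceMatcher`, a stand-in for a real cross-lingual matcher), and the update step only copies unaligned source rows. The function names `align_rows`, `update_table` and `alignment_f1`, the equal key/value weighting and the 0.6 threshold are all hypothetical. `alignment_f1` is a standard precision/recall F1 over predicted versus annotated row pairs; whether InfoSync scores alignment exactly this way is an assumption.

```python
from difflib import SequenceMatcher

# Hypothetical representation: an infobox is a dict mapping row key -> value.
Infobox = dict[str, str]
Alignment = set[tuple[str, str]]


def similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1].

    Stand-in only: real cross-lingual matching would need translation or
    multilingual embeddings, since keys in different scripts share no characters.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_rows(src: Infobox, tgt: Infobox, threshold: float = 0.6) -> Alignment:
    """Step 1 (alignment): greedy one-to-one matching of source and target rows."""
    candidates = []
    for sk, sv in src.items():
        for tk, tv in tgt.items():
            # Equal weighting of key and value similarity is an assumption.
            score = 0.5 * similarity(sk, tk) + 0.5 * similarity(sv, tv)
            candidates.append((score, sk, tk))
    candidates.sort(reverse=True)  # best-scoring pairs first

    pairs: Alignment = set()
    used_src, used_tgt = set(), set()
    for score, sk, tk in candidates:
        if score < threshold:
            break  # all remaining candidates score even lower
        if sk in used_src or tk in used_tgt:
            continue  # each row may be aligned at most once
        pairs.add((sk, tk))
        used_src.add(sk)
        used_tgt.add(tk)
    return pairs


def update_table(src: Infobox, tgt: Infobox, pairs: Alignment) -> Infobox:
    """Step 2 (update): add source rows that have no aligned target row.

    Values are copied untranslated, and aligned rows are not checked for
    outdated values; a real synchronization system would need both.
    """
    aligned_src = {sk for sk, _ in pairs}
    updated = dict(tgt)
    for sk, sv in src.items():
        if sk not in aligned_src and sk not in updated:
            updated[sk] = sv
    return updated


def alignment_f1(pred: Alignment, gold: Alignment) -> float:
    """F1 of predicted row pairs against gold (annotated) row pairs."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy tables whose keys are already in a shared form (an assumption).
    en = {"name": "Ada Lovelace", "born": "1815", "spouse": "William King"}
    other = {"name": "Ada Lovelace", "born": "1815"}
    pairs = align_rows(en, other)
    print(sorted(pairs))
    print(update_table(en, other, pairs))
```

Running the script aligns the `name` and `born` rows, then copies the unaligned `spouse` row into the second table. Greedy matching keeps each row in at most one pair; an optimal one-to-one assignment (for example, the Hungarian algorithm) would be a natural alternative.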