Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents
Vamvas, Jannis, Sennrich, Rico
arXiv.org Artificial Intelligence
Automatically highlighting words that cause semantic differences between two documents could be useful for a wide range of applications. We formulate recognizing semantic differences (RSD) as a token-level regression task and study three unsupervised approaches that rely on a masked language model. To assess the approaches, we begin with basic English sentences and gradually move to more complex, cross-lingual document pairs. Our results show that an approach based on word alignment and sentence-level contrastive learning correlates robustly with gold labels. However, all unsupervised approaches still leave considerable room for improvement. Code to reproduce our experiments is available at https://github.com/ZurichNLP/recognizing-semantic-differences
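The abstract frames RSD as token-level regression: each token receives a real-valued difference score. It does not spell out how an alignment-based approach computes that score. The following is a minimal sketch of one plausible reading, not the authors' implementation: a token scores 1 minus its highest cosine similarity to any token in the other document. The function names and the hand-made 4-dimensional vectors are hypothetical stand-ins for contextual embeddings from a masked language model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def difference_scores(tokens_a, tokens_b, embed):
    """Score each token of A by how poorly it aligns to its best match in B.

    Assumption (not taken from the abstract): score = 1 - max cosine similarity.
    An empty B gives every token of A the maximal score of 1.0.
    """
    vectors_b = [embed[t] for t in tokens_b]
    scores = []
    for token in tokens_a:
        best = max((cosine(embed[token], v) for v in vectors_b), default=0.0)
        scores.append(1.0 - best)
    return scores


# Hypothetical toy vectors; a real system would take these from a language model.
TOY_EMBEDDINGS = {
    "the": [1.0, 0.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.3, 0.0],
    "dog": [0.0, 0.3, 1.0, 0.0],
    "sleeps": [0.0, 0.0, 0.0, 1.0],
}

sentence_a = ["the", "cat", "sleeps"]
sentence_b = ["the", "dog", "sleeps"]
scores = difference_scores(sentence_a, sentence_b, TOY_EMBEDDINGS)

for token, score in zip(sentence_a, scores):
    print(f"{token}\t{score:.3f}")
```

In this toy pair, "the" and "sleeps" have exact counterparts and score close to zero, while "cat" has only the partially similar "dog" to align with and so receives the highest score. Scores of this kind could then be compared against gold token labels with a rank correlation, in line with the evaluation the abstract describes.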
Oct-20-2023
- Country:
  - Asia
    - China > Hong Kong (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - South Korea (0.04)
  - Europe
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Switzerland > Zürich
      - Zürich (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - Dominican Republic (0.04)
    - United States
      - California > San Diego County
        - San Diego (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - New York (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Technology: