Linguistically Conditioned Semantic Textual Similarity
Tu, Jingxuan; Xu, Keer; Yue, Liulu; Ye, Bingyang; Rim, Kyeongmin; Pustejovsky, James
arXiv.org Artificial Intelligence
Semantic textual similarity (STS) is a fundamental NLP task that measures the semantic similarity between a pair of sentences. To reduce the inherent ambiguity of the sentences, recent work proposed Conditional STS (C-STS), which measures the sentences' similarity conditioned on a given aspect. Despite the popularity of C-STS, we find that the current C-STS dataset suffers from various issues that could impede proper evaluation of this task. In this paper, we reannotate the C-STS validation set and observe annotator discrepancy on 55% of the instances. This discrepancy results from annotation errors in the original labels, ill-defined conditions, and a lack of clarity in the task definition. After a thorough dataset analysis, we improve the C-STS task by leveraging the models' ability to understand the conditions in a QA task setting. Using the generated answers, we present an automatic error identification pipeline that identifies annotation errors in the C-STS data with an F1 score above 80%. We also propose a new method that substantially improves performance over baselines on the C-STS data by training the models with the answers. Finally, we discuss conditionality annotation based on the typed-feature structure (TFS) of entity types, and show through examples that the TFS can provide a linguistic foundation for constructing C-STS data with new conditions.
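As a rough illustration of how an answer-based error identification pipeline could be scored with F1, the sketch below assumes that each instance carries a model answer to the condition for each sentence. The `Instance` layout, the `flag_errors` agreement heuristic, and its `high`/`low` thresholds are hypothetical stand-ins, not the authors' method. Only the F1 computation over flagged versus gold instance IDs is standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical record layout; not the actual C-STS schema.
    idx: int
    answer1: str  # model's answer to the condition, asked of sentence 1
    answer2: str  # model's answer to the condition, asked of sentence 2
    label: int    # original similarity label (higher = more similar)


def flag_errors(data: list[Instance], high: int = 4, low: int = 2) -> set[int]:
    """Toy heuristic (hypothetical): flag an instance when the two answers
    agree but the label is low, or disagree but the label is high."""
    flagged = set()
    for x in data:
        same = x.answer1.strip().lower() == x.answer2.strip().lower()
        if (same and x.label <= low) or (not same and x.label >= high):
            flagged.add(x.idx)
    return flagged


def f1(predicted: set[int], gold: set[int]) -> float:
    """F1 of flagged instance ids against gold (reannotated) error ids."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


data = [
    Instance(0, "red", "red", 1),        # answers agree, label low: flagged
    Instance(1, "red", "blue", 5),       # answers differ, label high: flagged
    Instance(2, "guitar", "guitar", 5),  # consistent: not flagged
    Instance(3, "two", "three", 1),      # consistent: not flagged
]
gold = {0, 2}  # hypothetical reannotation result: instances 0 and 2 are errors
pred = flag_errors(data)
print(pred, round(f1(pred, gold), 2))
```

In practice, the gold set would come from comparing the original labels with a reannotation. Any real detector would replace the string-equality heuristic used here.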
Jun 5, 2024