PARAPHRASUS: A Comprehensive Benchmark for Evaluating Paraphrase Detection Models
Andrianos Michail, Simon Clematide, Juri Opitz
arXiv.org Artificial Intelligence
The task of determining whether two texts are paraphrases has long been a challenge in NLP. However, the prevailing notion of paraphrase is often quite simplistic, offering only a limited view of the vast spectrum of paraphrase phenomena. Indeed, we find that evaluating models on a paraphrase dataset can leave uncertainty about their true semantic understanding. To alleviate this, we release PARAPHRASUS, a benchmark designed for multi-dimensional assessment of paraphrase detection models and finer-grained model selection. We find that paraphrase detection models, under a fine-grained evaluation lens, exhibit trade-offs that cannot be captured through a single classification dataset.
Sep-18-2024
- Country:
  - Asia
    - China > Hong Kong (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
  - Europe
    - Czechia > Prague (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Belgium (0.04)
    - Slovenia (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Bulgaria > Sofia City Province
      - Sofia (0.04)
    - Denmark (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Spain > Valencian Community
      - Valencia Province > Valencia (0.04)
    - Iceland > Capital Region
      - Reykjavik (0.04)
    - Switzerland > Zürich
      - Zürich (0.04)
  - North America
    - Canada (0.04)
    - Dominican Republic (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - Georgia > Fulton County
        - Atlanta (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
  - Oceania > Australia
  - South America > Chile
- Genre:
  - Research Report (0.82)
- Technology: