A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms
Ghanem, Sana; Jarrar, Mustafa; Jarrar, Radi; Bounhas, Ibrahim
arXiv.org Artificial Intelligence
This paper addresses the task of extending a given synset with additional synonyms, taking synonymy strength into account as a fuzzy value. Given a monolingual or multilingual synset and a threshold (a fuzzy value in [0, 1]), our goal is to extract new synonyms above this threshold from existing lexicons. Our contribution is twofold: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets, each annotated with a fuzzy value by four linguists. The dataset serves two purposes: (i) understanding how much linguists agree or disagree on synonymy, and (ii) providing a baseline for evaluating our algorithm. The proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations show that the algorithm behaves like a linguist: its fuzzy values are close to those assigned by linguists, as measured by RMSE and MAE. The dataset and a demo page are publicly available at https://portal.sina.birzeit.edu/synonyms.
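The thresholding and error measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the candidate labels, and every score below are hypothetical values invented for illustration, and it assumes that the reference value for a candidate is the mean of the four linguists' annotations and that the threshold is inclusive.

```python
import math


def filter_by_threshold(scores, threshold):
    """Return (candidate, score) pairs at or above the threshold, best first.

    Treating the threshold as inclusive is an assumption of this sketch.
    """
    kept = [(cand, s) for cand, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def mae(predicted, reference):
    """Mean absolute error between two equal-length score lists."""
    return mean([abs(p - r) for p, r in zip(predicted, reference)])


def rmse(predicted, reference):
    """Root mean squared error between two equal-length score lists."""
    return math.sqrt(mean([(p - r) ** 2 for p, r in zip(predicted, reference)]))


# Hypothetical fuzzy values computed by some scoring procedure.
algorithm_scores = {"cand_a": 0.9, "cand_b": 0.6, "cand_c": 0.3}

# Hypothetical annotations by four linguists for each candidate.
linguist_scores = {
    "cand_a": [0.8, 0.8, 0.9, 0.7],
    "cand_b": [0.5, 0.4, 0.6, 0.5],
    "cand_c": [0.5, 0.6, 0.4, 0.5],
}

candidates = sorted(algorithm_scores)
predicted = [algorithm_scores[c] for c in candidates]
reference = [mean(linguist_scores[c]) for c in candidates]

print(filter_by_threshold(algorithm_scores, 0.5))
print("MAE:", round(mae(predicted, reference), 4))
print("RMSE:", round(rmse(predicted, reference), 4))
```

Lower MAE and RMSE mean the computed values sit closer to the linguists' averaged judgments; RMSE penalizes large individual deviations more heavily than MAE.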
Feb-4-2023
- Country:
  - Europe
    - United Kingdom > England
      - Greater London > London (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - France
      - Île-de-France > Paris
        - Paris (0.04)
      - Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
        - Marseille (0.04)
  - Asia > Middle East
    - UAE (0.04)
    - Palestine (0.04)
    - Jordan > Amman Governorate
      - Amman (0.04)
  - Africa
    - Middle East > Tunisia (0.04)
    - Sudan (0.04)
- Genre:
  - Research Report (0.82)
- Technology: