DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings
Abdul-Mageed, Muhammad, Elbassuoni, Shady, Doughman, Jad, Elmadany, AbdelRahim, Nagoudi, El Moatez Billah, Zoughby, Yorgo, Shaher, Ahmad, Gaba, Iskander, Helal, Ahmed, El-Razzaz, Mohammed
arXiv.org Artificial Intelligence
Word embeddings are a core component of modern natural language processing systems, which makes the ability to evaluate them thoroughly a vital task. We describe DiaLex, a benchmark for intrinsic evaluation of dialectal Arabic word embeddings. DiaLex covers five important Arabic dialects: Algerian, Egyptian, Lebanese, Syrian, and Tunisian. Across these dialects, DiaLex provides a test bank for six syntactic and semantic relations: male to female, singular to dual, singular to plural, antonym, comparative, and genitive to past tense. DiaLex thus consists of a collection of word pairs representing each of the six relations in each of the five dialects. To demonstrate the utility of DiaLex, we use it to evaluate a set of Arabic word embeddings, both existing models and new ones that we developed. Our benchmark, evaluation code, and new word embedding models will be publicly available.
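As an illustration of how a bank of relation word pairs can drive an intrinsic evaluation, the following is a minimal sketch, not the authors' evaluation code. It assumes an analogy-style protocol (vector offset with cosine ranking, a common choice that the abstract does not itself specify), and the function names, placeholder tokens, and two-dimensional toy vectors are all hypothetical.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy_questions(pairs):
    """Combine every ordered pair of distinct word pairs into (a, b, c, d)."""
    return [(a, b, c, d) for (a, b), (c, d) in permutations(pairs, 2)]


def top_k_accuracy(embeddings, pairs, k=1):
    """Fraction of questions a:b :: c:? whose answer d is ranked in the top k.

    Assumption: candidates are ranked by cosine similarity to b - a + c,
    and the three question words are excluded from the candidates.
    """
    questions = analogy_questions(pairs)
    hits = 0
    for a, b, c, d in questions:
        target = [vb - va + vc
                  for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
        candidates = [w for w in embeddings if w not in (a, b, c)]
        ranked = sorted(candidates,
                        key=lambda w: cosine(embeddings[w], target),
                        reverse=True)
        hits += d in ranked[:k]
    return hits / len(questions) if questions else 0.0


# Hypothetical placeholder tokens and toy vectors, not DiaLex data.
TOY_EMBEDDINGS = {
    "male_1": [1.0, 0.0], "female_1": [1.0, 1.0],
    "male_2": [2.0, 0.0], "female_2": [2.0, 1.0],
    "male_3": [3.0, 0.0], "female_3": [3.0, 1.0],
}
TOY_PAIRS = [("male_1", "female_1"), ("male_2", "female_2"), ("male_3", "female_3")]

if __name__ == "__main__":
    print(top_k_accuracy(TOY_EMBEDDINGS, TOY_PAIRS, k=1))
```

Under this assumed protocol, each relation in each dialect would be scored separately by passing its own list of word pairs together with the embedding model under test.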
Nov-22-2020