Ambiguity Aware Arabic Document Indexing and Query Expansion: A Morphological Knowledge Learning-Based Approach
Soudani, Nadia (La Manouba University) | Bounhas, Ibrahim (La Manouba University) | Babis, Sawssen Ben (La Manouba University)
In this paper, we propose a morphology-based Arabic Information Retrieval (IR) system. Arabic is an inflectional and derivational language, and Arabic texts are highly ambiguous at the morphological level. Short diacritics, however, play a central role in understanding Arabic texts. We therefore propose to build a morphological knowledge base from large vocalized corpora in order to reduce the ambiguity of Arabic documents. This base may be used both for the morphological indexing of queries and documents and for the morphological enrichment of queries. It stores (i) the morpho-syntactic attributes of Arabic words and (ii) the morphological relations between Arabic tokens. It also represents the Arabic lexicon at several levels (e.g., stems, lemmas, and words). We focus on morphological analysis and disambiguation and their impact on information retrieval. Our experiments study the choice of indexing units and morphology-based query expansion in Arabic IR.
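To make the idea of such a knowledge base more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy lexicon written in an ASCII transliteration, made-up corpus counts, and a simple "most frequent analysis" rule for disambiguation; the abstract does not specify the actual disambiguation method. The names `Analysis`, `MorphKB`, `index_term`, and `expand` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One candidate reading of an unvocalized surface word."""
    vocalized: str  # diacritized form
    stem: str
    lemma: str
    pos: str        # morpho-syntactic attribute (part of speech only, here)
    count: int      # toy frequency, standing in for vocalized-corpus statistics


class MorphKB:
    """Toy morphological knowledge base (hypothetical design)."""

    def __init__(self):
        self.by_surface = defaultdict(list)  # surface word -> candidate analyses
        self.by_lemma = defaultdict(set)     # lemma -> related surface words

    def add(self, surface, analysis):
        self.by_surface[surface].append(analysis)
        self.by_lemma[analysis.lemma].add(surface)

    def disambiguate(self, surface):
        # Assumption: keep the most frequent analysis, ignoring context.
        candidates = self.by_surface.get(surface, [])
        return max(candidates, key=lambda a: a.count) if candidates else None

    def index_term(self, surface, level="lemma"):
        # Map a word to the chosen indexing unit: "stem", "lemma", or the word itself.
        best = self.disambiguate(surface)
        return surface if best is None or level == "word" else getattr(best, level)

    def expand(self, surface):
        # Enrich a query term with surface words sharing its selected lemma.
        best = self.disambiguate(surface)
        if best is None:
            return [surface]
        return sorted(self.by_lemma[best.lemma] | {surface})


kb = MorphKB()
# Unvocalized "ktb" is ambiguous: "kutub" (books) or "kataba" (he wrote).
kb.add("ktb", Analysis("kutub", "kutub", "kitAb", "noun", 7))
kb.add("ktb", Analysis("kataba", "katab", "kataba", "verb", 3))
kb.add("ktAb", Analysis("kitAb", "kitAb", "kitAb", "noun", 9))
kb.add("AlktAb", Analysis("AlkitAb", "kitAb", "kitAb", "noun", 5))

print(kb.index_term("ktb"))          # lemma-level indexing unit
print(kb.index_term("ktb", "stem"))  # stem-level indexing unit
print(kb.expand("ktb"))              # morphology-based query expansion
```

In this sketch the same structure serves both uses named in the abstract: `index_term` selects the indexing unit at a chosen lexicon level, and `expand` follows lemma-level relations to enrich a query. A realistic system would replace the frequency rule with context-sensitive disambiguation.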
May-17-2018