Score Fusion Based Authorship Attribution of Ancient Arabic Texts
Sayoud, Halim (University of Sciences and Technology Houari Boumediene (USTHB)) | Ouamour, Siham (University of Sciences and Technology Houari Boumediene (USTHB))
In this paper, we investigate the authorship of several short historical texts written by ten ancient Arabic travelers. This Arabic dataset, collected by the authors in 2011 and called the AAAT (Authorship Attribution of Ancient Arabic Texts) corpus, is considered a reference dataset for Arabic. Several authorship attribution experiments are conducted using different features, namely characters, character n-grams, and lexical features such as words, word n-grams, and rare words. In addition, different classifiers are employed, such as statistical distances, the Multilayer Perceptron (MLP), Support Vector Machines (SVM), and Linear Regression (LR). In this investigation, a new fusion technique called Score Based Fusion (SBF) is proposed to enhance the overall performance of the classifiers. Results show good attribution performance, with the best scores ranging from 80% to 90% correct authorship attribution. The proposed fusion technique raised this score to 100% correct authorship attribution. Moreover, this comparative study has revealed interesting results concerning the Arabic language, particularly for short texts.
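The abstract does not say how Score Based Fusion combines the classifiers' outputs. The sketch below shows one common form of score-level fusion: min-max normalization of each classifier's per-author scores, followed by a sum rule. It is offered only as an illustration and is not necessarily the authors' SBF. The function names `min_max_normalize` and `fuse_scores` are hypothetical. The sketch assumes that a higher score always means "more likely this author", so distance-based classifiers must have their distances negated before fusion.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one classifier's per-author scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All authors scored identically: this classifier expresses no preference.
        return {author: 0.0 for author in scores}
    return {author: (s - lo) / (hi - lo) for author, s in scores.items()}


def fuse_scores(classifier_scores: List[Dict[str, float]]) -> str:
    """Sum normalized scores across classifiers (sum rule) and return the top author.

    Assumption: every classifier reports higher = more likely; distances
    from a statistical-distance classifier should be negated beforehand.
    """
    fused: Dict[str, float] = {}
    for scores in classifier_scores:
        for author, s in min_max_normalize(scores).items():
            fused[author] = fused.get(author, 0.0) + s
    return max(fused, key=fused.get)


# Hypothetical scores for one test text and three candidate authors.
svm = {"A": 0.2, "B": 0.7, "C": 0.1}
mlp = {"A": 0.5, "B": 0.4, "C": 0.1}
distance = {"A": -3.0, "B": -1.0, "C": -5.0}  # negated distances

print(fuse_scores([svm, mlp, distance]))  # B
```

Here the MLP alone would pick author A. After normalization, however, the summed scores are about 1.67 for A, 2.75 for B, and 0 for C, so the fused decision is B. Normalizing first keeps a classifier with a wide raw score range from dominating the sum.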
May-16-2017