Authorship Attribution Using Small Sets of Frequent Part-of-Speech Skip-grams
Pokou, Yao Jean Marc (Université de Moncton) | Fournier-Viger, Philippe (Université de Moncton) | Moghrabi, Chadia (Université de Moncton)
Computer-supported authorship attribution provides tools for extracting stylistic features that can help verify or identify the author of a text document. Finding the author of a document is important in many situations, such as detecting plagiarism to protect copyright and supporting forensic work during criminal investigations. This paper thus explores a novel stylistic feature aimed at accurately characterizing an author's work. In particular, it considers the use of part-of-speech skip-grams and an in-house top-k sequential pattern mining algorithm for the task of authorship attribution. A study on a collection of 30 texts written by 10 authors, comprising 2,615,856 words and 99,903 sentences, confirms that mining part-of-speech skip-grams from texts facilitates authorship inference.
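To make the feature concrete, the following is a minimal sketch of extracting part-of-speech skip-grams and keeping the k most frequent ones. It assumes each sentence has already been converted to a list of POS tags (for example, Penn Treebank tags produced by some tagger), and it uses the common k-skip-n-gram definition: ordered n-tuples of tags with at most k skipped tokens in total. Frequency is measured as sentence support (the number of sentences containing a pattern), as is usual in sequential pattern mining. The function names and parameters are hypothetical, and this exhaustive counting approach is not the authors' in-house top-k sequential pattern mining algorithm, whose details the abstract does not give.

```python
from collections import Counter
from itertools import combinations


def skip_grams(tags, n, k):
    """Yield k-skip-n-grams from one tagged sentence (hypothetical helper).

    Each result is an ordered n-tuple of tags whose positions span at
    most n + k tokens, i.e. at most k tokens are skipped in total.
    """
    for start in range(len(tags)):
        # The last chosen position can be at most start + n + k - 1.
        window_end = min(len(tags), start + n + k)
        for rest in combinations(range(start + 1, window_end), n - 1):
            positions = (start,) + rest
            yield tuple(tags[p] for p in positions)


def top_k_skip_grams(sentences, n, k, top):
    """Return the `top` skip-grams with the highest sentence support.

    Support counts each pattern at most once per sentence (an assumption
    mirroring sequential pattern mining). Ties are broken alphabetically
    so the output is deterministic.
    """
    support = Counter()
    for tags in sentences:
        support.update(set(skip_grams(tags, n, k)))
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]


if __name__ == "__main__":
    sentences = [
        ["DT", "NN", "VBZ", "JJ"],
        ["DT", "JJ", "NN", "VBZ"],
    ]
    for pattern, count in top_k_skip_grams(sentences, n=2, k=1, top=2):
        print(pattern, count)
```

On the two toy sentences above, the patterns ("DT", "NN") and ("NN", "VBZ") each occur in both sentences and therefore rank first. The resulting small set of top patterns, computed per author, could then serve as a stylistic profile, which is the role the abstract assigns to frequent POS skip-grams.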
May 8, 2016