 automatic authorship attribution


Automatic Authorship Attribution in the Work of Tirso de Molina

Cavadas, Miguel, Gamallo, Pablo

arXiv.org Artificial Intelligence

Automatic Authorship Attribution (AAA) applies tools and techniques from the Digital Humanities to authorship attribution studies. Its quantitative, statistical approach can shed new light on well-known authorship problems that traditional critics have debated for centuries, opening a new avenue for style comparison. The aim of this paper is to demonstrate the potential of these tools and techniques by testing the authorship of five comedies traditionally attributed to the Spanish playwright Tirso de Molina (1579-1648): La ninfa del cielo, El burlador de Sevilla, Tan largo me lo fiáis, La mujer por fuerza and El condenado por desconfiado. To this end, clustering experiments using the R package stylo with four distance measures are carried out on a corpus of plays by Tirso, Andrés de Claramonte (c. 1560-1626), Antonio Mira de Amescua (1577-1644) and Luis Vélez de Guevara (1579-1644). The results argue against attributing any of these plays to Tirso except La mujer por fuerza.
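To make the idea of a stylometric distance concrete, the sketch below computes Burrows's Delta, one of the distance measures offered by stylo, over the most frequent words of a set of texts, then assigns a disputed text to its nearest known text. The abstract does not say which four measures were used, so the choice of Delta is an assumption, and the nearest-neighbour step is a simplified stand-in for the paper's clustering analysis, not its actual method. The function names `burrows_delta_matrix` and `nearest_author` and the `n_mfw` parameter are hypothetical.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Lower-cased word tokens; real studies use more careful tokenization.
    return re.findall(r"\w+", text.lower())


def burrows_delta_matrix(texts, n_mfw=100):
    """Pairwise Burrows's Delta over the n_mfw most frequent words (hypothetical helper)."""
    token_lists = [tokenize(t) for t in texts]
    overall = Counter()
    for toks in token_lists:
        overall.update(toks)
    vocab = [w for w, _ in overall.most_common(n_mfw)]
    n, m = len(texts), len(vocab)
    if m == 0:
        return [[0.0] * n for _ in range(n)]

    # Relative frequency of each vocabulary word in each text.
    freqs = []
    for toks in token_lists:
        counts, total = Counter(toks), len(toks) or 1
        freqs.append([counts[w] / total for w in vocab])

    # z-score each word's frequency across the corpus.
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [freqs[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            z[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0

    # Delta = mean absolute difference between two texts' z-score vectors.
    return [
        [sum(abs(a - b) for a, b in zip(z[i], z[k])) / m for k in range(n)]
        for i in range(n)
    ]


def nearest_author(disputed_text, known, n_mfw=100):
    """Return (label, delta) of the known text closest to the disputed one."""
    labels = list(known)
    texts = [known[label] for label in labels] + [disputed_text]
    d = burrows_delta_matrix(texts, n_mfw)
    last = len(texts) - 1
    best = min(range(last), key=lambda i: d[last][i])
    return labels[best], d[last][best]
```

In practice each text would be a whole play and `n_mfw` would be varied (stylo users commonly test several most-frequent-word cut-offs) to check that an attribution is stable.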


Automatic Authorship Attribution of Noisy Documents

Sayoud, Halim (University of Sciences and Technology Houari Boumediene (USTHB)) | Khennouf, Salah (University of Sciences and Technology Houari Boumediene (USTHB)) | Benzerroug, Hocine (Independent Researcher) | Hamadache, Zohra (University of Sciences and Technology Houari Boumediene (USTHB)) | Hadjadj, Hassina (University of Sciences and Technology Houari Boumediene (USTHB)) | Ouamour, Siham (University of Sciences and Technology Houari Boumediene (USTHB))

AAAI Conferences

In this study, we investigate the robustness of several features and classifiers in automatic authorship attribution. Our corpus consists of 25 documents written by 5 American philosophers in English. The documents are converted into grayscale images, and several levels of noise are added to corrupt them. The noise is of the salt-and-pepper type, added at random over the surface of each image at the following levels: 0%, 1%, 2%, 3%, 4%, 5%, 6% and 7%. Each image is then passed through an OCR (Optical Character Recognition) program to extract its text, and the resulting text document is used in the authorship attribution experiments. Several features and classifiers are employed and evaluated in terms of classification performance. The results show that the most robust feature for authorship attribution is the character tetragram (character 4-gram), which achieves a score of 100% even at a noise level of 7%.
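One intuition for why character tetragrams survive OCR noise is that a single misrecognized character corrupts at most four overlapping 4-grams, while the rest of the word's profile is untouched. The sketch below builds character 4-gram frequency profiles and attributes a text to the training author with the highest cosine similarity. The abstract does not name the classifiers used, so the cosine nearest-neighbour rule, the helper names `char_ngrams` and `attribute`, and the toy texts are all assumptions for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=4):
    # Normalize case and whitespace, then count overlapping character n-grams.
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    # Cosine similarity between two sparse frequency profiles.
    dot = sum(v * q[k] for k, v in p.items() if k in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def attribute(query, training, n=4):
    """Return the training author whose n-gram profile is most similar to the query."""
    qp = char_ngrams(query, n)
    profiles = {author: char_ngrams(text, n) for author, text in training.items()}
    return max(profiles, key=lambda a: cosine(qp, profiles[a]))


# Hypothetical usage: "rn" for "m" mimics a typical OCR confusion.
training = {
    "author_A": "pragmatism is a method of settling metaphysical disputes",
    "author_B": "democracy and education are bound together in experience",
}
print(attribute("pragmatisrn is a rnethod of settling disputes", training))
```

Despite the simulated OCR errors, most tetragrams in the query still match author_A's profile, which is the kind of robustness the abstract reports.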