Document Provenance and Authentication through Authorship Classification

Zamir, Muhammad Tayyab, Ayub, Muhammad Asif, Khan, Jebran, Ikram, Muhammad Jawad, Ahmad, Nasir, Ahmad, Kashif

Mar-2-2023, arXiv.org Artificial Intelligence

Style analysis, a relatively underexplored topic, enables several interesting applications. For instance, it can help co-authors adjust their writing styles to produce a more coherent collaborative document. Style analysis can also serve as a first step in document provenance and authentication. In this paper, we propose an ensemble-based text-processing framework for classifying documents as single- or multi-authored, one of the key tasks in style analysis. The proposed framework incorporates several state-of-the-art text classification algorithms, including classical Machine Learning (ML) algorithms, transformers, and deep learning models, both individually and in a merit-based late fusion. For the merit-based late fusion, we employ several weight optimization and selection methods to assign each individual text classification algorithm a weight that reflects its merit. We also analyze the impact of characters that are usually removed during pre-processing in NLP applications by conducting experiments on both clean and unclean data. The proposed framework is evaluated on a large-scale benchmark dataset, where it significantly improves performance over existing solutions.
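The merit-based late fusion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its weight optimization and selection methods, so this sketch simply weights each model by its validation accuracy, normalized to sum to 1. The function names (`accuracy`, `merit_weights`, `late_fusion`) and all probabilities are hypothetical toy values; label 0 stands for a single-authored document and label 1 for a multi-authored one.

```python
from typing import List, Sequence

# Per-model predictions: one list of class-probability vectors per model.
Probs = Sequence[Sequence[float]]


def accuracy(probs: Probs, labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-probability class matches the label."""
    correct = sum(
        1 for p, y in zip(probs, labels)
        if max(range(len(p)), key=p.__getitem__) == y
    )
    return correct / len(labels)


def merit_weights(val_probs: List[Probs], val_labels: Sequence[int]) -> List[float]:
    """Assumed merit scheme: each model's weight is its validation accuracy,
    normalized so that all weights sum to 1."""
    scores = [accuracy(p, val_labels) for p in val_probs]
    total = sum(scores)
    if total == 0:
        # Fall back to uniform weights if every model scores zero.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def late_fusion(test_probs: List[Probs], weights: Sequence[float]) -> List[int]:
    """Weighted average of the models' class probabilities, then argmax per sample."""
    n_samples = len(test_probs[0])
    n_classes = len(test_probs[0][0])
    preds = []
    for i in range(n_samples):
        fused = [
            sum(w * model[i][c] for w, model in zip(weights, test_probs))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=fused.__getitem__))
    return preds


# Hypothetical toy data for three models (e.g. stand-ins for a classical ML
# model, a transformer, and a deep learning model). Not results from the paper.
val_labels = [0, 1, 1, 0]
val_probs = [
    [[0.8, 0.2], [0.3, 0.7], [0.4, 0.6], [0.6, 0.4]],    # 4/4 correct
    [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8], [0.9, 0.1]],    # 3/4 correct
    [[0.4, 0.6], [0.6, 0.4], [0.45, 0.55], [0.3, 0.7]],  # 1/4 correct
]
test_probs = [
    [[0.3, 0.7], [0.2, 0.8]],
    [[0.8, 0.2], [0.4, 0.6]],
    [[0.9, 0.1], [0.9, 0.1]],
]

weights = merit_weights(val_probs, val_labels)
print(weights)                           # [0.5, 0.375, 0.125]
print(late_fusion(test_probs, weights))  # [0, 1]
```

In the first test sample the fused prediction (0) disagrees with the single most accurate model (which favors 1), because the two lower-weighted models agree confidently on class 0. Accuracy-proportional weighting is only one simple choice; weights could instead be searched or optimized directly against a validation metric, which is closer in spirit to the "weight optimization" the abstract mentions, though its specific methods are not given here.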

Keywords: artificial intelligence, machine learning, natural language

Country:
- Europe
  - France (0.04)
  - Ireland > Munster
    - County Cork > Cork (0.04)
- Asia
  - South Korea (0.04)
  - Pakistan
    - Islamabad Capital Territory > Islamabad (0.04)
    - Khyber Pakhtunkhwa > Peshawar Division
      - Peshawar District > Peshawar (0.04)
  - Middle East > Saudi Arabia
    - Mecca Province > Jeddah (0.04)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Security & Privacy (0.61)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (0.86)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (1.00)