Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text

Li, Yafu, Wang, Zhilin, Cui, Leyang, Bi, Wei, Shi, Shuming, Zhang, Yue

May-29-2024–arXiv.org Artificial Intelligence

AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. However, little work has addressed the detection of (partially) AI-paraphrased text, even though AI paraphrasing is commonly used for text refinement and diversification. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), which aims to identify paraphrased spans within a text. Unlike text-level detection, PTD takes the full text as input and assigns each sentence a score indicating its degree of paraphrasing. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analyses explain the crucial role of the context surrounding the paraphrased spans. Extensive experiments show that PTD models generalize to diverse paraphrasing prompts and to texts containing multiple paraphrased spans. We release our resources at https://github.com/Linzwcs/PASTED.
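The sentence-level formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`split_sentences`, `paraphrased_spans`, `detect`), the regex-based sentence splitter, the 0.5 threshold, and the `toy_scorer` stand-in are all assumptions made for illustration. In practice the scorer would be a trained PTD model that sees every sentence in context.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def paraphrased_spans(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Merge consecutive sentences scoring >= threshold into inclusive (start, end) spans."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new flagged span begins here
        elif start is not None:
            spans.append((start, i - 1))  # the span ended at the previous sentence
            start = None
    if start is not None:
        spans.append((start, len(scores) - 1))  # span runs to the end of the text
    return spans


def detect(
    text: str,
    scorer: Callable[[List[str]], List[float]],
    threshold: float = 0.5,
) -> Tuple[List[str], List[float], List[Tuple[int, int]]]:
    """Score every sentence of the full text, then group flagged sentences into spans."""
    sentences = split_sentences(text)
    scores = scorer(sentences)  # the scorer receives all sentences, so it can use context
    if len(scores) != len(sentences):
        raise ValueError("scorer must return exactly one score per sentence")
    return sentences, scores, paraphrased_spans(scores, threshold)


def toy_scorer(sentences: List[str]) -> List[float]:
    """Hypothetical placeholder, NOT a real detector: flags sentences containing 'utilize'."""
    return [1.0 if "utilize" in s.lower() else 0.0 for s in sentences]


if __name__ == "__main__":
    sample = "I went home. We utilize tools. They utilize more. It rained."
    sentences, scores, spans = detect(sample, toy_scorer)
    for start, end in spans:
        print(start, end, " ".join(sentences[start:end + 1]))
```

Passing the whole sentence list to the scorer, rather than one sentence at a time, mirrors the abstract's point that the surrounding context matters for the decision.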

detection, span, text span

Country:
- North America
  - United States
    - Pennsylvania (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Russia (0.14)
  - Norway (0.04)
  - Germany (0.04)
  - Ireland (0.04)
  - France (0.04)
  - United Kingdom
    - Scotland (0.04)
    - England > Greater Manchester
      - Salford (0.04)
  - Italy > Tuscany
    - Florence (0.04)
- Asia
  - Russia (0.14)
  - Singapore (0.04)
  - Pakistan (0.04)
  - China > Hong Kong (0.04)
  - Middle East
    - Syria (0.04)
    - Iraq (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Law (0.68)
- Government (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
- Consumer Products & Services
  - Restaurants (0.46)
  - Food, Beverage, Tobacco & Cannabis (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.89)
  - Machine Learning > Performance Analysis
    - Accuracy (0.46)
