Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore

Wu, Junchao, Zhan, Runzhe, Wong, Derek F., Yang, Shu, Liu, Xuebo, Chao, Lidia S., Zhang, Min

May-7-2024–arXiv.org Artificial Intelligence

The efficacy of a large language model (LLM) generated text detector depends substantially on the availability of sizable training data. White-box zero-shot detectors, which require no such data, are nonetheless limited by the accessibility of the source model of the LLM-generated text. In this paper, we propose a simple but effective black-box zero-shot detection approach, predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts. This approach entails computing the Grammar Error Correction Score (GECScore) for the given text to distinguish between human-written and LLM-generated text. Extensive experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods, achieving an average AUROC of 98.7% and showing strong robustness against paraphrase and adversarial perturbation attacks.
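The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which grammar correction model, similarity measure, or decision threshold is used, so everything below is an assumption for illustration only: the names `gec_score`, `looks_llm_generated`, and `toy_correct` are hypothetical, the token-level `difflib` ratio stands in for whatever similarity the paper uses, and the toy corrector stands in for a real grammar error correction system. The intuition is that a text needing few corrections stays close to its corrected version and scores near 1.

```python
from difflib import SequenceMatcher
from typing import Callable


def gec_score(text: str, correct: Callable[[str], str]) -> float:
    """Similarity in [0, 1] between a text and its grammar-corrected version.

    `correct` is any function that returns a corrected copy of the text.
    The similarity here is difflib's token-level ratio (an assumption,
    not necessarily the measure used in the paper).
    """
    corrected = correct(text)
    return SequenceMatcher(None, text.split(), corrected.split()).ratio()


def looks_llm_generated(text: str, correct: Callable[[str], str],
                        threshold: float = 0.95) -> bool:
    """Flag text whose score meets a threshold (0.95 is an arbitrary placeholder)."""
    return gec_score(text, correct) >= threshold


# Toy stand-in for a real grammar error correction model (hypothetical).
TOY_FIXES = {"an large": "a large", "an simple": "a simple"}


def toy_correct(text: str) -> str:
    for wrong, right in TOY_FIXES.items():
        text = text.replace(wrong, right)
    return text


if __name__ == "__main__":
    for sample in ("we propose a simple method", "we propose an simple method"):
        print(sample, "->", gec_score(sample, toy_correct))
```

In practice the threshold would have to be chosen on held-out data, and the toy corrector would be replaced by an actual correction model; neither choice is specified in the abstract.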

computational linguistic, detection, llm-generated text

Country:
- South America > Brazil (0.04)
- Oceania > Australia (0.04)
- North America
  - United States
    - Texas (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - United Kingdom > England
    - Greater Manchester > Salford (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Greece > West Greece
    - Patra (0.04)
  - France > Occitanie
    - Haute-Garonne > Toulouse (0.04)
  - Bulgaria > Sofia City Province
    - Sofia (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Singapore (0.04)
  - Macao (0.04)
  - China
    - Heilongjiang Province > Harbin (0.04)
    - Guangdong Province > Shenzhen (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Information Technology > Security & Privacy (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Performance Analysis > Accuracy (0.93)
