LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts

Gameiro, Henrique Da Silva; Kucharavy, Andrei; Dolamic, Ljiljana

Sep-5-2024–arXiv.org Artificial Intelligence

With the emergence of widely available, powerful large language models (LLMs), LLM-generated disinformation has become a major concern. LLM detectors have historically been touted as a solution, but their effectiveness in the real world remains to be proven. In this paper, we focus on an important setting in information operations: short news-like posts generated by moderately sophisticated attackers. We demonstrate that existing LLM detectors, whether zero-shot or purpose-trained, are not ready for real-world use in that setting. All tested zero-shot detectors perform inconsistently with prior benchmarks and are highly vulnerable to an increase in sampling temperature, a trivial attack absent from recent benchmarks. A purpose-trained detector that generalizes across LLMs and unseen attacks can be developed, but it fails to generalize to new human-written texts. We argue that the former finding indicates that domain-specific benchmarking is needed, while the latter suggests a trade-off between resilience to adversarial evasion and overfitting to the reference human text; both need to be evaluated in benchmarks, and neither currently is. We believe this calls for reconsidering current approaches to LLM detector benchmarking, and we provide a dynamically extensible benchmark to enable it (https://github.com/Reliable-Information-Lab-HEVS/dynamic_llm_detector_benchmark).
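
Why a sampling-temperature increase can defeat a likelihood-based zero-shot detector is easy to see on a toy example. The sketch below is an illustration under stated assumptions, not the paper's code or any detector it evaluates: it assumes the simplest likelihood-threshold baseline (flag text whose average per-token log-probability under a scoring model exceeds a threshold), assumes the scoring model is the same model that generated the text, and uses hypothetical names and values (`TOY_LOGITS`, `THRESHOLD`, `flags_as_machine`) invented for illustration. A single next-token distribution stands in for the per-token distributions of a real text.

```python
import math

# Hypothetical next-token logits for a single position, chosen only for illustration.
TOY_LOGITS = [4.0, 2.0, 1.0, 0.5, 0.0]

# Hypothetical decision threshold on average log-probability; real detectors calibrate this on data.
THRESHOLD = -1.0


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the sampling temperature.

    temperature < 1 sharpens the distribution; temperature > 1 flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_log_prob(logits, temperature):
    """Expected log-probability, as scored by the model at temperature 1,
    of a token sampled at `temperature`.

    The average per-token log-likelihood of a long generated text approaches this value.
    """
    scorer = softmax_with_temperature(logits, 1.0)           # distribution the detector scores with
    sampler = softmax_with_temperature(logits, temperature)  # distribution the attacker samples from
    return sum(q * math.log(p) for q, p in zip(sampler, scorer))


def flags_as_machine(avg_log_prob, threshold=THRESHOLD):
    """Hypothetical likelihood-threshold baseline: unusually probable text is flagged as LLM-generated."""
    return avg_log_prob > threshold


if __name__ == "__main__":
    for t in (0.7, 1.0, 1.5):
        score = expected_log_prob(TOY_LOGITS, t)
        print(f"T={t}: avg log-prob {score:.2f}, flagged={flags_as_machine(score)}")
```

With these toy values, text sampled at T=0.7 and T=1.0 is flagged while text sampled at T=1.5 is not. Raising the temperature shifts sampling mass toward tokens the scoring model finds unlikely, so the expected log-probability falls monotonically as T rises and eventually drops below any fixed threshold; the attacker needs no access to the detector to do this.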

benchmark, dataset, detector

Country:
- North America
  - United States
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California > Ventura County
      - Simi Valley (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Ukraine (0.04)
  - Switzerland > Vaud
    - Lausanne (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Middle East > Malta
    - Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Asia
  - Singapore (0.04)
  - Middle East
    - Iraq (0.14)
    - Palestine > Gaza Strip
      - Khan Yunis Governorate > Khan Yunis (0.04)
      - Gaza Governorate > Gaza (0.04)

Genre:
- Research Report (0.82)

Industry:
- Media > News (1.00)
- Information Technology (1.00)
- Government
  - Military (1.00)
  - Regional Government > North America Government
    - United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Performance Analysis > Accuracy (0.68)
