CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning
Liu, Xiaoming, Zhang, Zhaohan, Wang, Yichen, Pu, Hang, Lan, Yu, Shen, Chao
arXiv.org Artificial Intelligence
Machine-Generated Text (MGT) detection, the task of discriminating MGT from Human-Written Text (HWT), plays a crucial role in preventing misuse of text generative models, which have recently become highly adept at mimicking human writing style. Recently proposed detectors usually take coarse text sequences as input and fine-tune pretrained models with a standard cross-entropy loss. However, these methods fail to consider the linguistic structure of texts. Moreover, they cannot handle the low-resource problem, which often arises in practice given the enormous amount of textual data online. In this paper, we present a coherence-based contrastive learning model named CoCo to detect possible MGT under low-resource scenarios. To exploit linguistic features, we encode coherence information, in the form of a graph, into the text representation. To tackle the challenge of limited data, we employ a contrastive learning framework and propose an improved contrastive loss that prevents the performance degradation caused by simple samples. Experimental results on two public datasets and two self-constructed datasets show that our approach significantly outperforms state-of-the-art methods. Surprisingly, we also find that in our experiments MGTs generated by up-to-date language models can be easier to detect than those from earlier models, and we propose some preliminary explanations for this counter-intuitive phenomenon. All code and datasets are open-sourced.
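For readers unfamiliar with the contrastive learning framework the abstract mentions, the following is a minimal sketch of a generic supervised contrastive loss over labelled text embeddings. It is not CoCo's improved loss, whose form the abstract does not give. The function names, the toy two-dimensional embeddings, and the temperature value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not CoCo's loss).

    For each anchor, samples sharing its label (e.g. both MGT or both HWT)
    are positives; all other samples in the batch act as negatives.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Temperature-scaled similarities to every other sample in the batch.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical toy batch: label 0 = HWT, label 1 = MGT.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_clustered = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy batch the loss is small when same-label embeddings sit close together and large when labels cut across the clusters, which is the signal a contrastive objective uses to shape the representation space.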
Oct-20-2023
- Country:
  - Africa > Middle East (0.04)
  - Asia
    - China > Shaanxi Province
      - Xi'an (0.04)
    - India (0.04)
    - Middle East
      - Bahrain (0.04)
      - Oman (0.04)
      - Qatar > Ad-Dawhah
        - Doha (0.04)
      - UAE
        - Dubai Emirate > Dubai (0.04)
        - Sharjah Emirate > Sharjah (0.04)
    - Pakistan (0.04)
    - Russia (0.04)
  - Europe
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Middle East (0.04)
    - Russia > Central Federal District
      - Moscow Oblast > Moscow (0.04)
    - Switzerland (0.04)
    - United Kingdom
      - England > Greater London
        - London (0.04)
      - Scotland (0.04)
  - North America > United States
    - California > Los Angeles County
      - Los Angeles (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
  - Oceania > New Zealand (0.04)
  - South America > Argentina (0.04)
- Genre:
  - Research Report > New Finding (0.66)
- Industry:
  - Government (1.00)
  - Information Technology (0.67)
  - Leisure & Entertainment > Sports
    - Soccer (1.00)
  - Media (0.93)
- Technology: