CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning
Liu, Xiaoming, Zhang, Zhaohan, Wang, Yichen, Pu, Hang, Lan, Yu, Shen, Chao
arXiv.org Artificial Intelligence
Machine-Generated Text (MGT) detection, a task that discriminates MGT from Human-Written Text (HWT), plays a crucial role in preventing the misuse of text generative models, which have recently become adept at mimicking human writing style. Recently proposed detectors usually take raw text sequences as input and fine-tune pretrained models with the standard cross-entropy loss. However, these methods fail to consider the linguistic structure of texts. Moreover, they lack the ability to handle the low-resource problem, which often arises in practice given the enormous amount of textual data online. In this paper, we present a coherence-based contrastive learning model named CoCo to detect possible MGT under low-resource scenarios. To exploit linguistic features, we encode coherence information, in the form of a graph, into the text representation. To tackle the challenge of limited data, we employ a contrastive learning framework and propose an improved contrastive loss that prevents the performance degradation caused by easy samples. Experimental results on two public datasets and two self-constructed datasets show that our approach significantly outperforms state-of-the-art methods. We also find, surprisingly, that in our experiments MGTs originating from up-to-date language models can be easier to detect than those from previous models, and we propose some preliminary explanations for this counter-intuitive phenomenon. All code and datasets are open-sourced.
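The abstract does not spell out CoCo's improved contrastive loss, so the sketch below shows only a generic supervised contrastive loss (SupCon-style) of the kind a contrastive HWT/MGT classifier might start from. It is an illustrative assumption, not the paper's loss: the function name `supervised_contrastive_loss`, the `temperature` default, and the label encoding (0 = HWT, 1 = MGT) are all hypothetical.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Hypothetical sketch: generic SupCon-style loss, NOT CoCo's improved loss.

    embeddings: (n, d) array of text representations, n >= 2.
    labels: (n,) array of class ids, e.g. 0 = HWT, 1 = MGT (assumed encoding).
    """
    # L2-normalise so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature  # pairwise scaled similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor's similarity with itself from the softmax denominator.
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable log-softmax over all other samples for each anchor.
    row_max = np.max(sim, axis=1, keepdims=True)
    log_norm = np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    # Positives: samples sharing the anchor's label, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # only anchors that have at least one positive
    pos_log_prob_sum = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = pos_log_prob_sum[valid] / pos_counts[valid]
    # Average the negative mean log-probability of positives over anchors.
    return float(-mean_log_prob_pos.mean())


# Usage: two tight clusters; matching labels give a near-zero loss.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(supervised_contrastive_loss(emb, np.array([0, 0, 1, 1])))
```

Same-label pairs are pulled together and different-label pairs pushed apart, so even a small labelled batch yields a training signal from every same-class pair; this is one plausible reason contrastive objectives suit data-limited settings. The paper's modification for easy samples would change how individual pairs are weighted, which this sketch does not attempt.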
Oct-20-2023
- Country:
  - Asia > Middle East
    - UAE (0.28)
  - Europe (1.00)
  - North America > United States (0.92)
- Genre:
  - Research Report > New Finding (0.66)
- Industry:
  - Government (1.00)
  - Information Technology (0.67)
  - Leisure & Entertainment > Sports
    - Soccer (1.00)
  - Media (0.93)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Statistical Learning (1.00)
    - Natural Language
      - Chatbot (1.00)
      - Information Retrieval (0.68)
      - Large Language Model (1.00)
      - Text Processing (0.93)