Robust AI-Generated Text Detection by Restricted Embeddings
Kuznetsov, Kristian, Tulchinskii, Eduard, Kushnareva, Laida, Magai, German, Barannikov, Serguei, Nikolenko, Sergey, Piontkovskaya, Irina
arXiv.org Artificial Intelligence
The growing amount and quality of AI-generated text make detecting such content more difficult. In most real-world scenarios, the domain (style and topic) of the generated data and the generator model are not known in advance. In this work, we focus on the robustness of classifier-based detectors of AI-generated text, namely their ability to transfer to unseen generators or semantic domains. We investigate the geometry of the embedding space of Transformer-based text encoders and show that clearing out harmful linear subspaces helps to train a robust classifier that ignores domain-specific spurious features. We investigate several subspace decomposition and feature selection strategies and achieve significant improvements over state-of-the-art methods in cross-domain and cross-generator transfer. Our best approaches for head-wise and coordinate-based subspace removal increase the mean out-of-distribution (OOD) classification score by up to 9% and 14% in particular setups for RoBERTa and BERT embeddings, respectively. We release our code and data: https://github.com/SilverSolver/RobustATD
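To make the idea of "clearing out" a linear subspace concrete, the following is a minimal sketch, not the authors' implementation. It assumes the harmful directions are already known (the abstract does not say how they are selected), and all function names are hypothetical. It shows two operations on a single embedding vector: projecting out the span of a set of directions, and the simpler coordinate-based special case, interpreted here as dropping selected coordinates.

```python
# Illustrative sketch only: function names and the choice of directions are
# assumptions for this example, not taken from the paper or its repository.

def orthonormalize(directions, eps=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `directions`."""
    basis = []
    for v in directions:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > eps:  # skip directions that are already in the span
            basis.append([wi / norm for wi in w])
    return basis


def remove_subspace(embedding, basis):
    """Project `embedding` onto the orthogonal complement of span(basis).

    `basis` must be orthonormal, e.g. the output of `orthonormalize`.
    """
    out = list(embedding)
    for b in basis:
        coeff = sum(x * bi for x, bi in zip(embedding, b))
        out = [o - coeff * bi for o, bi in zip(out, b)]
    return out


def drop_coordinates(embedding, coords):
    """Coordinate-based special case: delete the selected coordinates."""
    removed = set(coords)
    return [x for i, x in enumerate(embedding) if i not in removed]


# Toy usage: remove the plane spanned by two (hypothetical) harmful directions.
harmful_basis = orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
cleaned = remove_subspace([3.0, 4.0, 5.0], harmful_basis)
reduced = drop_coordinates([3.0, 4.0, 5.0], [0, 1])
```

A downstream classifier would then be trained on the cleaned or reduced vectors instead of the raw encoder embeddings.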
Oct-10-2024