Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection

Jun-18-2024–arXiv.org Artificial Intelligence

In this paper, we study the problem of detecting machine-generated text when the large language model (LLM) it is possibly derived from is unknown. We do so by applying ensembling methods to the outputs from DetectGPT classifiers (Mitchell et al., 2023), a zero-shot model for machine-generated text detection which is highly accurate when the generative (or base) language model is the same as the discriminative (or scoring) language model. We find that simple summary statistics of DetectGPT sub-model outputs yield an AUROC of 0.73 (relative to 0.61) while retaining its zero-shot nature, and that supervised learning methods sharply boost the accuracy to an AUROC of 0.94 ...

These can range from logistic regression models to convolutional neural networks (Weller and Woo, 2019) or LSTM models (Kudugunta and Ferrara, 2018). These binary classifiers can also act as base learners in ensemble methods (Fayaz et al., 2020). These features can also be augmented with additional information such as account data in the context of social media bot detection. However, high classification accuracy for these methods relies on sufficiently long text and on a training corpus of machine-generated samples that is sufficiently diverse in stylometric and linguistic characteristics to prevent overfitting. As such, these classifiers need to be continually trained and updated, limiting their usefulness (Pegoraro et al., 2023).
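The zero-shot ensembling idea in the abstract (collapsing several sub-model outputs into one score with a summary statistic, then measuring AUROC) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each sub-model has already produced one detection score per text, and the names `auroc`, `ensemble_scores`, `toy_scores`, and `toy_labels` are hypothetical. The numbers are hand-made toy values, not results from the paper.

```python
from statistics import mean, median


def auroc(scores, labels):
    """AUROC as the probability that a random positive example scores
    higher than a random negative one (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_scores(sub_model_scores, statistic=mean):
    """Collapse each text's per-sub-model scores into a single score.
    No labels are used here, so this step stays zero-shot."""
    return [statistic(row) for row in sub_model_scores]


# Toy values for illustration only (assumed, not taken from the paper).
# Rows are texts; columns are scores from three hypothetical scoring models.
toy_scores = [
    [0.9, 0.2, 0.4],  # machine-generated
    [0.1, 0.8, 0.5],  # machine-generated
    [0.3, 0.3, 0.2],  # human-written
    [0.2, 0.1, 0.6],  # human-written
]
toy_labels = [1, 1, 0, 0]

# AUROC of each scoring model on its own.
for j in range(3):
    column = [row[j] for row in toy_scores]
    print(f"sub-model {j}:", auroc(column, toy_labels))

# AUROC after combining the sub-models with a summary statistic.
for name, stat in [("mean", mean), ("median", median), ("max", max)]:
    print(f"{name}:", auroc(ensemble_scores(toy_scores, stat), toy_labels))
```

In this toy setup each scoring model is only reliable for some texts, which loosely mirrors the situation where the generating model is unknown. The supervised variant mentioned in the abstract would instead fit a classifier on the per-sub-model scores, which requires labeled training data.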



