Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection

Ong, Ivan; Quek, Boon King

arXiv.org Artificial Intelligence 

In this paper, we study the problem of detecting machine-generated text when the large language model (LLM) it is possibly derived from is unknown. We do so by applying ensembling methods to the outputs from DetectGPT classifiers (Mitchell et al., 2023), a zero-shot model for machine-generated text detection which is highly accurate when the generative (or base) language model is the same as the discriminative (or scoring) language model. We find that simple summary statistics of DetectGPT sub-model outputs yield an AUROC of 0.73 (relative to 0.61) while retaining its zero-shot nature, and that supervised learning methods sharply boost the accuracy to an AUROC of 0.94.

These can range from logistic regression models to convolutional neural networks (Weller and Woo, 2019) or LSTM models (Kudugunta and Ferrara, 2018). These binary classifiers can also act as base learners in ensemble methods (Fayaz et al., 2020). These features can also be augmented with additional information such as account data in the context of social media bot detection. However, high classification accuracy for these methods is reliant on sufficiently long text and on a training corpus of machine-generated samples that is sufficiently diverse in its stylometric and linguistic characteristics to prevent overfitting. As such, these classifiers need to be continually trained and updated, limiting their usefulness (Pegoraro et al., 2023).
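The summary-statistic ensembling mentioned in the abstract can be illustrated with a minimal sketch. It assumes each text has already been scored by several DetectGPT sub-models (one per scoring language model) and simply collapses each row of scores into a single number. The function names, the choice of statistics, and the toy score matrix are all hypothetical and are not taken from the paper; the numbers are invented for illustration only.

```python
import numpy as np

def ensemble_scores(sub_scores, stat="mean"):
    """Collapse an (n_texts, n_models) score matrix into one score per text.

    Hypothetical helper: which statistics the paper actually uses is not
    stated in the abstract, so mean/median/max/min are assumptions.
    """
    reducers = {"mean": np.mean, "median": np.median, "max": np.max, "min": np.min}
    return reducers[stat](np.asarray(sub_scores, dtype=float), axis=1)

def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # all positive/negative score pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Invented toy data: rows are texts, columns are hypothetical scoring models.
sub_scores = np.array([
    [0.9, 0.2, 0.4],  # machine-generated
    [0.1, 0.8, 0.5],  # machine-generated
    [0.3, 0.3, 0.2],  # human-written
    [0.2, 0.1, 0.4],  # human-written
])
labels = [1, 1, 0, 0]

for stat in ("mean", "median", "max"):
    print(stat, auroc(labels, ensemble_scores(sub_scores, stat)))
```

Because no labels are used to combine the scores, this kind of reduction keeps the detector zero-shot. A supervised variant would instead fit a classifier on the rows of the score matrix, which requires labeled training data.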
