PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?

Kseniia Petukhova, Roman Kazakov, Ekaterina Kochmar

Apr-8-2024–arXiv.org Artificial Intelligence

In this paper, we present our submission to SemEval-2024 Task 8, "Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection", focusing on the detection of machine-generated texts (MGTs) in English. Specifically, our approach combines embeddings from RoBERTa-base with diversity features and uses a resampled training set. We rank 12th out of 124 in Subtask A (monolingual track), and our results show that our approach generalizes across unseen models and domains, achieving an accuracy of 0.91.
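
The idea of combining an embedding with diversity features can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the type-token ratio and hapax ratio are assumed stand-ins for the diversity features, the embedding is a placeholder list standing in for a RoBERTa-base text embedding, and the function names `diversity_features` and `combine_features` are hypothetical.

```python
import re
from collections import Counter
from typing import List, Sequence


def diversity_features(text: str) -> List[float]:
    """Return two simple lexical diversity measures (assumed examples).

    - type-token ratio: distinct words / total words
    - hapax ratio: words occurring exactly once / total words
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return [0.0, 0.0]
    counts = Counter(tokens)
    type_token_ratio = len(counts) / len(tokens)
    hapax_ratio = sum(1 for c in counts.values() if c == 1) / len(tokens)
    return [type_token_ratio, hapax_ratio]


def combine_features(embedding: Sequence[float], text: str) -> List[float]:
    """Concatenate a precomputed text embedding with diversity features.

    The embedding is assumed to come from an encoder such as RoBERTa-base;
    here it is just a sequence of floats supplied by the caller.
    """
    return list(embedding) + diversity_features(text)


if __name__ == "__main__":
    placeholder_embedding = [0.1, 0.2]  # stand-in for a real embedding
    print(combine_features(placeholder_embedding, "the cat saw the dog"))
```

The combined vector would then be passed to a classifier; the choice of classifier and the resampling of the training set are outside the scope of this sketch.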

detection, linguistic feature, semeval-2024 task 8


