Investigating the Impact of Semi-Supervised Methods with Data Augmentation on Offensive Language Detection in Romanian Language

Nicola, Elena-Beatrice, Cercel, Dumitru-Clementin, Pop, Florin

Jul-29-2024–arXiv.org Artificial Intelligence

Offensive language detection is a crucial task in today's digital landscape, where online platforms struggle to maintain a respectful and inclusive environment. However, building robust offensive language detection models requires large amounts of labeled data, which can be expensive and time-consuming to obtain. Semi-supervised learning offers a feasible solution by using both labeled and unlabeled data to create more accurate and robust models. In this paper, we explore several semi-supervised methods together with data augmentation techniques. Concretely, we implemented eight semi-supervised methods and ran experiments with each of them in two settings: using only the available data in the RO-Offense dataset, and applying five augmentation techniques before feeding the data to the models. Experimental results demonstrate that some methods benefit more from augmentation than others.

augmentation, offensive language, semi-supervised method

Country:
- Europe
  - Romania > București - Ilfov Development Region
    - Municipality of Bucharest > Bucharest (0.05)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Finland > Uusimaa
    - Helsinki (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Unsupervised or Indirectly Supervised Learning (0.71)
  - Neural Networks > Deep Learning (0.48)
