Combating high variance in Data-Scarce Implicit Hate Speech Classification

Pal, Debaditya, Chaudhari, Kaustubh, Sharma, Harsh

Aug-29-2022–arXiv.org Artificial Intelligence

Hate speech classification has been a long-standing problem in natural language processing. However, even though there are numerous hate speech detection methods, they usually overlook a lot of hateful statements due to them being implicit in nature. Developing datasets to aid in the task of implicit hate speech classification comes with its own challenges; difficulties are nuances in language, varying definitions of what constitutes hate speech, and the labor-intensive process of annotating such data. This had led to a scarcity of data available to train and test such systems, which gives rise to high variance problems when parameter-heavy transformer-based models are used to address the problem. In this paper, we explore various optimization and regularization techniques and develop a novel RoBERTa-based model that achieves state-of-the-art performance.

computational linguistic, detection, language model, (13 more...)

arXiv.org Artificial Intelligence

Aug-29-2022

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - New York (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California > San Diego County
      - San Diego (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe > Italy
  - Tuscany > Florence (0.04)
- Asia
  - India (0.05)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)

Genre:
- Research Report (0.51)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (0.89)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found