Combating high variance in Data-Scarce Implicit Hate Speech Classification
Pal, Debaditya, Chaudhari, Kaustubh, Sharma, Harsh
arXiv.org Artificial Intelligence
Hate speech classification has been a long-standing problem in natural language processing. However, although numerous hate speech detection methods exist, they usually overlook many hateful statements because those statements are implicit in nature. Developing datasets to aid implicit hate speech classification comes with its own challenges: nuances in language, varying definitions of what constitutes hate speech, and the labor-intensive process of annotating such data. This has led to a scarcity of data available to train and test such systems, which gives rise to high-variance problems when parameter-heavy transformer-based models are used to address the problem. In this paper, we explore various optimization and regularization techniques and develop a novel RoBERTa-based model that achieves state-of-the-art performance.
Aug-29-2022
- Genre:
- Research Report (0.51)
- Technology: