dictNN: A Dictionary-Enhanced CNN Approach for Classifying Hate Speech on Twitter
Kupi, Maximilian, Bodnar, Michael, Schmidt, Nikolas, Posada, Carlos Eduardo
arXiv.org Artificial Intelligence
Hate speech on social media is a growing concern, and automated methods have so far fallen short of detecting it reliably. A major challenge is that hate speech can be evasive, owing to the ambiguity and rapid evolution of natural language. To tackle this, we introduce a vectorisation based on a crowd-sourced and continuously updated dictionary of hate words, and we propose fusing it with standard word embeddings to improve the classification performance of a CNN model. To train and test our model, we use a merged corpus of two established datasets (110,748 tweets in total). Adding the dictionary-enhanced input increases the CNN model's predictive power, raising its macro F1 score by seven percentage points.
Mar-15-2021
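The abstract does not spell out how the dictionary-based vectorisation is fused with the word embeddings, so the following is only a minimal sketch of one plausible reading: each token's embedding is extended with a single flag marking whether the token appears in the hate-word dictionary, and the resulting matrix is what a CNN would consume. The embedding table, the dictionary entries, the `vectorise` function, and the padding length are all hypothetical stand-ins, not the authors' implementation.

```python
# Toy illustration of fusing a dictionary signal with word embeddings.
# All names and values below are hypothetical; the paper's actual
# vectorisation may differ.

EMBED_DIM = 3

# Hypothetical pre-trained embeddings (token -> vector).
EMBEDDINGS = {
    "you": [0.1, 0.3, -0.2],
    "are": [0.0, 0.5, 0.1],
    "a": [0.2, -0.1, 0.0],
    "slur_a": [-0.4, 0.2, 0.6],
}
UNKNOWN = [0.0] * EMBED_DIM

# Hypothetical stand-in for the crowd-sourced hate-word dictionary.
HATE_DICTIONARY = {"slur_a", "slur_b"}


def vectorise(tokens, max_len=6):
    """Return max_len rows of length EMBED_DIM + 1.

    Each row is the token's embedding followed by a 0/1 flag that is 1.0
    when the token is in HATE_DICTIONARY. Short inputs are zero-padded,
    long inputs are truncated.
    """
    rows = []
    for tok in tokens[:max_len]:
        embedding = EMBEDDINGS.get(tok, UNKNOWN)
        flag = 1.0 if tok in HATE_DICTIONARY else 0.0
        rows.append(embedding + [flag])
    while len(rows) < max_len:
        rows.append([0.0] * (EMBED_DIM + 1))
    return rows


features = vectorise("you are a slur_a".split())
print(len(features), len(features[0]))  # 6 4
```

Appending the flag as an extra channel keeps the input shape compatible with an ordinary 1D convolution over the token axis; other fusion schemes (for example a separate dictionary branch merged later in the network) would be equally consistent with the abstract.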