Moderating Harm: Benchmarking Large Language Models for Cyberbullying Detection in YouTube Comments
arXiv.org Artificial Intelligence
As online platforms grow, comment sections increasingly host harassment that undermines user experience and well-being. This study benchmarks three leading large language models (OpenAI GPT-4.1, Google Gemini 1.5 Pro, and Anthropic Claude 3 Opus) on a corpus of 5,080 YouTube comments sampled from high-abuse threads in gaming, lifestyle, food vlog, and music channels. The dataset comprises 1,334 harmful and 3,746 non-harmful messages in English, Arabic, and Indonesian, annotated independently by two reviewers with substantial agreement (Cohen's kappa = 0.83). Under a unified prompt and deterministic settings, GPT-4.1 achieved the best overall balance, with an F1 score of 0.863, precision of 0.887, and recall of 0.841. Gemini flagged the highest share of harmful posts (recall = 0.875), but its precision fell to 0.767 because of frequent false positives. Claude delivered the highest precision (0.920) and the lowest false-positive rate (0.022), yet its recall dropped to 0.720. Qualitative analysis showed that all three models struggle with sarcasm, coded insults, and mixed-language slang. These results underscore the need for moderation pipelines that combine complementary models, incorporate conversational context, and are fine-tuned for under-represented languages and implicit abuse. A de-identified version of the dataset and the full prompts are publicly released to promote reproducibility and further progress in automated content moderation.
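The reported metrics can be related to one another with a little arithmetic. The sketch below is illustrative only and is not the authors' evaluation code. It recomputes F1 as the harmonic mean of each model's reported precision and recall, and it checks that Claude's reported recall and false-positive rate are roughly consistent with its reported precision. The function names are hypothetical, and the consistency check assumes that all 5,080 comments (1,334 harmful, 3,746 non-harmful) formed the evaluation set, which the abstract does not state explicitly.

```python
# Illustrative arithmetic on the figures quoted in the abstract.
# Function names are hypothetical; this is not the authors' code.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_from_rates(recall: float, fpr: float, n_pos: int, n_neg: int) -> float:
    """Precision implied by recall, false-positive rate, and class counts.

    Assumes (not stated in the source) that every labelled comment was scored.
    """
    true_pos = recall * n_pos   # expected harmful comments correctly flagged
    false_pos = fpr * n_neg     # expected non-harmful comments wrongly flagged
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0


# (precision, recall) pairs as reported in the abstract.
REPORTED = {
    "GPT-4.1": (0.887, 0.841),
    "Gemini 1.5 Pro": (0.767, 0.875),
    "Claude 3 Opus": (0.920, 0.720),
}

if __name__ == "__main__":
    for model, (p, r) in REPORTED.items():
        # Only the GPT-4.1 F1 is reported in the abstract; the others are derived here.
        print(f"{model}: F1 = {f1_score(p, r):.3f}")

    implied = precision_from_rates(recall=0.720, fpr=0.022, n_pos=1334, n_neg=3746)
    print(f"Claude precision implied by recall and FPR: {implied:.3f}")
```

For GPT-4.1 the harmonic mean reproduces the reported F1 of 0.863. The F1 values printed for Gemini and Claude are derived from their reported precision and recall and do not appear in the abstract, so they should be read as back-of-the-envelope figures, not as published results.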
Jun-3-2025
- Country:
- Asia
- Europe
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Tuscany
- Florence (0.04)
- North Macedonia > Skopje Statistical Region
- Skopje Municipality > Skopje (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States
- California > San Diego County
- San Diego (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- South America > Brazil
- Rio de Janeiro > Rio de Janeiro (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Technology: