HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection

Feng, Zixin, Cui, Xinying, Sun, Yifan, Wei, Zheng, Yuan, Jiachen, Hu, Jiazhen, Xin, Ning, Hasan, Md Maruf

Mar-16-2026–arXiv.org Machine Learning

Cyberbullying on social media is inherently multilingual and multi-faceted, and abusive behaviors often overlap across multiple categories. Existing methods are commonly limited by monolingual assumptions or single-task formulations, which restrict their effectiveness in realistic multilingual and multi-label scenarios. In this paper, we propose HMS-BERT, a hybrid multi-task self-training framework for multilingual and multi-label cyberbullying detection. Built upon a pretrained multilingual BERT backbone, HMS-BERT integrates contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task. To address labeled data scarcity in low-resource languages, an iterative self-training strategy with confidence-based pseudo-labeling is introduced to facilitate cross-lingual knowledge transfer. Experiments on four public datasets demonstrate that HMS-BERT achieves strong performance, attaining a macro F1-score of up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task. Ablation studies further verify the effectiveness of the proposed components.
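The abstract names two mechanisms without giving their exact form: a joint objective over a multi-label abuse head and a three-class main head, and confidence-based pseudo-labeling for self-training. The following is a minimal sketch of one common way to realize both, not the paper's implementation. It assumes a weighted sum of three-class cross-entropy and mean per-label binary cross-entropy, and a fixed softmax-confidence threshold applied to the main head. The names `joint_loss`, `select_pseudo_labels`, `alpha`, and `threshold` are hypothetical, as are their default values.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bce_with_logits(z: float, y: float) -> float:
    """Stable binary cross-entropy for a raw logit z and a target y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def joint_loss(
    main_logits: Sequence[float],
    main_label: int,
    abuse_logits: Sequence[float],
    abuse_labels: Sequence[int],
    alpha: float = 0.5,  # hypothetical task weight; the abstract gives none
) -> float:
    """Weighted sum of 3-class cross-entropy and mean per-label BCE (assumed form)."""
    # Cross-entropy via log-sum-exp: -log softmax(z)[label].
    m = max(main_logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in main_logits))
    main_ce = log_norm - main_logits[main_label]
    # Each abuse category is an independent sigmoid output (multi-label).
    multi_bce = sum(
        bce_with_logits(z, y) for z, y in zip(abuse_logits, abuse_labels)
    ) / len(abuse_logits)
    return alpha * main_ce + (1.0 - alpha) * multi_bce


def select_pseudo_labels(
    unlabeled_logits: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical confidence cutoff
) -> List[Tuple[int, int]]:
    """Return (sample index, predicted class) where the top softmax prob >= threshold."""
    selected = []
    for i, logits in enumerate(unlabeled_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((i, probs.index(confidence)))
    return selected


if __name__ == "__main__":
    # One confident and one uncertain unlabeled sample (made-up logits).
    print(select_pseudo_labels([[5.0, 0.0, 0.0], [0.1, 0.2, 0.0]]))
    print(joint_loss([2.0, 0.5, -1.0], 0, [1.5, -2.0, 0.3], [1, 0, 1]))
```

In an iterative loop, the selected samples would be added to the labeled pool with their predicted classes and the model retrained on the enlarged set, repeating until few new samples pass the threshold. That stopping rule, like the threshold itself, is an assumption rather than a detail reported in the abstract.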

artificial intelligence, detection, machine learning

Country:
- North America > United States
  - Texas > El Paso County > El Paso (0.04)
- Europe > United Kingdom
  - England > Greater London > London (0.04)
- Asia > China
  - Shaanxi Province > Xi'an (0.05)
  - Beijing > Beijing (0.04)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.97)
- Information Technology > Security & Privacy (0.97)
- Health & Medicine > Therapeutic Area (0.68)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (1.00)