SynBullying: A Multi LLM Synthetic Conversational Dataset for Cyberbullying Detection

Kazemi, Arefeh, Qadeer, Hamza, Wagner, Joachim, Hosseini, Hossein, Kalaivendan, Sri Balaaji Natarajan, Davis, Brian

Dec-10-2025–arXiv.org Artificial Intelligence

We introduce SynBullying, a synthetic multi-LLM conversational dataset for studying and detecting cyberbullying (CB). SynBullying provides a scalable and ethically safe alternative to human data collection by leveraging large language models (LLMs) to simulate realistic bullying interactions. The dataset offers (i) conversational structure, capturing multi-turn exchanges rather than isolated posts; (ii) context-aware annotations, where harmfulness is assessed within the conversational flow considering context, intent, and discourse dynamics; and (iii) fine-grained labeling, covering various CB categories for detailed linguistic and behavioral analysis. We evaluate SynBullying across five dimensions, including conversational structure, lexical patterns, sentiment/toxicity, role dynamics, harm intensity, and CB-type distribution. We further examine its utility by testing its performance as standalone training data and as an augmentation source for CB classification.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Dec-10-2025

arXiv.org PDF

Add feedback

Country:
- Europe (1.00)
- North America > United States
  - Minnesota (0.28)

Genre:
- Research Report (1.00)

Industry:
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.64)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Natural Language > Large Language Model (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.99)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found