Prosocial Behavior Detection in Player Game Chat: From Aligning Human-AI Definitions to Efficient Annotation at Scale
Kocielnik, Rafal, Kim, Min, Boonyarungsrit, Penphob, Soltani, Fereshteh, Sambrano, Deshawn, Anandkumar, Animashree, Alvarez, R. Michael
arXiv.org Artificial Intelligence
Detecting prosociality in text (communication intended to affirm, support, or improve others' behavior) is a novel and increasingly important challenge for trust and safety systems. Unlike toxic content detection, prosociality lacks well-established definitions and labeled data, requiring new approaches to both annotation and deployment. We present a practical, three-stage pipeline that enables scalable, high-precision prosocial content classification while minimizing human labeling effort and inference costs. First, we identify the best LLM-based labeling strategy using a small seed set of human-labeled examples. We then introduce a human-AI refinement loop, where annotators review high-disagreement cases between GPT-4 and humans to iteratively clarify and expand the task definition, a critical step for emerging annotation tasks like prosociality. This process results in improved label quality and definition alignment. Finally, we synthesize 10k high-quality labels using GPT-4 and train a two-stage inference system: a lightweight classifier handles high-confidence predictions, while only ~35% of ambiguous instances are escalated to GPT-4o. This architecture reduces inference costs by ~70% while achieving high precision (~0.90). Our pipeline demonstrates how targeted human-AI interaction, careful task formulation, and deployment-aware architecture design can unlock scalable solutions for novel responsible AI tasks.
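The two-stage inference idea in the abstract can be illustrated with a minimal confidence-threshold cascade. This is only a sketch under stated assumptions, not the authors' implementation: the function name `cascade_predict`, the callables `cheap_model` and `expensive_model`, and the threshold values `low` and `high` are all hypothetical. The abstract does not say how confidence is measured or where the cutoffs sit.

```python
from typing import Callable, List, Tuple


def cascade_predict(
    texts: List[str],
    cheap_model: Callable[[str], float],     # hypothetical: returns P(prosocial)
    expensive_model: Callable[[str], bool],  # hypothetical: e.g. a wrapper around an LLM call
    low: float = 0.2,    # assumed cutoff, not taken from the paper
    high: float = 0.8,   # assumed cutoff, not taken from the paper
) -> Tuple[List[bool], float]:
    """Label each text and report the fraction escalated to the expensive model."""
    labels: List[bool] = []
    escalated = 0
    for text in texts:
        p = cheap_model(text)
        if p >= high:
            # Confidently prosocial: accept the cheap prediction.
            labels.append(True)
        elif p <= low:
            # Confidently not prosocial: accept the cheap prediction.
            labels.append(False)
        else:
            # Ambiguous: defer to the expensive model.
            escalated += 1
            labels.append(expensive_model(text))
    rate = escalated / len(texts) if texts else 0.0
    return labels, rate
```

In a design like this, widening the gap between `low` and `high` sends more traffic to the expensive model, typically trading higher cost for higher precision. The returned escalation rate makes that trade-off measurable on held-out data.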
Aug-11-2025
- Country:
- Asia > Myanmar
- Tanintharyi Region > Dawei (0.04)
- Europe
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- Slovakia > Bratislava
- Bratislava (0.04)
- Spain > Aragón (0.04)
- North America > United States
- California > Los Angeles County
- Pasadena (0.04)
- Santa Monica (0.04)
- Genre:
- Research Report
- Experimental Study (0.93)
- New Finding (0.68)
- Industry:
- Leisure & Entertainment > Games > Computer Games (1.00)
- Technology: