Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning
Zhang, Rufan; Zhang, Lin; Mi, Xianghang
arXiv.org Artificial Intelligence
The proliferation of harmful online content--e.g., toxicity, spam, and negative sentiment--demands robust and adaptable moderation systems. However, prevailing moderation systems are centralized and task-specific, offering limited transparency and neglecting diverse user preferences--an approach ill-suited for privacy-sensitive or decentralized environments. We propose a novel framework that leverages in-context learning (ICL) with foundation models to unify the detection of toxicity, spam, and negative sentiment across binary, multi-class, and multi-label settings. Crucially, our approach enables lightweight personalization, allowing users to easily block new categories, unblock existing ones, or extend detection to semantic variations through simple prompt-based interventions--all without model retraining. Extensive experiments on public benchmarks (TextDetox, UCI SMS, SST2) and a new, annotated Mastodon dataset reveal that: (i) foundation models achieve strong cross-task generalization, often matching or surpassing task-specific fine-tuned models; (ii) effective personalization is achievable with as few as one user-provided example or definition; and (iii) augmenting prompts with label definitions or rationales significantly enhances robustness to noisy, real-world data. Our work demonstrates a definitive shift beyond one-size-fits-all moderation, establishing ICL as a practical, privacy-preserving, and highly adaptable pathway for the next generation of user-centric content safety systems. To foster reproducibility and facilitate future research, we publicly release our code on GitHub and the annotated Mastodon dataset on Hugging Face.
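The prompt-based personalization described in the abstract can be pictured with a minimal sketch. This is not the authors' released code: the `Category` structure, the `build_prompt`, `block`, and `unblock` helpers, and the prompt wording are all hypothetical, chosen only to show how blocking a new category, unblocking an existing one, or supplying a single definition or example might change the prompt without touching the model.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One user-controlled label (hypothetical structure, not from the paper)."""
    name: str
    definition: str = ""                      # optional label definition
    examples: list[str] = field(default_factory=list)  # optional user examples


def block(blocked: list[Category], cat: Category) -> list[Category]:
    """Return a new preference list with `cat` added (replacing any same-named entry)."""
    return [c for c in blocked if c.name != cat.name] + [cat]


def unblock(blocked: list[Category], name: str) -> list[Category]:
    """Return a new preference list without the category called `name`."""
    return [c for c in blocked if c.name != name]


def build_prompt(blocked: list[Category], text: str) -> str:
    """Assemble a multi-label moderation prompt from the user's blocked categories.

    The wording is an assumption for illustration; the paper's actual
    prompt templates may differ.
    """
    lines = [
        "Decide which of the following categories apply to the message.",
        "Answer with a comma-separated list of category names, or 'none'.",
        "",
        "Categories:",
    ]
    for cat in blocked:
        # Append the definition only when the user supplied one.
        suffix = f": {cat.definition}" if cat.definition else ""
        lines.append(f"- {cat.name}{suffix}")
        for ex in cat.examples:
            lines.append(f'  example: "{ex}"')
    lines += ["", f'Message: "{text}"', "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    prefs = [
        Category("toxicity"),
        Category("spam", "unsolicited bulk or promotional messages"),
    ]
    # One user-provided example is enough to describe a new category here.
    prefs = block(prefs, Category("crypto promotion", examples=["Buy this coin now"]))
    prefs = unblock(prefs, "toxicity")
    print(build_prompt(prefs, "Limited offer, invest today!"))
```

The resulting string would be sent to whichever foundation model the user runs; that call is deliberately left out, since the sketch makes no assumption about a particular model or API.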
Nov-11-2025
- Country:
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Asia
- China (0.04)
- Indonesia > Java
- Middle East
- Israel (0.14)
- Palestine > Gaza Strip
- Gaza Governorate > Gaza (0.04)
- Khan Yunis Governorate > Khan Yunis (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Singapore (0.04)
- Europe
- North America
- Dominican Republic (0.04)
- United States > Washington
- King County > Seattle (0.14)
- Genre:
- Overview (0.92)
- Research Report > New Finding (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (1.00)
- Performance Analysis > Accuracy (0.69)
- Natural Language
- Chatbot (0.93)
- Large Language Model (1.00)
- Representation & Reasoning (1.00)
- Communications > Social Media (1.00)
- Data Science (1.00)
- Security & Privacy (1.00)