Thousand Voices of Trauma: A Large-Scale Synthetic Dataset for Modeling Prolonged Exposure Therapy Conversations
Neural Information Processing Systems
The advancement of AI systems for mental health support is hindered by limited access to therapeutic conversation data, particularly for trauma treatment. We present Thousand Voices of Trauma, a synthetic benchmark dataset of 3,000 therapy conversations based on Prolonged Exposure therapy protocols for Post-traumatic Stress Disorder (PTSD). The dataset comprises 500 unique cases, each explored through six conversational perspectives that mirror the progression of therapy from initial anxiety to peak distress to emotional processing. We incorporated diverse demographic profiles (ages 18-80, M=49.3, 49.4% male, 44.4% female, 6.2% nonbinary), 20 trauma types, and 10 trauma-related behaviors using deterministic and probabilistic generation methods. Analysis reveals realistic distributions of trauma types (witnessing violence 10.6%, bullying 10.2%) and symptoms (nightmares 23.4%, substance abuse 20.8%).
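The arithmetic behind the dataset size (500 cases, each seen through six perspectives, giving 3,000 conversations) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the field names, the uniform sampling of age, trauma type, and behavior, and the use of the reported gender shares as sampling weights are all assumptions made for illustration. Perspectives are represented by indices 0 to 5 because the abstract does not list their names.

```python
import random
from itertools import product

N_CASES = 500          # unique cases, as stated in the abstract
N_PERSPECTIVES = 6     # conversational perspectives per case, as stated
N_TRAUMA_TYPES = 20    # trauma types, as stated
N_BEHAVIORS = 10       # trauma-related behaviors, as stated

# Assumption: the reported gender shares are reused here as sampling weights.
GENDER_WEIGHTS = {"male": 0.494, "female": 0.444, "nonbinary": 0.062}


def sample_case(case_id, rng):
    """Probabilistic step (hypothetical): draw one case profile."""
    return {
        "case_id": case_id,
        "age": rng.randint(18, 80),  # inclusive range 18-80; uniform is an assumption
        "gender": rng.choices(
            list(GENDER_WEIGHTS), weights=list(GENDER_WEIGHTS.values()), k=1
        )[0],
        "trauma_type": rng.randrange(N_TRAUMA_TYPES),  # index, not a real label
        "behavior": rng.randrange(N_BEHAVIORS),        # index, not a real label
    }


def build_specs(seed=0):
    """Deterministic step (hypothetical): cross every case with every perspective."""
    rng = random.Random(seed)
    cases = [sample_case(i, rng) for i in range(N_CASES)]
    return [
        dict(case, perspective=p)
        for case, p in product(cases, range(N_PERSPECTIVES))
    ]


specs = build_specs()
print(len(specs))
```

Each resulting specification would then be handed to a conversation generator, which is outside the scope of this sketch. Fixing the seed makes the sampled profiles reproducible across runs.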
Jun-17-2026, 12:37:49 GMT
- Country:
- North America > United States (1.00)
- Genre:
- Research Report > Experimental Study (1.00)