A Comprehensive Review of Datasets for Clinical Mental Health AI Systems
Mandal, Aishik; Adhikary, Prottay Kumar; Arnaout, Hiba; Gurevych, Iryna; Chakraborty, Tanmoy
arXiv.org Artificial Intelligence
Mental health disorders are rising worldwide, but the availability of trained clinicians has not scaled proportionally, leaving many people without adequate or timely support. Recent studies have shown the promise of Artificial Intelligence (AI) in bridging this gap by assisting with mental health diagnosis, monitoring, and intervention. However, developing efficient, reliable, and ethical AI to assist clinicians depends heavily on high-quality clinical training datasets. Despite growing interest in data curation for training clinical AI assistants, existing datasets remain largely scattered, under-documented, and often inaccessible, hindering the reproducibility, comparability, and generalizability of AI models developed for clinical mental health care. In this paper, we present the first comprehensive survey of clinical mental health datasets relevant to the training and development of AI-powered clinical assistants. We categorize these datasets by mental disorder (e.g., depression, schizophrenia), data modality (e.g., text, speech, physiological signals), task type (e.g., diagnosis prediction, symptom severity estimation, intervention generation), accessibility (public, restricted, or private), and sociocultural context (e.g., language and cultural background). We also examine synthetic clinical mental health datasets. Our survey identifies critical gaps such as a lack of longitudinal data, limited cultural and linguistic representation, inconsistent collection and annotation standards, and a narrow range of modalities in synthetic data. We conclude by outlining key challenges in curating and standardizing future datasets and provide actionable recommendations to facilitate the development of more robust, generalizable, and equitable mental health AI systems.
Aug-19-2025
- Country:
- Africa
- East Africa (0.04)
- Middle East (0.04)
- Asia
- Pakistan (0.04)
- Indonesia > Bali (0.04)
- Malaysia (0.04)
- Japan (0.04)
- Middle East
- Republic of Türkiye (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Russia (0.04)
- South Korea > Seoul
- Seoul (0.04)
- China > Shanghai
- Shanghai (0.04)
- Southeast Asia (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Singapore (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- India > NCT
- Delhi (0.04)
- Europe
- United Kingdom (0.14)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- France > Île-de-France
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Greece (0.04)
- Russia (0.04)
- Italy (0.04)
- Czechia > South Moravian Region
- Brno (0.04)
- Denmark (0.04)
- Netherlands (0.04)
- Germany
- Hesse > Darmstadt Region
- Darmstadt (0.04)
- Saxony > Dresden (0.04)
- Poland (0.04)
- Middle East > Malta (0.04)
- Austria > Vienna (0.14)
- North America
- Canada
- British Columbia > Vancouver (0.04)
- Ontario > National Capital Region
- Ottawa (0.04)
- Costa Rica > San José Province
- San José (0.04)
- United States
- Colorado > Denver County
- Denver (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Iowa > Johnson County
- Iowa City (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- New Mexico > Bernalillo County
- Albuquerque (0.04)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia (0.04)
- South America > Chile (0.04)
- Genre:
- Overview (1.00)
- Research Report > Experimental Study (1.00)
- Technology:
- Information Technology
- Artificial Intelligence
- Cognitive Science (0.92)
- Machine Learning > Neural Networks
- Deep Learning (1.00)
- Natural Language > Large Language Model (0.95)
- Representation & Reasoning > Agents (0.67)
- Speech (0.93)
- Communications > Social Media (0.93)
- Data Science > Data Mining (0.93)