Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text
Mitra, Avijit; Druhl, Emily; Goodwin, Raelene; Yu, Hong
arXiv.org Artificial Intelligence
Social and behavioral determinants of health (SBDH) play a crucial role in health outcomes and are frequently documented in clinical text. Automatically extracting SBDH information from clinical text relies on publicly available, high-quality datasets. However, existing SBDH datasets are substantially limited in both availability and coverage. In this study, we introduce Synth-SBDH, a novel synthetic dataset with detailed SBDH annotations, encompassing status, temporal information, and rationale across 15 SBDH categories. We showcase the utility of Synth-SBDH on three tasks using real-world clinical datasets from two distinct hospital settings, highlighting its versatility, generalizability, and distillation capabilities. Models trained on Synth-SBDH consistently outperform counterparts with no Synth-SBDH training, achieving macro-F improvements of up to 62.5%. Additionally, Synth-SBDH proves effective for rare SBDH categories and under resource constraints. Human evaluation demonstrates a Human-LLM alignment of 71.06% and uncovers areas for future refinement.
Jun-10-2024
- Country:
- North America > United States
- Massachusetts
- Middlesex County > Lowell (0.04)
- Hampshire County > Amherst (0.04)
- Asia > Middle East
- Iraq > Muthanna Governorate (0.04)
- Genre:
- Research Report > New Finding (0.87)
- Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
- Law > Criminal Law (0.93)
- Health & Medicine
- Pharmaceuticals & Biotechnology (1.00)
- Health Care Providers & Services (1.00)
- Consumer Health (1.00)
- Therapeutic Area
- Oncology (0.68)
- Neurology > Alzheimer's Disease (0.68)
- Psychiatry/Psychology
- Addiction Disorder (0.68)
- Mental Health (0.67)
- Government
- Technology: