MedSynth: Realistic, Synthetic Medical Dialogue-Note Pairs
Mianroodi, Ahmad Rezaie; Rezaie, Amirali; Todorov, Niko Grisel; Rakovski, Cyril; Rudzicz, Frank
arXiv.org Artificial Intelligence
Physicians spend significant time documenting clinical encounters, a burden that contributes to professional burnout. To address this, robust automation tools for medical documentation are crucial. We introduce MedSynth, a novel dataset of synthetic medical dialogues and notes designed to advance the Dialogue-to-Note (Dial-2-Note) and Note-to-Dialogue (Note-2-Dial) tasks. Informed by an extensive analysis of disease distributions, this dataset includes over 10,000 dialogue-note pairs covering over 2,000 ICD-10 codes. We demonstrate that our dataset markedly improves model performance in generating medical notes from dialogues and dialogues from medical notes. The dataset provides a valuable resource in a field where open-access, privacy-compliant, and diverse training data are scarce. Code is available at https://github.com/ahmadrezarm/MedSynth/tree/main and the dataset is available at https://huggingface.co/datasets/Ahmad0067/MedSynth.
Aug-5-2025
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Health & Medicine
    - Consumer Health (1.00)
    - Diagnostic Medicine (1.00)
    - Health Care Providers & Services (1.00)
    - Health Care Technology > Medical Record (1.00)
    - Pharmaceuticals & Biotechnology (1.00)
    - Therapeutic Area
      - Cardiology/Vascular Diseases (1.00)
      - Endocrinology (1.00)
      - Immunology (0.94)
      - Musculoskeletal (1.00)
      - Neurology (1.00)
      - Psychiatry/Psychology (0.67)
      - Pulmonary/Respiratory Diseases (0.92)
      - Rheumatology (0.68)
- Technology: