PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records
Schlegel, Viktor, Li, Hao, Wu, Yuping, Subramanian, Anand, Nguyen, Thanh-Tung, Kashyap, Abhinav Ramesh, Beck, Daniel, Zeng, Xiaojun, Batista-Navarro, Riza Theresa, Winkler, Stefan, Nenadic, Goran
arXiv.org Artificial Intelligence
This paper describes PULSAR, our system submission to the ImageCLEF 2023 MEDIQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework relies on domain-specific pre-training to produce a specialised language model, which is then trained on task-specific natural data augmented by synthetic data generated by a black-box LLM. We find limited evidence for the efficacy of domain-specific pre-training and data augmentation, while scaling up the language model yields the best performance gains. Our approach was ranked second and third among 13 submissions on task B of the challenge. Our code is available at https://github.com/yuping-wu/PULSAR.
Jul-4-2023
- Country:
- Oceania > Australia
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Europe > Greece > Central Macedonia > Thessaloniki (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Genre:
- Research Report > New Finding (0.46)