PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records

Schlegel, Viktor; Li, Hao; Wu, Yuping; Subramanian, Anand; Nguyen, Thanh-Tung; Kashyap, Abhinav Ramesh; Beck, Daniel; Zeng, Xiaojun; Batista-Navarro, Riza Theresa; Winkler, Stefan; Nenadic, Goran

Jul-4-2023–arXiv.org Artificial Intelligence

This paper describes PULSAR, our system submission to the ImageCLEF 2023 MEDIQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework relies on domain-specific pre-training to produce a specialised language model, which is then trained on task-specific natural data augmented by synthetic data generated by a black-box LLM. We find limited evidence for the efficacy of domain-specific pre-training and data augmentation, while scaling up the language model yields the largest performance gains. Our approach was ranked second and third among 13 submissions on task B of the challenge. Our code is available at https://github.com/yuping-wu/PULSAR.
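The abstract does not say how the natural and synthetic examples are combined for training. As a minimal sketch, assuming a simple policy in which LLM-generated dialogue/note pairs are sampled so that they make up a fixed fraction of the training set, the augmentation step might look like the following. The names `Example`, `mix_training_data` and `synthetic_ratio` are hypothetical and are not taken from the PULSAR repository.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    dialogue: str    # patient-doctor conversation transcript
    note: str        # target clinical note (summary)
    synthetic: bool  # True if the pair was generated by the black-box LLM


def mix_training_data(natural: List[Example], synthetic: List[Example],
                      synthetic_ratio: float, seed: int = 0) -> List[Example]:
    """Hypothetical augmentation policy: keep every natural example and add
    enough synthetic ones that they form roughly `synthetic_ratio` of the
    mixed set (capped by how many synthetic examples are available)."""
    if not 0.0 <= synthetic_ratio < 1.0:
        raise ValueError("synthetic_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_syn / (n_nat + n_syn) = r for n_syn:  n_syn = r * n_nat / (1 - r)
    wanted = round(synthetic_ratio * len(natural) / (1.0 - synthetic_ratio))
    chosen = rng.sample(synthetic, min(wanted, len(synthetic)))
    mixed = natural + chosen
    rng.shuffle(mixed)  # interleave natural and synthetic pairs
    return mixed
```

For example, with 4 natural pairs, 10 synthetic pairs and `synthetic_ratio=0.5`, the mixed set holds all 4 natural pairs plus 4 sampled synthetic ones. Keeping the `synthetic` flag on each example makes it easy to ablate the augmentation, which is the kind of comparison behind the finding that augmentation gave only limited gains.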

data augmentation, large language model, machine learning

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- Europe
  - United Kingdom > England
    - Greater Manchester > Manchester (0.04)
  - Greece > Central Macedonia
    - Thessaloniki (0.04)
- Asia > Singapore
  - Central Region > Singapore (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)