Med-EASi: Finely Annotated Dataset and Models for Controllable Simplification of Medical Texts

Basu, Chandrayee; Vasu, Rosni; Yasunaga, Michihiro; Yang, Qian

Feb-17-2023–arXiv.org Artificial Intelligence

Automatic medical text simplification can assist providers with patient-friendly communication and make medical texts more accessible, thereby improving health literacy. However, curating a quality corpus for this task requires the supervision of medical experts. In this work, we present $\textbf{Med-EASi}$ ($\underline{\textbf{Med}}$ical dataset for $\underline{\textbf{E}}$laborative and $\underline{\textbf{A}}$bstractive $\underline{\textbf{Si}}$mplification), a uniquely crowdsourced and finely annotated dataset for supervised simplification of short medical texts. Its $\textit{expert-layman-AI collaborative}$ annotations facilitate $\textit{controllability}$ over text simplification by marking four kinds of textual transformations: elaboration, replacement, deletion, and insertion. To learn medical text simplification, we fine-tune T5-large with four different styles of input-output combinations, yielding two control-free and two controllable versions of the model. We add two types of $\textit{controllability}$ to text simplification using a multi-angle training approach: $\textit{position-aware}$, which uses in-place annotated inputs and outputs, and $\textit{position-agnostic}$, where the model knows only the contents to be edited, not their positions. Our results show that the fine-grained annotations improve learning compared to the unannotated baseline. Furthermore, $\textit{position-aware}$ control generates better simplifications than $\textit{position-agnostic}$ control. The data and code are available at https://github.com/Chandrayee/CTRL-SIMP.
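The abstract contrasts position-aware control (annotations placed in line with the text) with position-agnostic control (the model sees only the contents to be edited). The following is a minimal Python sketch of how these two input styles could be built from span annotations. It assumes character-offset, non-overlapping spans, hypothetical `<kind>...</kind>` tags, and a hypothetical `| edits:` separator. The `Span`, `position_aware`, and `position_agnostic` names are illustrative only, and the released code in the repository may format inputs differently.

```python
from dataclasses import dataclass

# The four edit categories come from the abstract; the tag strings are assumptions.
EDIT_TYPES = ("elaboration", "replacement", "deletion", "insertion")


@dataclass
class Span:
    start: int  # character offset in the source text (inclusive)
    end: int    # character offset in the source text (exclusive)
    kind: str   # one of EDIT_TYPES


def position_aware(text: str, spans: list[Span]) -> str:
    """Wrap each annotated span in place with hypothetical <kind>...</kind> tags."""
    out, cursor = [], 0
    for s in sorted(spans, key=lambda s: s.start):
        if s.kind not in EDIT_TYPES:
            raise ValueError(f"unknown edit type: {s.kind}")
        if s.start < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        out.append(text[cursor:s.start])                         # untouched text
        out.append(f"<{s.kind}>{text[s.start:s.end]}</{s.kind}>")  # tagged span
        cursor = s.end
    out.append(text[cursor:])
    return "".join(out)


def position_agnostic(text: str, spans: list[Span]) -> str:
    """Leave the source untouched and append only what to edit, not where."""
    edits = "; ".join(
        f"{s.kind}: {text[s.start:s.end]}"
        for s in sorted(spans, key=lambda s: s.start)
    )
    return f"{text} | edits: {edits}"


if __name__ == "__main__":
    text = "The patient presented with pyrexia and malaise."
    i, j = text.index("pyrexia"), text.index("malaise")
    spans = [Span(i, i + 7, "replacement"), Span(j, j + 7, "elaboration")]
    print(position_aware(text, spans))
    print(position_agnostic(text, spans))
```

In this sketch the position-aware string tells a seq2seq model such as T5 exactly which tokens to transform, while the position-agnostic string leaves the model to locate those contents in the source on its own. That difference is the distinction the abstract draws between the two controllable variants.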

machine learning, natural language, simplification