Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting
Richter-Pechanski, Phillip, Wiesenbach, Philipp, Schwab, Dominic M., Kiriakou, Christina, Geis, Nicolas, Dieterich, Christoph, Frank, Anette
arXiv.org Artificial Intelligence
Automatic extraction of medical information from unstructured clinical text poses several challenges: high costs of required clinical expertise, restricted computational resources, strict privacy regulations, and limited interpretability of model predictions. Recent domain adaptation and prompting methods using lightweight masked language models showed promising results with minimal training data and allow for the application of well-established interpretability methods. We are the first to present a systematic evaluation of advanced domain adaptation and prompting methods in a low-resource medical domain task, performing multiclass section classification on German doctor's letters. We evaluate a variety of models, model sizes, (further-pre)training and task settings, and conduct extensive class-wise evaluations supported by Shapley values to validate the quality of small-scale training data and to ensure interpretability of model predictions. We show that in few-shot learning scenarios, a lightweight, domain-adapted pretrained language model, prompted with just 20 shots per section class, outperforms a traditional classification model by increasing accuracy from 48.6% to 79.1%.
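The prompting setup described above can be sketched as a cloze-style pattern-verbalizer scheme: each candidate section class is mapped to a filler word, and the class whose filler the masked language model prefers at the `[MASK]` position wins. The sketch below is illustrative, not the paper's implementation; the template, class labels, and the `score_fill` function (a toy keyword heuristic standing in for a real masked-LM query) are all assumptions made so the example runs without a model download.

```python
# Minimal sketch of cloze-style (pattern-verbalizer) prompting for
# section classification of German doctor's letters. A real system
# would score the [MASK] filler with a domain-adapted masked LM
# (e.g. via HuggingFace transformers); here a toy keyword count
# stands in so the example is self-contained.

PATTERN = "{text} Dieser Abschnitt ist [MASK]."  # hypothetical cloze template

# Verbalizer: map each section class to a single filler word whose
# (model-assigned) probability at [MASK] serves as the class score.
# These class names are illustrative, not taken from the paper.
VERBALIZER = {
    "Anamnese": "Anamnese",
    "Medikation": "Medikation",
    "Diagnose": "Diagnose",
}

def score_fill(prompt: str, filler: str) -> float:
    """Toy stand-in for a masked LM: score a candidate [MASK] filler.

    A real implementation would return the model's probability of
    `filler` at the [MASK] position; this stub just counts keyword
    occurrences in the prompt.
    """
    return float(prompt.lower().count(filler.lower()))

def classify_section(text: str) -> str:
    """Pick the class whose verbalizer word best fills the [MASK]."""
    prompt = PATTERN.format(text=text)
    return max(VERBALIZER, key=lambda c: score_fill(prompt, VERBALIZER[c]))

print(classify_section("Aktuelle Medikation: ASS 100 mg, Metoprolol."))
```

In the 20-shot setting the abstract describes, the few labeled examples would be used to further tune the model or calibrate the verbalizer, rather than to train a conventional classification head.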
Mar-20-2024