Uncertainty Quantification for Clinical Outcome Predictions with (Large) Language Models
Zizhang Chen, Peizhao Li, Xiaomeng Dong, Pengyu Hong
Language models such as [1, 2, 3] have emerged as efficient tools for EHR tasks. Extensively trained on diverse sources of clinical data, such as physician notes and longitudinal medical codes, these models have demonstrated remarkable effectiveness in predicting clinical outcomes. Despite their capabilities, measuring and reducing their uncertainty on EHR tasks is crucial for patient safety, since clinicians can avoid interventions the model flags as uncertain and potentially hazardous. Quantifying uncertainty in clinical tasks also enhances the reliability of AI-driven medical decision-making systems [4]. To address this challenge, we leverage the transparency of white-box model parameters: we apply established uncertainty metrics and propose combining them with ensembling and multi-tasking to effectively quantify and mitigate uncertainty on EHR tasks.

More recently, large language models have begun to demonstrate their utility in clinical tasks, including EHR prediction [5], radiology report analysis [6], and medical reasoning [7]. However, modern Large Language Models are typically offered as API services with restricted access to internal parameters and prediction probabilities, which impedes the direct application of traditional uncertainty quantification methods. To overcome this limitation, we recast uncertainty quantification as a post-hoc procedure that analyzes the distribution of answers generated by repeatedly querying the model with our designed prompts for clinical prediction tasks. Motivated by the effectiveness of our methods in reducing model uncertainty for white-box LMs, we adapt and apply ensembling and multi-tasking to the black-box setting.
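To make the white-box recipe concrete, the sketch below shows one standard entropy-based way to split an ensemble's predictive uncertainty into total, aleatoric, and epistemic components. The function name, array shapes, and the mutual-information decomposition are illustrative assumptions, not necessarily the exact metrics used in the paper.

```python
import numpy as np

def ensemble_uncertainty(member_probs: np.ndarray) -> dict:
    """Decompose predictive uncertainty for an ensemble of white-box models.

    member_probs: shape (n_members, n_classes), each row one ensemble
    member's predicted class probabilities for a single patient.
    """
    eps = 1e-12
    mean_probs = member_probs.mean(axis=0)
    # Total uncertainty: entropy of the ensemble-averaged prediction.
    total = -np.sum(mean_probs * np.log(mean_probs + eps))
    # Aleatoric (data) uncertainty: mean per-member predictive entropy.
    aleatoric = -np.mean(np.sum(member_probs * np.log(member_probs + eps), axis=1))
    # Epistemic (model) uncertainty: the gap between the two, i.e. the
    # mutual information between the prediction and the member choice.
    return {"total": total, "aleatoric": aleatoric, "epistemic": total - aleatoric}

# Three members predicting 30-day readmission (illustrative numbers).
probs = np.array([[0.80, 0.20], [0.60, 0.40], [0.70, 0.30]])
print(ensemble_uncertainty(probs))
```

Under this decomposition, the epistemic term is the part that ensembling (and, analogously, multi-task training) can hope to reduce.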
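For the black-box setting, where only generated text is observable, a minimal post-hoc sketch is to sample the same prompt repeatedly, parse each completion into a discrete answer, and take the entropy of the empirical answer distribution. The helper below and its inputs are hypothetical; the paper's exact prompts and aggregation may differ.

```python
from collections import Counter
import math

def answer_distribution_entropy(answers: list[str]) -> float:
    """Entropy (in nats) of the empirical distribution over answers
    parsed from n independent completions of one prompt."""
    n = len(answers)
    counts = Counter(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# e.g. 20 independent completions of a clinical prediction prompt,
# each parsed to "yes"/"no" (illustrative numbers).
answers = ["yes"] * 14 + ["no"] * 6
print(answer_distribution_entropy(answers))  # ~0.611 nats
```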
arXiv.org Artificial Intelligence
Nov-5-2024