
Collaborating Authors

 Chaurasia, Akash


Almanac: Retrieval-Augmented Language Models for Clinical Medicine

arXiv.org Artificial Intelligence

In recent years, language model pre-training has emerged as a powerful training paradigm in natural language processing (NLP) [1-4]. For a large number of these language models, performance improvements have been empirically observed to scale with model and dataset size, alongside the well-documented emergence of zero-shot capabilities and sample efficiency on a range of downstream NLP tasks [5-7]. However, because their training objective is to predict the next token in a sentence, large language models (LLMs) can be prone to generating factually incorrect statements, a phenomenon commonly known as hallucination [8, 9]. More contentiously, many works have also demonstrated these models' tendency to reproduce social biases and to generate statements reinforcing gender, racial, and religious stereotypes [10, 11]. In an effort to reduce these unwanted behaviors, several works have explored different ways of steering LLM outputs to align more closely with user intent, including fine-tuning with human feedback [12, 13] and natural language prompt engineering [14, 15].
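Retrieval augmentation, named in the title above, addresses hallucination by grounding a model's answer in documents fetched at query time rather than relying only on knowledge stored in the model's weights. The sketch below is a minimal, generic illustration of that idea, not the Almanac system itself; the toy corpus, keyword-overlap retriever, and prompt template are illustrative assumptions.

```python
# Minimal sketch of retrieval-augmented prompting (illustrative only; not the
# Almanac implementation). Corpus, scoring, and prompt wording are hypothetical.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query and keep the top k."""
    query_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Prepend retrieved passages so the model is asked to ground its answer in them."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )


if __name__ == "__main__":
    corpus = [
        "Guideline: first-line therapy for condition X is drug A.",
        "Guideline: drug B is contraindicated in renal impairment.",
        "Unrelated note about clinic scheduling.",
    ]
    query = "What is the first-line therapy for condition X?"
    prompt = build_prompt(query, retrieve(query, corpus))
    print(prompt)  # This grounded prompt would then be sent to an LLM for completion.
```

In a real system the keyword-overlap retriever would typically be replaced by dense embedding search over a curated clinical knowledge base, but the overall flow of retrieve, assemble a grounded prompt, then generate is the same.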