Ohno-Machado, Lucila
Enhancing Patient-Centric Communication: Leveraging LLMs to Simulate Patient Perspectives
Ma, Xinyao, Zhu, Rui, Wang, Zihao, Xiong, Jingwei, Chen, Qingyu, Tang, Haixu, Camp, L. Jean, Ohno-Machado, Lucila
Large Language Models (LLMs) have demonstrated impressive capabilities in role-playing scenarios, particularly in simulating domain-specific experts using tailored prompts. This ability enables LLMs to adopt the persona of individuals with specific backgrounds, offering a cost-effective and efficient alternative to traditional, resource-intensive user studies. By mimicking human behavior, LLMs can anticipate responses based on concrete demographic or professional profiles. In this paper, we evaluate the effectiveness of LLMs in simulating individuals with diverse backgrounds and analyze the consistency of these simulated behaviors compared to real-world outcomes. In particular, we explore the potential of LLMs to interpret and respond to discharge summaries provided to patients leaving the Intensive Care Unit (ICU). We evaluate and compare with human responses the comprehensibility of discharge summaries among individuals with varying educational backgrounds, using this analysis to assess the strengths and limitations of LLM-driven simulations. Notably, when LLMs are primed with educational background information, they deliver accurate and actionable medical guidance 88% of the time. However, when other information is provided, performance significantly drops, falling below random chance levels. This preliminary study shows the potential benefits and pitfalls of automatically generating patient-specific health information from diverse populations. While LLMs show promise in simulating health personas, our results highlight critical gaps that must be addressed before they can be reliably used in clinical settings. Our findings suggest that a straightforward query-response model could outperform a more tailored approach in delivering health information. This is a crucial first step in understanding how LLMs can be optimized for personalized health communication while maintaining accuracy.
Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model
Gilson, Aidan, Ai, Xuguang, Xie, Qianqian, Srinivasan, Sahana, Pushpanathan, Krithi, Singer, Maxwell B., Huang, Jimin, Kim, Hyunjae, Long, Erping, Wan, Peixing, Del Priore, Luciano V., Ohno-Machado, Lucila, Xu, Hua, Liu, Dianbo, Adelman, Ron A., Tham, Yih-Chung, Chen, Qingyu
Large Language Models (LLMs) are poised to revolutionize healthcare. Ophthalmology-specific LLMs remain scarce and underexplored. We introduced an open-source, specialized LLM for ophthalmology, termed Language Enhanced Model for Eye (LEME). LEME was initially pre-trained on the Llama2 70B framework and further fine-tuned with a corpus of ~127,000 non-copyrighted training instances curated from ophthalmology-specific case reports, abstracts, and open-source study materials. We benchmarked LEME against eight other LLMs, namely, GPT-3.5, GPT-4, three Llama2 models (7B, 13B, 70B), PMC-LLAMA 13B, Meditron 70B, and EYE-Llama (another ophthalmology-specific LLM). Evaluations included four internal validation tasks: abstract completion, fill-in-the-blank, multiple-choice questions (MCQ), and short-answer QA. External validation tasks encompassed long-form QA, MCQ, patient EHR summarization, and clinical QA. Evaluation metrics included Rouge-L scores, accuracy, and expert evaluation of correctness, completeness, and readability. In internal validations, LEME consistently outperformed its counterparts, achieving Rouge-L scores of 0.20 in abstract completion (all p<0.05), 0.82 in fill-in-the-blank (all p<0.0001), and 0.22 in short-answer QA (all p<0.0001, except versus GPT-4). In external validations, LEME excelled in long-form QA with a Rouge-L of 0.19 (all p<0.0001), ranked second in MCQ accuracy (0.68; all p<0.0001), and scored highest in EHR summarization and clinical QA (ranging from 4.24 to 4.83 out of 5 for correctness, completeness, and readability). LEME's emphasis on robust fine-tuning and the use of non-copyrighted data represents a breakthrough in open-source ophthalmology-specific LLMs, offering the potential to revolutionize execution of clinical tasks while democratizing research collaboration.
Me LLaMA: Foundation Large Language Models for Medical Applications
Xie, Qianqian, Chen, Qingyu, Chen, Aokun, Peng, Cheng, Hu, Yan, Lin, Fongci, Peng, Xueqing, Huang, Jimin, Zhang, Jeffrey, Keloth, Vipina, Zhou, Xinyu, He, Huan, Ohno-Machado, Lucila, Wu, Yonghui, Xu, Hua, Bian, Jiang
Recent advancements in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet their application in clinical settings often reveals limitations due to a lack of specialized training on medical-specific data. In response to this challenge, this study introduces Me-LLaMA, a novel medical LLM family that includes foundation models - Me-LLaMA 13/70B, along with their chat-enhanced versions - Me-LLaMA 13/70B-chat, developed through continual pre-training and instruction tuning of LLaMA2 using large medical datasets. Our methodology leverages a comprehensive domain-specific data suite, including a large-scale, continual pre-training dataset with 129B tokens, an instruction tuning dataset with 214k samples, and a new medical evaluation benchmark (MIBE) across six critical medical tasks with 12 datasets. Our extensive evaluation using the MIBE shows that Me-LLaMA models achieve overall better performance than existing open-source medical LLMs in zero-shot, few-shot and supervised learning abilities. With task-specific instruction tuning, Me-LLaMA models outperform ChatGPT on 7 out of 8 datasets and GPT-4 on 5 out of 8 datasets. In addition, we investigated the catastrophic forgetting problem, and our results show that Me-LLaMA models outperform other open-source medical LLMs in mitigating this issue. Me-LLaMA is one of the largest open-source medical foundation LLMs that use both biomedical and clinical data. It exhibits superior performance across both general and medical tasks compared to other open-source medical LLMs, rendering it an attractive choice for medical AI applications. We release our models, datasets, and evaluation scripts at: https://github.com/BIDS-Xu-Lab/Me-LLaMA.
Predicting accurate probabilities with a ranking loss
Menon, Aditya, Jiang, Xiaoqian, Vembu, Shankar, Elkan, Charles, Ohno-Machado, Lucila
In many real-world applications of machine learning classifiers, it is essential to predict the probability of an example belonging to a particular class. This paper proposes a simple technique for predicting probabilities based on optimizing a ranking loss, followed by isotonic regression. This semi-parametric technique offers both good ranking and regression performance, and models a richer set of probability distributions than statistical workhorses such as logistic regression. We provide experimental results that show the effectiveness of this technique on real-world applications of probability prediction.