 drug interaction



RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation

Zhao, Jiahao, Xu, Luxin, Tan, Minghuan, Zhang, Lichao, Argha, Ahmadreza, Alinejad-Rokny, Hamid, Yang, Min

arXiv.org Artificial Intelligence

Numerous medical systems powered by Large Language Models (LLMs) have achieved remarkable progress in diverse healthcare tasks. However, research on their medication safety remains limited due to the lack of real-world datasets, constrained by privacy and accessibility issues. Moreover, evaluation of LLMs in realistic clinical consultation settings, particularly regarding medication safety, is still underexplored. To address these gaps, we propose a framework that simulates and evaluates clinical consultations to systematically assess the medication safety capabilities of LLMs. Within this framework, we generate inquiry-diagnosis dialogues with embedded medication risks and construct a dedicated medication safety database, RxRisk DB, containing 6,725 contraindications, 28,781 drug interactions, and 14,906 indication-drug pairs. A two-stage filtering strategy ensures clinical realism and professional quality, resulting in the benchmark RxSafeBench with 2,443 high-quality consultation scenarios. We evaluate leading open-source and proprietary LLMs using structured multiple-choice questions that test their ability to recommend safe medications under simulated patient contexts. Results show that current LLMs struggle to integrate contraindication and interaction knowledge, especially when risks are implied rather than explicit. Our findings highlight key challenges in ensuring medication safety in LLM-based systems and provide insights into improving reliability through better prompting and task-specific tuning. RxSafeBench offers the first comprehensive benchmark for evaluating medication safety in LLMs, advancing safer and more trustworthy AI-driven clinical decision support.
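As a rough illustration of the kind of check such a benchmark implies, the sketch below filters multiple-choice medication options against a contraindication table and an interaction table. It is not the RxSafeBench or RxRisk DB implementation: the table layout, the function names, and every drug and condition name (`drug_a`, `condition_x`, and so on) are hypothetical placeholders, not clinical data.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical toy tables with placeholder names (NOT real clinical data).
CONTRAINDICATIONS: Dict[str, Set[str]] = {
    "drug_a": {"condition_x"},  # drug_a must not be given with condition_x
    "drug_b": set(),
    "drug_c": set(),
}
# Each interaction is an unordered pair of drugs.
INTERACTIONS: Set[FrozenSet[str]] = {frozenset({"drug_b", "drug_d"})}


def is_safe(candidate: str, conditions: Set[str], current_drugs: Iterable[str]) -> bool:
    """Return True if the candidate hits no contraindication or interaction."""
    if CONTRAINDICATIONS.get(candidate, set()) & conditions:
        return False
    return all(
        frozenset({candidate, drug}) not in INTERACTIONS for drug in current_drugs
    )


def safe_options(options: List[str], conditions: Set[str], current_drugs: Set[str]) -> List[str]:
    """Keep only the answer options that pass both safety checks."""
    return [o for o in options if is_safe(o, conditions, current_drugs)]


def is_correct(predicted: str, options: List[str], conditions: Set[str], current_drugs: Set[str]) -> bool:
    """Score a model's choice as correct if it picked a safe option."""
    return predicted in safe_options(options, conditions, current_drugs)


if __name__ == "__main__":
    opts = ["drug_a", "drug_b", "drug_c"]
    print(safe_options(opts, {"condition_x"}, {"drug_d"}))
```

In this toy patient context, `drug_a` is excluded by the contraindication and `drug_b` by the interaction with a drug already being taken, so only `drug_c` remains.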


Large language models management of medications: three performance analyses

Henry, Kelli, Xu, Steven, Blotske, Kaitlin, Cargile, Moriah, Barreto, Erin F., Murray, Brian, Smith, Susan, Bauer, Seth R., Zhao, Xingmeng, Tilley, Adeleine, Gao, Yanjun, Liu, Tianming, Sohn, Sunghwan, Sikora, Andrea

arXiv.org Artificial Intelligence

Purpose: Large language models (LLMs) have proven performance for certain diagnostic tasks; however, few studies have evaluated their consistency in recommending appropriate medication regimens for a given diagnosis. Medication management is a complex task that requires synthesis of drug formulation and complete order instructions for safe use. Here, the performance of GPT-4o, an LLM available with ChatGPT, was tested for three medication management tasks. Methods: GPT-4o performance was tested using three medication tasks: identifying available formulations for a given generic drug name, identifying drug-drug interactions (DDI) for a given medication regimen, and preparing a medication order for a given generic drug name. For each experiment, the model's raw text response was captured exactly as returned and evaluated using clinician evaluation in addition to standard LLM metrics, including Term Frequency-Inverse Document Frequency (TF-IDF) vectors, normalized Levenshtein similarity, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE-1/ROUGE-L F1) between each response and its reference string. Results: For the first task of drug-formulation matching, GPT-4o had 49% accuracy for generic medications being matched to all available formulations, with an average of 1.23 omissions per medication and 1.14 hallucinations per medication. For the second task of drug-drug interaction identification, the accuracy was 54.7% for identifying the DDI pair. For the third task, GPT-4o generated order sentences containing no medication or abbreviation errors in 65.8% of cases. Conclusions: Model performance for basic medication tasks was consistently poor. This evaluation highlights the need for domain-specific training through clinician-annotated datasets and a comprehensive evaluation framework for benchmarking performance.
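Two of the string metrics named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' evaluation code: normalizing the edit distance by the longer string's length and tokenizing on lowercase whitespace are both assumptions, since the abstract does not state which conventions were used.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein_similarity(a: str, b: str) -> float:
    """1 - distance / max length (assumed normalization); 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 using lowercase whitespace tokens (assumed tokenization)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(normalized_levenshtein_similarity("kitten", "sitting"))
    print(rouge1_f1("take one tablet daily", "take one tablet twice daily"))
```

Character-level similarity rewards near-identical strings, while ROUGE-1 only counts shared tokens, so the two can disagree on responses that reorder or paraphrase the reference.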



Predicting Drug-Drug Interactions Using Heterogeneous Graph Neural Networks: HGNN-DDI

Liu, Hongbo, Li, Siyi, Yu, Zheng

arXiv.org Artificial Intelligence

Drug-drug interactions (DDIs) are a major concern in clinical practice, as they can lead to reduced therapeutic efficacy or severe adverse effects. Traditional computational approaches often struggle to capture the complex relationships among drugs, targets, and biological entities. In this work, we propose HGNN-DDI, a heterogeneous graph neural network model designed to predict potential DDIs by integrating multiple drug-related data sources. HGNN-DDI leverages graph representation learning to model heterogeneous biomedical networks, enabling effective information propagation across diverse node and edge types. Experimental results on benchmark DDI datasets demonstrate that HGNN-DDI outperforms state-of-the-art baselines in prediction accuracy and robustness, highlighting its potential to support safer drug development and precision medicine.


Retrieval Augmented Large Language Model System for Comprehensive Drug Contraindications

Bang, Byeonghun, Yoon, Jongsuk, Chang, Dong-Jin, Park, Seho, Lee, Yong Oh

arXiv.org Artificial Intelligence

The versatility of large language models (LLMs) has been explored across various sectors, but their application in healthcare poses challenges, particularly in the domain of pharmaceutical contraindications where accurate and reliable information is required. This study enhances the capability of LLMs to address contraindications effectively by implementing a Retrieval Augmented Generation (RAG) pipeline. Utilizing OpenAI's GPT-4o-mini as the base model, and the text-embedding-3-small model for embeddings, our approach integrates LangChain to orchestrate a hybrid retrieval system with re-ranking. This system leverages Drug Utilization Review (DUR) data from public databases, focusing on contraindications for specific age groups, pregnancy, and concomitant drug use. The dataset includes 300 question-answer pairs across three categories, with baseline model accuracy ranging from 0.49 to 0.57. After integrating the RAG pipeline, we observed a significant improvement in model accuracy, achieving rates of 0.94, 0.87, and 0.89 for contraindications related to age groups, pregnancy, and concomitant drug use, respectively. The results indicate that augmenting LLMs with a RAG framework can substantially reduce uncertainty in prescription and drug intake decisions by providing more precise and reliable drug contraindication information.
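The idea of hybrid retrieval can be sketched without any external service. The example below fuses a keyword-overlap ranking with a vector-similarity ranking using reciprocal rank fusion. It is an illustration under stated assumptions, not the pipeline described above: it does not use LangChain or OpenAI embeddings, character-trigram counts stand in for learned embeddings, reciprocal rank fusion is a swapped-in fusion method, the re-ranking stage is omitted, and the documents are hypothetical placeholders.

```python
import math
from collections import Counter
from typing import Dict, List


def keyword_score(query: str, doc: str) -> float:
    """Sparse signal: fraction of query words that appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def trigram_vector(text: str) -> Counter:
    """Stand-in for an embedding: character-trigram counts (an assumption)."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def ranking(scores: List[float]) -> List[int]:
    """Document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_retrieve(query: str, docs: List[str], top_k: int = 2, rrf_k: int = 60) -> List[str]:
    """Fuse sparse and dense rankings with reciprocal rank fusion."""
    sparse = ranking([keyword_score(query, d) for d in docs])
    qv = trigram_vector(query)
    dense = ranking([cosine(qv, trigram_vector(d)) for d in docs])
    fused: Dict[int, float] = {}
    for order in (sparse, dense):
        for position, idx in enumerate(order):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (rrf_k + position + 1)
    best = sorted(fused, key=lambda i: fused[i], reverse=True)[:top_k]
    return [docs[i] for i in best]


if __name__ == "__main__":
    corpus = [  # hypothetical placeholder passages, not real DUR records
        "drug_a is contraindicated in pregnancy",
        "drug_b dosing for adults",
        "storage instructions for drug_c",
    ]
    print(hybrid_retrieve("is drug_a contraindicated in pregnancy", corpus, top_k=1))
```

In a full RAG system the retrieved passages would then be re-ranked and placed in the model's prompt; that step is left out here.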


A Computational Approach to Epilepsy Treatment: An AI-optimized Global Natural Product Prescription System

Wang, Zhixuan

arXiv.org Artificial Intelligence

Epilepsy is a prevalent neurological disease with millions of patients worldwide. Many patients have turned to alternative medicine due to the limited efficacy and side effects of conventional antiepileptic drugs. In this study, we developed a computational approach to optimize herbal epilepsy treatment through AI-driven analysis of global natural products and statistically validated randomized controlled trials (RCTs). Our intelligent prescription system combines machine learning (ML) algorithms for herb-efficacy characterization, Bayesian optimization for personalized dosing, and meta-analysis of RCTs for evidence-based recommendations. The system analyzed 1,872 natural compounds from traditional Chinese medicine (TCM), Ayurveda, and ethnopharmacological databases, integrating their bioactive properties with clinical outcomes from 48 RCTs covering 48 epilepsy conditions (n=5,216). Cohen's d=0.89) with statistical significance confirmed by multiple testing (p<0.001). A randomized double-blind validation trial (n=120) demonstrated 28.5% greater seizure frequency reduction with AI-optimized herbal prescriptions compared to conventional protocols (95% CI: 18.7-37.3%, p=0.003).

Keywords: epilepsy, herbal medicine, computational pharmacology, AI-optimized prescription, natural products, machine learning, precision medicine, Bayesian optimization, clinical validation

Introduction: Despite being among the most difficult-to-treat neurological disorders (World Health Organization: WHO, 2024), it is estimated by the World Health Organization that there are close to 50 million people living with epilepsy (Figure 1A: Global Epilepsy Prevalence and Treatment Gaps).


HODDI: A Dataset of High-Order Drug-Drug Interactions for Computational Pharmacovigilance

Wang, Zhaoying, Shi, Yingdan, Liu, Xiang, Chen, Can, Wen, Jun, Wang, Ren

arXiv.org Artificial Intelligence

Drug-side effect research is vital for understanding adverse reactions arising in complex multi-drug therapies. However, the scarcity of higher-order datasets that capture the combinatorial effects of multiple drugs severely limits progress in this field. Existing resources such as TWOSIDES primarily focus on pairwise interactions. To fill this critical gap, we introduce HODDI, the first Higher-Order Drug-Drug Interaction Dataset, constructed from U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) records spanning the past decade, to advance computational pharmacovigilance. HODDI contains 109,744 records involving 2,506 unique drugs and 4,569 unique side effects, specifically curated to capture multi-drug interactions and their collective impact on adverse effects. Comprehensive statistical analyses demonstrate HODDI's extensive coverage and robust analytical metrics, making it a valuable resource for studying higher-order drug relationships. Evaluating HODDI with multiple models, we found that a simple Multi-Layer Perceptron (MLP) can outperform graph models, while hypergraph models demonstrate superior performance in capturing complex multi-drug interactions, further validating HODDI's effectiveness. Our findings highlight the inherent value of higher-order information in drug-side effect prediction and position HODDI as a benchmark dataset for advancing research in pharmacovigilance, drug safety, and personalized medicine. The dataset and code are available at https://github.com/TIML-Group/HODDI.
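The difference between higher-order and pairwise representations can be made concrete with a small sketch. The record layout below, a set of co-administered drugs paired with a set of observed side effects, is an assumption for illustration and not the actual HODDI schema; all drug and effect names are hypothetical placeholders.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Record = Tuple[FrozenSet[str], Set[str]]

# Hypothetical higher-order records: each is one hyperedge over its drugs.
RECORDS: List[Record] = [
    (frozenset({"drug_a", "drug_b", "drug_c"}), {"effect_1"}),
    (frozenset({"drug_a", "drug_b"}), {"effect_2"}),
]


def pairwise_projection(records: Iterable[Record]) -> Dict[FrozenSet[str], Set[str]]:
    """Flatten each multi-drug record into all of its drug pairs."""
    pairs: Dict[FrozenSet[str], Set[str]] = {}
    for drugs, effects in records:
        for pair in combinations(sorted(drugs), 2):
            pairs.setdefault(frozenset(pair), set()).update(effects)
    return pairs


def effects_for_exact_combination(records: Iterable[Record], drugs: Iterable[str]) -> Set[str]:
    """Look up effects reported for exactly this drug combination."""
    target = frozenset(drugs)
    found: Set[str] = set()
    for record_drugs, effects in records:
        if record_drugs == target:
            found |= effects
    return found


if __name__ == "__main__":
    projected = pairwise_projection(RECORDS)
    print(projected[frozenset({"drug_a", "drug_b"})])
    print(effects_for_exact_combination(RECORDS, {"drug_a", "drug_b"}))
```

After projection, the pair `drug_a`/`drug_b` is credited with `effect_1` even though that effect was only recorded when `drug_c` was also present; the hyperedge lookup keeps the two cases apart.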


Development of CODO: A Comprehensive Tool for COVID-19 Data Representation, Analysis, and Visualization

Dutta, Biswanath, Bain, Debanjali

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has become indispensable for managing and processing the vast amounts of data generated during the COVID-19 pandemic. Ontology, which formalizes knowledge within a domain using standardized vocabularies and relationships, plays a crucial role in AI by enabling automated reasoning, data integration, semantic interoperability, and extracting meaningful insights from extensive datasets. The diversity of COVID-19 datasets poses challenges in comprehending this information for both humans and machines. Existing COVID-19 ontologies are designed to address specific aspects of the pandemic but lack comprehensive coverage across all essential dimensions. To address this gap, CODO, an integrated ontological model, has been developed, encompassing critical facets of COVID-19 information such as aetiology, epidemiology, transmission, pathogenesis, diagnosis, prevention, genomics, therapeutic safety, and more. This paper reviews CODO since its inception in 2020, detailing its developments and highlighting CODO as a tool for the aggregation, representation, analysis, and visualization of diverse COVID-19 data. The major contribution of this paper is to provide a summary of the development of CODO and to outline the overall development and evaluation approach. By adhering to best practices and leveraging W3C standards, CODO ensures data integration and semantic interoperability, supporting effective navigation of COVID-19 complexities across various domains.


Drug Package Recommendation via Interaction-aware Graph Induction

Zheng, Zhi, Wang, Chao, Xu, Tong, Shen, Dazhong, Qin, Penggang, Huai, Baoxing, Liu, Tongzhu, Chen, Enhong

arXiv.org Artificial Intelligence

Recent years have witnessed the rapid accumulation of massive electronic medical records (EMRs), which strongly support intelligent medical services such as drug recommendation. However, prior work mainly follows traditional recommendation strategies like collaborative filtering, which usually treat individual drugs as mutually independent, while the latent interactions among drugs, e.g., synergistic or antagonistic effects, have been largely ignored. To that end, in this paper, we aim to develop a new paradigm for drug package recommendation that considers the interaction effects among drugs, where these interaction effects can be affected by patient conditions. Specifically, we first design a pre-training method based on neural collaborative filtering to get the initial embeddings of patients and drugs. Then, the drug interaction graph is initialized based on medical records and domain knowledge. Along this line, we propose a new Drug Package Recommendation (DPR) framework with two variants, DPR on Weighted Graph (DPR-WG) and DPR on Attributed Graph (DPR-AG), in which the interactions are described as signed weights or attribute vectors, respectively. In detail, a mask layer is utilized to capture the impact of patient condition, and graph neural networks (GNNs) are leveraged for the final graph induction task to embed the package. Extensive experiments on a real-world data set from a first-rate hospital demonstrate the effectiveness of our DPR framework compared with several competitive baseline methods, and further support the heuristic study for the drug package generation task with adequate performance.