FDA
Heterogeneous treatment effect estimation with subpopulation identification for personalized medicine in opioid use disorder
Lee, Seungyeon, Liu, Ruoqi, Song, Wenyu, Zhang, Ping
Deep learning models have demonstrated promising results in treatment effect estimation (TEE). However, most of them overlook the variations in treatment outcomes among subgroups with distinct characteristics. This limitation hinders their ability to provide accurate estimations and treatment recommendations for specific subgroups. In this study, we introduce a novel neural network-based framework, named SubgroupTE, which incorporates subgroup identification and treatment effect estimation. SubgroupTE identifies diverse subgroups and simultaneously estimates treatment effects for each subgroup, improving the treatment effect estimation by considering the heterogeneity of treatment responses. Comparative experiments on synthetic data show that SubgroupTE outperforms existing models in treatment effect estimation. Furthermore, experiments on a real-world dataset related to opioid use disorder (OUD) demonstrate the potential of our approach to enhance personalized treatment recommendations for OUD patients.
Fast Dual-Regularized Autoencoder for Sparse Biological Data
Algorithms for sparse matrix completion are used in recommender systems to predict user preferences for items such as news, movies, or songs [1]. The same methods can be successfully applied in other fields, for instance in systems biology to predict gene-disease associations or in computational systems pharmacology to predict adverse drug reactions [2] and to repurpose FDA-approved drugs [3]. Matrix completion is the task of filling in the missing entries of an observed sparse matrix. A low-rank solution to the matrix completion problem can be obtained via matrix factorization, a technique that approximates the input sparse matrix as a product of two lower-dimensional matrices of users' and items' latent vectors [4]. Despite efforts to develop more sophisticated techniques, such as methods based on artificial neural networks [5], matrix factorization remains the method of choice in recommender systems due to its efficiency and high accuracy [6].
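As an illustration of the matrix factorization idea described above, the following is a minimal sketch and not the dual-regularized autoencoder the paper proposes. It fits two small latent matrices to the observed entries only, using plain stochastic gradient descent with L2 regularization. The function names (`factorize`, `predict`), the hyperparameters, and the toy data are all assumptions made for this example.

```python
import random


def factorize(entries, n_rows, n_cols, rank=2, lr=0.05, reg=0.01,
              epochs=2000, seed=0):
    """Fit R ~ U V^T on observed (row, col, value) triples with SGD.

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    """
    rng = random.Random(seed)
    # Small random initial latent vectors for rows ("users") and columns ("items").
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, r in entries:
            err = r - sum(U[i][k] * V[j][k] for k in range(rank))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error plus an L2 penalty.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def predict(U, V, i, j):
    """Predicted value of entry (i, j), observed or missing."""
    return sum(u * v for u, v in zip(U[i], V[j]))


# Toy 3x2 matrix with entry (2, 0) unobserved.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 1, 6.0)]
U, V = factorize(observed, n_rows=3, n_cols=2)
print("estimate for missing entry (2, 0):", round(predict(U, V, 2, 0), 2))
```

Only the observed entries contribute to the loss, which is what lets the learned factors produce estimates for the missing cells.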
BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations
Pei, Qizhi, Zhang, Wei, Zhu, Jinhua, Wu, Kehan, Gao, Kaiyuan, Wu, Lijun, Xia, Yingce, Yan, Rui
Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery. However, current models exhibit several limitations, such as the generation of invalid molecular SMILES, underutilization of contextual information, and equal treatment of structured and unstructured knowledge. To address these issues, we propose BioT5, a comprehensive pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations. BioT5 utilizes SELFIES for 100% robust molecular representations and extracts knowledge from the surrounding context of bio-entities in unstructured biological literature. Furthermore, BioT5 distinguishes between structured and unstructured knowledge, leading to more effective utilization of information. After fine-tuning, BioT5 shows superior performance across a wide range of tasks, demonstrating its strong capability of capturing underlying relations and properties of bio-entities. Our code is available on GitHub at https://github.com/QizhiPei/BioT5.
AI is coming for big pharma
If there's one thing we can all agree upon, it's that the 21st century's captains of industry are trying to shoehorn AI into every corner of our world. But for all of the ways in which AI will be shoved into our faces and not prove very successful, it might actually have at least one useful purpose. Risk mitigation isn't a sexy notion but it's worth understanding how common it is for a new drug project to fail. To set the scene, consider that each drug project takes between three and five years to form a hypothesis strong enough to start tests in a laboratory. A 2022 study from Professor Duxin Sun found that 90 percent of clinical drug development fails, with each project costing more than $2 billion.
From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process
Regulatory compliance in the pharmaceutical industry entails navigating complex and voluminous guidelines, often requiring significant human resources. To address these challenges, our study introduces a chatbot model that utilizes generative AI and the Retrieval Augmented Generation (RAG) method. This chatbot is designed to search for guideline documents relevant to user inquiries and provide answers based on the retrieved guidelines. Recognizing the inherent need for high reliability in this domain, we propose the Question and Answer Retrieval Augmented Generation (QA-RAG) model. In comparative experiments, the QA-RAG model demonstrated a significant improvement in accuracy, outperforming all other baselines, including conventional RAG methods. This paper details QA-RAG's structure and performance evaluation, emphasizing its potential for the regulatory compliance domain in the pharmaceutical industry and beyond. We have made our work publicly available for further research and development.
SubgroupTE: Advancing Treatment Effect Estimation with Subgroup Identification
Lee, Seungyeon, Liu, Ruoqi, Song, Wenyu, Li, Lang, Zhang, Ping
Precise estimation of treatment effects is crucial for evaluating intervention effectiveness. While deep learning models have exhibited promising performance in learning counterfactual representations for treatment effect estimation (TEE), a major limitation of most of these models is that they treat the entire population as a homogeneous group, overlooking the diversity of treatment effects across potential subgroups. This limitation restricts the ability to precisely estimate treatment effects and provide subgroup-specific treatment recommendations. In this paper, we propose a novel treatment effect estimation model, named SubgroupTE, which incorporates subgroup identification in TEE. SubgroupTE identifies heterogeneous subgroups with different treatment responses and more precisely estimates treatment effects by considering subgroup-specific causal effects. In addition, SubgroupTE iteratively optimizes subgrouping and treatment effect estimation networks to enhance both estimation and subgroup identification. Comprehensive experiments on synthetic and semi-synthetic datasets demonstrate the outstanding performance of SubgroupTE compared with state-of-the-art models on treatment effect estimation. Additionally, a real-world study demonstrates the capabilities of SubgroupTE in enhancing personalized treatment recommendations for patients with opioid use disorder (OUD) by advancing treatment effect estimation with subgroup identification.
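To make concrete why treating the population as homogeneous can mislead, here is a deliberately naive sketch. It is not SubgroupTE: it assumes subgroup labels are already known and that treatment was assigned at random, so a simple difference in mean outcomes is a reasonable effect estimate. The function names and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pooled_effect(records):
    """Difference in mean outcome, treated minus control, ignoring subgroups."""
    treated = [y for _, t, y in records if t]
    control = [y for _, t, y in records if not t]
    return mean(treated) - mean(control)


def subgroup_effects(records):
    """Per-subgroup difference in mean outcome (treated minus control).

    records: list of (subgroup_label, treated_flag, outcome) tuples.
    Subgroups missing either arm are skipped.
    """
    arms = defaultdict(lambda: {True: [], False: []})
    for group, treated, y in records:
        arms[group][bool(treated)].append(y)
    return {
        group: mean(a[True]) - mean(a[False])
        for group, a in arms.items()
        if a[True] and a[False]
    }


# Toy data: subgroup "A" benefits from treatment, subgroup "B" does not.
records = [
    ("A", True, 3.0), ("A", True, 5.0), ("A", False, 1.0), ("A", False, 1.0),
    ("B", True, 1.0), ("B", False, 2.0),
]
print("pooled:", pooled_effect(records))
print("by subgroup:", subgroup_effects(records))
```

In this toy data the pooled estimate is positive even though one subgroup responds negatively, which is the kind of heterogeneity the abstract argues a single population-level estimate hides.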
Using Twitter Data to Understand Public Perceptions of Approved versus Off-label Use for COVID-19-related Medications
Hua, Yining, Jiang, Hang, Lin, Shixu, Yang, Jie, Plasek, Joseph M., Bates, David W., Zhou, Li
Understanding public discourse on emergency use of unproven therapeutics is crucial for monitoring safe use and combating misinformation. We developed a natural language processing-based pipeline to comprehend public perceptions of and stances on coronavirus disease 2019 (COVID-19)-related drugs on Twitter over time. This retrospective study included 609,189 US-based tweets from January 29, 2020, to November 30, 2021, about four drugs that garnered significant public attention during the COVID-19 pandemic: (1) Hydroxychloroquine and Ivermectin, therapies with anecdotal evidence; and (2) Molnupiravir and Remdesivir, FDA-approved treatments for eligible patients. Time-trend analysis was employed to understand popularity trends and related events. Content and demographic analyses were conducted to explore potential rationales behind people's stances on each drug. Time-trend analysis indicated that Hydroxychloroquine and Ivermectin were discussed more than Molnupiravir and Remdesivir, particularly during COVID-19 surges. Hydroxychloroquine and Ivermectin discussions were highly politicized, related to conspiracy theories, hearsay, and celebrity influences. The distribution of stances between the two major US political parties was significantly different (P < .001); Republicans were more likely to support Hydroxychloroquine (55%) and Ivermectin (30%) than Democrats. People with healthcare backgrounds tended to oppose Hydroxychloroquine (7%) more than the general population, while the general population was more likely to support Ivermectin (14%). Our study found that social media users have varying perceptions and stances on off-label versus FDA-authorized drug use at different stages of COVID-19. This indicates that health systems, regulatory agencies, and policymakers should design tailored strategies to monitor and reduce misinformation to promote safe drug use.
ADCNet: a unified framework for predicting the activity of antibody-drug conjugates
Chen, Liye, Li, Biaoshun, Chen, Yihao, Lin, Mujie, Zhang, Shipeng, Li, Chenxin, Pang, Yu, Wang, Ling
Antibody-drug conjugates (ADCs) have revolutionized the field of cancer treatment in the era of precision medicine due to their ability to precisely target cancer cells and release highly effective drugs. Nevertheless, the rational design of ADCs is very difficult because the relationship between their structures and activities is poorly understood. In the present study, we introduce a unified deep learning framework called ADCNet to help design potential ADCs. ADCNet integrates the protein representation learning language model ESM-2 and the small-molecule representation learning language model FG-BERT to predict activity by learning meaningful features from the antigen and antibody protein sequences of an ADC, the SMILES strings of its linker and payload, and its drug-antibody ratio (DAR) value. Based on a carefully designed and manually tailored ADC data set, extensive evaluation results reveal that ADCNet performs best on the test set compared to baseline machine learning models across all evaluation metrics. For example, it achieves an average prediction accuracy of 87.12%, a balanced accuracy of 0.8689, and an area under the receiver operating characteristic curve of 0.9293 on the test set. In addition, cross-validation, ablation experiments, and external independent testing results further demonstrate the stability, advancement, and robustness of the ADCNet architecture. For the convenience of the community, we develop the first online platform (https://ADCNet.idruglab.cn) for the prediction of ADC activity based on the optimal ADCNet model, and the source code is publicly available at https://github.com/idrugLab/ADCNet.
Morphological Profiling for Drug Discovery in the Era of Deep Learning
Tang, Qiaosi, Ratnayake, Ranjala, Seabra, Gustavo, Jiang, Zhe, Fang, Ruogu, Cui, Lina, Ding, Yousong, Kahveci, Tamer, Bian, Jiang, Li, Chenglong, Luesch, Hendrik, Li, Yanjun
Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capture of a wide range of morphological features of cells or organisms in response to perturbations at single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have led to substantial improvements in analyzing large-scale high-content images at high throughput. These efforts have facilitated the understanding of compound mechanism of action (MOA), drug repurposing, and the characterization of cell morphodynamics under perturbation, ultimately contributing to the development of novel therapeutics. In this review, we provide a comprehensive overview of the recent advances in the field of morphological profiling. We summarize the image profiling analysis workflow, survey a broad spectrum of analysis strategies encompassing feature engineering- and deep learning-based approaches, and introduce publicly available benchmark datasets. We place a particular emphasis on the application of deep learning in this pipeline, covering cell segmentation, image representation learning, and multimodal learning. Additionally, we illuminate the application of morphological profiling in phenotypic drug discovery and highlight potential challenges and opportunities in this field.
Difficulty in chirality recognition for Transformer architectures learning chemical structures from string
Yoshikai, Yasuhiro, Mizuno, Tadahaya, Nemoto, Shumpei, Kusuhara, Hiroyuki
Recent years have seen rapid development of descriptor generation based on representation learning of extremely diverse molecules, especially those that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this black box, we investigated the relationship between the learning progress of SMILES and chemical structure using a representative NLP model, the Transformer. We show that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low performance due to misunderstanding of enantiomers. These findings are expected to deepen the understanding of NLP models in chemistry.