epitope
TCR-EML: Explainable Model Layers for TCR-pMHC Prediction
Li, Jiarui, Yin, Zixiang, Ding, Zhengming, Landry, Samuel J., Mettu, Ramgopal R.
T cell receptor (TCR) recognition of peptide-MHC (pMHC) complexes is a central component of adaptive immunity, with implications for vaccine design, cancer immunotherapy, and autoimmune disease. While recent advances in machine learning have improved prediction of TCR-pMHC binding, the most effective approaches are black-box transformer models that cannot provide a rationale for predictions. Post-hoc explanation methods can provide insight with respect to the input but do not explicitly model biochemical mechanisms (e.g. "Explain-by-design" models (i.e., with architectural components that can be examined directly after training) have been explored in other domains, but have not been used for TCR-pMHC binding. We propose explainable model layers (TCR-EML) that can be incorporated into protein-language model backbones for TCR-pMHC modeling. Our approach uses prototype layers for amino acid residue contacts drawn from known TCR-pMHC binding mechanisms, enabling high-quality explanations for predicted TCR-pMHC binding. Experiments of our proposed method on large-scale datasets demonstrate competitive predictive accuracy and generalization, and evaluation on the TCR-XAI benchmark demonstrates improved explainability compared with existing approaches. For the adaptive immune system, T cells are essential for detecting and responding to antigens from pathogens such as viruses, bacteria, and cancer cells (Joglekar & Li, 2021), as well as in autoimmune contexts.
epiGPTope: A machine learning-based epitope generator and classifier
Manrique, Natalia Flechas, Martínez, Alberto, López-Martínez, Elena, Andrea, Luc, Orus, Román, Manteca, Aitor, Cortajarena, Aitziber L., Espinosa-Portalés, Llorenç
Epitopes are short antigenic peptide sequences which are recognized by antibodies or immune cell receptors. These are central to the development of immunotherapies, vaccines, and diagnostics. However, the rational design of synthetic epitope libraries is challenging due to the large combinatorial sequence space, $20^n$ combinations for linear epitopes of n amino acids, making screening and testing unfeasible, even with high throughput experimental techniques. In this study, we present a large language model, epiGPTope, pre-trained on protein data and specifically fine-tuned on linear epitopes, which for the first time can directly generate novel epitope-like sequences, which are found to possess statistical properties analogous to the ones of known epitopes. This generative approach can be used to prepare libraries of epitope candidate sequences. We further train statistical classifiers to predict whether an epitope sequence is of bacterial or viral origin, thus narrowing the candidate library and increasing the likelihood of identifying specific epitopes. We propose that such combination of generative and predictive models can be of assistance in epitope discovery. The approach uses only primary amino acid sequences of linear epitopes, bypassing the need for a geometric framework or hand-crafted features of the sequences. By developing a method to create biologically feasible sequences, we anticipate faster and more cost-effective generation and screening of synthetic epitopes, with relevant applications in the development of new biotechnologies.
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Europe > France (0.04)
- North America > United States (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
ImmunoAI: Accelerated Antibody Discovery Using Gradient-Boosted Machine Learning with Thermodynamic-Hydrodynamic Descriptors and 3D Geometric Interface Topology
Shivakumar, Shawnak, Sandora, Matthew
Human metapneumovirus (hMPV) poses serious risks to pediatric, elderly, and immunocompromised populations. Traditional antibody discovery pipelines require 10-12 months, limiting their applicability for rapid outbreak response. This project introduces ImmunoAI, a machine learning framework that accelerates antibody discovery by predicting high-affinity candidates using gradient-boosted models trained on thermodynamic, hydrodynamic, and 3D topological interface descriptors. A dataset of 213 antibody-antigen complexes was curated to extract geometric and physicochemical features, and a LightGBM regressor was trained to predict binding affinity with high precision. The model reduced the antibody candidate search space by 89%, and fine-tuning on 117 SARS-CoV-2 binding pairs further reduced Root Mean Square Error (RMSE) from 1.70 to 0.92. In the absence of an experimental structure for the hMPV A2.2 variant, AlphaFold2 was used to predict its 3D structure. The fine-tuned model identified two optimal antibodies with predicted picomolar affinities targeting key mutation sites (G42V and E96K), making them excellent candidates for experimental testing. In summary, ImmunoAI shortens design cycles and enables faster, structure-informed responses to viral outbreaks.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > India (0.04)
Deep Neural Network-Based Prediction of B-Cell Epitopes for SARS-CoV and SARS-CoV-2: Enhancing Vaccine Design through Machine Learning
Shi, Xinyu, Tao, Yixin, Lin, Shih-Chi
The accurate prediction of B-cell epitopes is critical for guiding vaccine development against infectious diseases, including SARS and COVID-19. This study explores the use of a deep neural network (DNN) model to predict B-cell epitopes for SARS-CoVandSARS-CoV-2,leveraging a dataset that incorporates essential protein and peptide features. Traditional sequence-based methods often struggle with large, complex datasets, but deep learning offers promising improvements in predictive accuracy. Our model employs regularization techniques, such as dropout and early stopping, to enhance generalization, while also analyzing key features, including isoelectric point and aromaticity, that influence epitope recognition. Results indicate an overall accuracy of 82% in predicting COVID-19 negative and positive cases, with room for improvement in detecting positive samples. This research demonstrates the applicability of deep learning in epitope mapping, suggesting that such approaches can enhance the speed and precision of vaccine design for emerging pathogens. Future work could incorporate structural data and diverse viral strains to further refine prediction capabilities.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California > Los Angeles County > Northridge (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
Estimating the Causal Effects of T Cell Receptors
Weinstein, Eli N., Wood, Elizabeth B., Blei, David M.
A central question in human immunology is how a patient's repertoire of T cells impacts disease. Here, we introduce a method to infer the causal effects of T cell receptor (TCR) sequences on patient outcomes using observational TCR repertoire sequencing data and clinical outcomes data. Our approach corrects for unobserved confounders, such as a patient's environment and life history, by using the patient's immature, pre-selection TCR repertoire. The pre-selection repertoire can be estimated from nonproductive TCR data, which is widely available. It is generated by a randomized mutational process, V(D)J recombination, which provides a natural experiment. We show formally how to use the pre-selection repertoire to draw causal inferences, and develop a scalable neural-network estimator for our identification formula. Our method produces an estimate of the effect of interventions that add a specific TCR sequence to patient repertoires. As a demonstration, we use it to analyze the effects of TCRs on COVID-19 severity, uncovering potentially therapeutic TCRs that are (1) observed in patients, (2) bind SARS-CoV-2 antigens in vitro and (3) have strong positive effects on clinical outcomes.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction
Liu, Chunan, Denzler, Lilian, Chen, Yihong, Martin, Andrew, Paige, Brooks
Epitope identification is vital for antibody design yet challenging due to the inherent variability in antibodies. While many deep learning methods have been developed for general protein binding site prediction tasks, whether they work for epitope prediction remains an understudied research question. The challenge is also heightened by the lack of a consistent evaluation pipeline with sufficient dataset size and epitope diversity. We introduce a filtered antibody-antigen complex structure dataset, AsEP (Antibody-specific Epitope Prediction). AsEP is the largest of its kind and provides clustered epitope groups, allowing the community to develop and test novel epitope prediction methods. AsEP comes with an easy-to-use interface in Python and pre-built graph representations of each antibody-antigen complex while also supporting customizable embedding methods. Based on this new dataset, we benchmarked various representative general protein-binding site prediction methods and find that their performances are not satisfactory as expected for epitope prediction. We thus propose a new method, WALLE, that leverages both protein language models and graph neural networks. WALLE demonstrate about 5X performance gain over existing methods. Our empirical findings evidence that epitope prediction benefits from combining sequential embeddings provided by language models and geometrical information from graph representations, providing a guideline for future method design. In addition, we reformulate the task as bipartite link prediction, allowing easy model performance attribution and interpretability. We open-source our data and code at https://github.com/biochunan/AsEP-dataset.
MATE-Pred: Multimodal Attention-based TCR-Epitope interaction Predictor
Goffinet, Etienne, Mall, Raghvendra, Singh, Ankita, Kaushik, Rahul, Castiglione, Filippo
An accurate binding affinity prediction between T-cell receptors and epitopes contributes decisively to develop successful immunotherapy strategies. Some state-of-the-art computational methods implement deep learning techniques by integrating evolutionary features to convert the amino acid residues of cell receptors and epitope sequences into numerical values, while some other methods employ pre-trained language models to summarize the embedding vectors at the amino acid residue level to obtain sequence-wise representations. Here, we propose a highly reliable novel method, MATE-Pred, that performs multi-modal attention-based prediction of T-cell receptors and epitopes binding affinity. The MATE-Pred is compared and benchmarked with other deep learning models that leverage multi-modal representations of T-cell receptors and epitopes. In the proposed method, the textual representation of proteins is embedded with a pre-trained bi-directional encoder model and combined with two additional modalities: a) a comprehensive set of selected physicochemical properties; b) predicted contact maps that estimate the 3D distances between amino acid residues in the sequences. The MATE-Pred demonstrates the potential of multi-modal model in achieving state-of-the-art performance (+8.4\% MCC, +5.5\% AUC compared to baselines) and efficiently capturing contextual, physicochemical, and structural information from amino acid residues. The performance of MATE-Pred projects its potential application in various drug discovery regimes.
- North America > United States > Hawaii (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
AI driven B-cell Immunotherapy Design
da Silva, Bruna Moreira, Ascher, David B., Geard, Nicholas, Pires, Douglas E. V.
Bruna Moreira da Silva is a PhD student at The University of Melbourne. Her research interests are in bioinformatics, immunoinformatics and machine learning to advance Global Health. David B. Ascher is the Director of Biotechnology at the University of Queensland and head of Computational Biology and Clinical Informatics at the Baker Institute and Systems and Computational Biology at Bio21 Institute. He is interested in developing and applying computational tools to assist leveraging clinical and omics data for drug discovery and personalised medicine. Nicholas Geard is an Associate Professor at the School of Computing and Information Systems at the University of Melbourne and Director of the Melbourne Data Analytics Platform. He is a computer scientist specialising in computational simulation applied to a range of problems in health and epidemiology. Douglas E. V. Pires is an Associate Professor in Digital Health at the School of Computing and Information Systems at the University of Melbourne and group leader at the Bio21 Institute. He is a computer scientist and bioinformatician specialising in machine learning and AI and the development of the next generation of tools to analyse omics data, and guide drug discovery and personalised medicine. ABSTRACT Antibodies, a prominent class of approved biologics, play a crucial role in detecting foreign antigens. The effectiveness of antigen neutralisation and elimination hinges upon the strength, sensitivity, and specificity of the paratope-epitope interaction, which demands resource-intensive experimental techniques for characterisation. In recent years, artificial intelligence and machine learning methods have made significant strides, revolutionising the prediction of protein structures and their complexes. The past decade has also witnessed the evolution of computational approaches aiming to support immunotherapy design. This review focuses on the progress of machine learning-based tools and their frameworks in the domain of B-cell immunotherapy design, encompassing linear and conformational epitope prediction, paratope prediction, and antibody design. We mapped the most commonly used data sources, evaluation metrics, and method availability and thoroughly assessed their significance and limitations, discussing the main challenges ahead. INTRODUCTION Therapeutic antibodies are a rapidly growing class of biopharmaceuticals with potentially exceptional antigen specificity and affinity. Their ability to detect and eliminate a wide array of foreign threats makes them suitable for a range of potential therapeutic and diagnostic applications. Antibody and antigen engineering have been greatly benefited by the evolution of research in computational biology, leading to innovative approaches in screening antibody targets, optimising their biochemical and physical properties, predicting and optimising binding affinity and understanding escape mutations [1].
- Oceania > Australia > Victoria > Melbourne (0.04)
- Oceania > Australia > Queensland > Brisbane (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (2 more...)
- Overview (1.00)
- Research Report > Experimental Study (0.46)