Kyro, Gregory W.
A Model-Centric Review of Deep Learning for Protein Design
Kyro, Gregory W., Qiu, Tianyin, Batista, Victor S.
Deep learning has transformed protein design, enabling accurate structure prediction, sequence optimization, and de novo protein generation. Advances in single-chain protein structure prediction via AlphaFold2, RoseTTAFold, ESMFold, and others have achieved near-experimental accuracy, inspiring successive work extended to biomolecular complexes via AlphaFold Multimer, RoseTTAFold All-Atom, AlphaFold 3, Chai-1, Boltz-1 and others. Generative models such as ProtGPT2, ProteinMPNN, and RFdiffusion have enabled sequence and backbone design beyond natural evolution-based limitations. More recently, joint sequence-structure co-design models, including ESM3, have integrated both modalities into a unified framework, resulting in improved designability. Despite these advances, challenges still exist pertaining to modeling sequence-structure-function relationships and ensuring robust generalization beyond the regions of protein space spanned by the training data. Future advances will likely focus on joint sequence-structure-function co-design frameworks that are able to model the fitness landscape more effectively than models that treat these modalities independently. Current capabilities, coupled with the dizzying rate of progress, suggest that the field will soon enable rapid, rational design of proteins with tailored structures and functions that transcend the limitations imposed by natural evolution. In this review, we discuss the current capabilities of deep learning methods for protein design, focusing on some of the most revolutionary and capable models with respect to their functionality and the applications that they enable, leading up to the current challenges of the field and the optimal path forward.
Quantum Machine Learning in Drug Discovery: Applications in Academia and Pharmaceutical Industries
Smaldone, Anthony M., Shee, Yu, Kyro, Gregory W., Xu, Chuzhi, Vu, Nam P., Dutta, Rishab, Farag, Marwa H., Galda, Alexey, Kumar, Sandeep, Kyoseva, Elica, Batista, Victor S.
In this introduction, we discuss the general methodology of quantum computing based on unitary transformations (gates) of quantum registers, which underpin the potential advancements in computational power over classical systems. We introduce the unique properties of quantum bits, or qubits; quantum calculations implemented by algorithms that evolve qubit states through unitary transformations, followed by measurements that collapse the superposition states to produce specific outcomes; and, lastly, the challenges that noise poses for practical quantum computing, along with hybrid approaches that integrate quantum and classical computing to address current limitations. This introductory discussion sets the stage for a deeper exploration into quantum computing for machine learning applications in subsequent sections. Calculations with quantum computers generally require evolving the state of a quantum register by applying a sequence of pulses that implement unitary transformations according to a designed algorithm. A measurement of the resulting quantum state then collapses the coherent state, yielding a specific outcome of the calculation. To obtain reliable results, the process is typically repeated thousands of times, with averages taken over all of the measurements to account for quantum randomness and ensure statistical accuracy. This repetition is essential to achieve convergence, as each individual measurement provides only probabilistic information about the quantum state. Quantum registers are commonly based on qubits. Like classical bits, qubits can be observed in either of two possible states (0 or 1).
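The evolve-measure-repeat cycle described in this abstract can be illustrated with a small classical simulation. The sketch below is not taken from the paper: it assumes a single qubit, uses a Hadamard gate as the example unitary, and introduces a hypothetical helper `run_shots` that samples measurement outcomes with NumPy. It only shows how the estimated outcome frequencies settle as the number of repetitions (shots) grows.

```python
import numpy as np

# Example unitary (assumption for illustration): the single-qubit Hadamard gate.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def run_shots(unitary, initial_state, shots, rng):
    """Hypothetical helper: evolve a state, then sample `shots` measurements.

    Returns the observed frequency of each computational-basis outcome.
    """
    # Unitary evolution of the register.
    final_state = unitary @ initial_state
    # Born rule: outcome probabilities are squared amplitude magnitudes.
    probs = np.abs(final_state) ** 2
    probs = probs / probs.sum()  # guard against floating-point drift
    # Each shot collapses the state to a single outcome (0 or 1 here).
    outcomes = rng.choice(len(probs), size=shots, p=probs)
    return np.bincount(outcomes, minlength=len(probs)) / shots


rng = np.random.default_rng(seed=0)
ket0 = np.array([1, 0], dtype=complex)  # qubit prepared in state 0

# More shots give frequencies closer to the underlying probabilities (0.5 each).
for shots in (10, 1_000, 100_000):
    print(shots, run_shots(H, ket0, shots, rng))
```

A single shot returns only 0 or 1, so the equal superposition produced by the gate becomes visible only in the averaged frequencies, which is why the repetition matters.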
CardioGenAI: A Machine Learning-Based Framework for Re-Engineering Drugs for Reduced hERG Liability
Kyro, Gregory W., Martin, Matthew T., Watt, Eric D., Batista, Victor S.
The link between in vitro hERG ion channel inhibition and subsequent in vivo QT interval prolongation, a critical risk factor for the development of arrhythmias such as Torsade de Pointes, is so well established that in vitro hERG activity alone is often sufficient to end the development of an otherwise promising drug candidate. It is therefore of tremendous interest to develop advanced methods for identifying hERG-active compounds in the early stages of drug development, as well as for proposing redesigned compounds with reduced hERG liability and preserved on-target potency. In this work, we present CardioGenAI, a machine learning-based framework for re-engineering both developmental and commercially available drugs for reduced hERG activity while preserving their pharmacological activity. The framework incorporates novel state-of-the-art discriminative models for predicting hERG channel activity, as well as activity against the voltage-gated NaV1.5 and CaV1.2 channels due to their potential implications in modulating the arrhythmogenic potential induced by hERG channel blockade. We applied the complete framework to pimozide, an FDA-approved antipsychotic agent that demonstrates high affinity to the hERG channel, and generated 100 refined candidates. Remarkably, among the candidates is fluspirilene, a compound of the same class of drugs (diphenylmethanes) as pimozide that therefore has similar pharmacological activity, yet exhibits over 700-fold weaker binding to hERG. We envision that this method can effectively be applied to developmental compounds exhibiting hERG liabilities to provide a means of rescuing drug development programs that have stalled due to hERG-related safety concerns. The discriminative models can also serve independently as effective components of a virtual screening pipeline. We have made all of our software open-source.
ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation
Kyro, Gregory W., Morgunov, Anton, Brent, Rafael I., Batista, Victor S.
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data in the constructed sample space to successfully align a generative model with respect to a specified objective. We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. We believe that the inherent generality of this method ensures that it will remain applicable as the exciting field of in silico molecular generation evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.
Quantum Convolutional Neural Networks for Multi-Channel Supervised Learning
Smaldone, Anthony M., Kyro, Gregory W., Batista, Victor S.
As the rapidly evolving field of machine learning continues to produce incredibly useful tools and models, the potential for quantum computing to provide speedups for machine learning algorithms is becoming increasingly attractive. In particular, quantum circuits used in place of classical convolutional filters for image detection-based tasks are being investigated for their ability to exploit quantum advantage. However, these attempts, referred to as quantum convolutional neural networks (QCNNs), lack the ability to efficiently process data with multiple channels and therefore are limited to relatively simple inputs. In this work, we present a variety of hardware-adaptable quantum circuit ansatzes for use as convolutional kernels, and demonstrate that the quantum neural networks we report outperform existing QCNNs on classification tasks involving multi-channel data. We envision that the ability of these implementations to effectively learn inter-channel information will allow quantum machine learning methods to operate with more complex data. This work is available as open source at https://github.com/anthonysmaldone/QCNN-Multi-Channel-Supervised-Learning.
HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for Highly Accurate Protein-Ligand Binding Affinity Prediction
Kyro, Gregory W., Brent, Rafael I., Batista, Victor S.
Applying deep learning concepts from image detection and graph theory has greatly advanced protein-ligand binding affinity prediction, a challenge with enormous ramifications for both drug discovery and protein engineering. We build upon these advances by designing a novel deep learning architecture consisting of a 3-dimensional convolutional neural network utilizing channel-wise attention and two graph convolutional networks utilizing attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based Convolutional Neural Network) obtains state-of-the-art results on the PDBbind v.2016 core set, the most widely recognized benchmark in the field. We extensively assess the generalizability of our model using multiple train-test splits, each of which maximizes differences between either protein structures, protein sequences, or ligand extended-connectivity fingerprints of complexes in the training and test sets. Furthermore, we perform 10-fold cross-validation with a similarity cutoff between SMILES strings of ligands in the training and test sets, and also evaluate the performance of HAC-Net on lower-quality data. We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction. All of our software is available as open source at https://github.com/gregory-kyro/HAC-Net/, and the HACNet Python package is available through PyPI.