Learning to design protein-protein interactions with enhanced generalization

Bushuiev, Anton, Bushuiev, Roman, Kouba, Petr, Filkin, Anatolii, Gabrielova, Marketa, Gabriel, Michal, Sedlar, Jiri, Pluskal, Tomas, Damborsky, Jiri, Mazurenko, Stanislav, Sivic, Josef

arXiv.org Artificial Intelligence 

Discovering mutations that enhance protein-protein interactions (PPIs) is critical for advancing biomedical research and developing improved therapeutics. While machine learning approaches have substantially advanced the field, they often struggle to generalize beyond their training data in practical scenarios. The contributions of this work are three-fold. First, we construct PPIRef, the largest non-redundant dataset of 3D protein-protein interactions, enabling effective large-scale learning. Second, we fine-tune PPIformer to predict the effects of mutations on protein-protein interactions via a thermodynamically motivated adjustment of the pre-training loss function. Finally, we demonstrate the enhanced generalization of our new PPIformer approach by outperforming other state-of-the-art methods on new, non-leaking splits of standard labeled PPI mutational data and on independent case studies: optimizing a human antibody against SARS-CoV-2 and increasing the thrombolytic activity of staphylokinase.

The goal of this work is to develop a reliable method for designing protein-protein interactions (PPIs). We focus on predicting changes in the binding affinity of protein complexes upon mutation. This problem, also referred to as ΔΔG prediction, is the central challenge of protein binder design (Marchand et al., 2022). Discovering mutations that increase binding affinity unlocks application areas of tremendous importance, most notably in healthcare and biotechnology. Interactions between proteins play a crucial role in the mechanisms of various diseases, including cancer and neurodegenerative diseases (Lu et al., 2020; Ivanov et al., 2013). At the same time, they offer potential pathways for the action of protein-based therapeutics in addressing other medical conditions, such as stroke, a leading cause of disability and mortality worldwide (Feigin et al., 2022; Nikitin et al., 2022).
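
To make the predicted quantity concrete, the change in binding affinity upon mutation can be illustrated with the standard thermodynamic relation ΔG = RT ln(K_d), so that ΔΔG = ΔG_mut − ΔG_wt = RT ln(K_d,mut / K_d,wt). The sketch below is illustrative only and is not taken from the paper: the function names, the default temperature of 298.15 K, and the sign convention (negative ΔΔG means tighter binding) are assumptions chosen for the example.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy dG = RT ln(Kd) in kcal/mol (Kd in mol/L, 1 M standard state)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def ddg_upon_mutation(kd_wt: float, kd_mut: float, temperature_k: float = 298.15) -> float:
    """ddG = dG_mut - dG_wt in kcal/mol; negative values mean the mutant binds more tightly.

    The sign convention is an assumption of this sketch.
    """
    return (binding_free_energy(kd_mut, temperature_k)
            - binding_free_energy(kd_wt, temperature_k))


if __name__ == "__main__":
    # Hypothetical example: a mutation improves Kd tenfold, from 10 nM to 1 nM.
    print(f"ddG = {ddg_upon_mutation(1e-8, 1e-9):.2f} kcal/mol")
```

Under these assumptions, a tenfold improvement in K_d corresponds to a ΔΔG of roughly −1.4 kcal/mol at room temperature, which gives a sense of the scale of the effects a ΔΔG predictor must resolve.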