Song, Jiangning
Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence
Sun, Yingying, A, Jun, Liu, Zhiwei, Sun, Rui, Qian, Liujia, Payne, Samuel H., Bittremieux, Wout, Ralser, Markus, Li, Chen, Chen, Yi, Dong, Zhen, Perez-Riverol, Yasset, Khan, Asif, Sander, Chris, Aebersold, Ruedi, Vizcaíno, Juan Antonio, Krieger, Jonathan R, Yao, Jianhua, Wen, Han, Zhang, Linfeng, Zhu, Yunping, Xuan, Yue, Sun, Benjamin Boyang, Qiao, Liang, Hermjakob, Henning, Tang, Haixu, Gao, Huanhuan, Deng, Yamin, Zhong, Qing, Chang, Cheng, Bandeira, Nuno, Li, Ming, E, Weinan, Sun, Siqi, Yang, Yuedong, Omenn, Gilbert S., Zhang, Yue, Xu, Ping, Fu, Yan, Liu, Xiaowen, Overall, Christopher M., Wang, Yu, Deutsch, Eric W., Chen, Luonan, Cox, Jürgen, Demichev, Vadim, He, Fuchu, Huang, Jiaxing, Jin, Huilin, Liu, Chao, Li, Nan, Luan, Zhongzhi, Song, Jiangning, Yu, Kaicheng, Wan, Wanggen, Wang, Tai, Zhang, Kang, Zhang, Le, Bell, Peter A., Mann, Matthias, Zhang, Bing, Guo, Tiannan
Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.
Rethinking Cancer Gene Identification through Graph Anomaly Analysis
Zang, Yilong, Ren, Lingfei, Li, Yue, Wang, Zhikang, Selby, David Antony, Wang, Zheng, Vollmer, Sebastian Josef, Yin, Hongzhi, Song, Jiangning, Wu, Junhang
Graph neural networks (GNNs) have shown promise in recent studies for integrating protein-protein interaction (PPI) networks to identify cancer genes. However, because the biological information in PPI networks is insufficiently modeled, a more faithful depiction of the complex protein interaction patterns of cancer genes within the graph structure remains largely unexplored. This study takes a pioneering step toward linking the biological anomalies in protein interactions caused by cancer genes to statistical graph anomalies. We find a unique graph anomaly exhibited by cancer genes, namely weight heterogeneity, which manifests as significantly higher variance in the edge weights of cancer gene nodes within the graph. Additionally, from the spectral perspective, we demonstrate that weight heterogeneity can lead to the "flattening out" of spectral energy, with a concentration towards the extremes of the spectrum. Building on these insights, we propose the HIerarchical-Perspective Graph Neural Network (HIPGNN), which not only captures variations in spectral energy distribution from the spectral perspective, but also perceives detailed protein interaction context from the spatial perspective. Extensive experiments are conducted on two reprocessed datasets, STRINGdb and CPDB, and the experimental results demonstrate the superiority of HIPGNN.
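The weight heterogeneity described in this abstract can be illustrated with a small sketch. This is not HIPGNN or the authors' code; it only computes, for each node, the variance of the weights on its incident edges. The function name `edge_weight_variance`, the gene labels, and the toy confidence scores are hypothetical, and the use of population variance is an assumption, since the abstract does not specify the estimator.

```python
from statistics import pvariance


def edge_weight_variance(edges):
    """Map each node to the population variance of its incident edge weights.

    `edges` is an iterable of (u, v, weight) tuples for an undirected graph.
    """
    incident = {}
    for u, v, w in edges:
        # An undirected edge contributes its weight to both endpoints.
        incident.setdefault(u, []).append(w)
        incident.setdefault(v, []).append(w)
    # A node with a single incident edge gets variance 0.
    return {node: pvariance(ws) for node, ws in incident.items()}


# Toy PPI-like graph: hypothetical genes and interaction confidence scores.
toy_edges = [
    ("G1", "G2", 0.9),
    ("G1", "G3", 0.1),
    ("G1", "G4", 0.5),
    ("G2", "G3", 0.5),
    ("G3", "G4", 0.5),
]

variances = edge_weight_variance(toy_edges)
# The node whose incident edge weights vary the most.
most_heterogeneous = max(variances, key=variances.get)
```

In this toy graph, `G1` has the most dispersed edge weights, so it would be the node flagged as most heterogeneous under this simple statistic.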
CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction
Han, Rong, Liu, Xiaohong, Pan, Tong, Xu, Jing, Wang, Xiaoyu, Lan, Wuyang, Li, Zhenyu, Wang, Zixuan, Song, Jiangning, Wang, Guangyu, Chen, Ting
Accurately measuring protein-RNA binding affinity is crucial for understanding many biological processes and for drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features and are unable to capture the binding mechanisms comprehensively. Recently emerging pre-trained language models, trained on massive unsupervised protein and RNA sequence data, have shown strong representation ability for various in-domain downstream tasks, including binding site prediction. However, applying language models from different domains collaboratively to complex-level tasks remains unexplored. In this paper, we propose CoPRA to bridge pre-trained language models from different biological domains via Complex structure for Protein-RNA binding Affinity prediction. We demonstrate for the first time that language models from different biological modalities can collaborate to improve binding affinity prediction. We propose a Co-Former to combine the cross-modal sequence and structure information, and a bi-scope pre-training strategy to improve Co-Former's interaction understanding. In addition, we build the largest protein-RNA binding affinity dataset, PRA310, for performance evaluation. We also test our model on a public dataset for mutation effect prediction. CoPRA reaches state-of-the-art performance on all the datasets. We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.