AdaNovo: Towards Robust De Novo Peptide Sequencing in Proteomics against Data Biases
Jun Xia
Neural Information Processing Systems
Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling high-throughput analysis of protein composition in biological tissues. Several deep learning methods have been developed to predict the amino acid sequences (peptides) that generate the observed mass spectra, but biases in the training data hinder further progress in de novo peptide sequencing. First, prior methods struggle to identify amino acids with Post-Translational Modifications (PTMs), because these occur less frequently in training data than canonical amino acids, which in turn degrades overall peptide sequencing performance. Second, noise and missing peaks in mass spectra reduce the reliability of the training data (Peptide-Spectrum Matches, PSMs). To address these challenges, we propose AdaNovo, a novel, domain-knowledge-inspired framework that calculates the Conditional Mutual Information (CMI) between the mass spectra and amino acids or peptides, and uses the CMI to make training robust to these biases. Extensive experiments indicate that AdaNovo outperforms previous competitors on the widely used 9-species benchmark, while also improving PTM identification by 3.6% to 9.4%.
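The abstract does not spell out how the CMI is computed, so the following is only a minimal sketch based on the standard pointwise definition of conditional mutual information, not AdaNovo's actual implementation. It assumes two hypothetical models are available: one giving the probability of each amino acid conditioned on the spectrum and the preceding amino acids, and one conditioned on the preceding amino acids alone. The function names and the input probabilities are illustrative assumptions.

```python
import math


def amino_acid_cmi(p_with_spectrum, p_without_spectrum):
    """Pointwise CMI per position (hypothetical helper).

    p_with_spectrum[j]    : assumed p(y_j | spectrum, y_<j)
    p_without_spectrum[j] : assumed p(y_j | y_<j)
    Returns log p(y_j | spectrum, y_<j) - log p(y_j | y_<j) for each j.
    """
    if len(p_with_spectrum) != len(p_without_spectrum):
        raise ValueError("both inputs must cover the same positions")
    return [
        math.log(p_cond) - math.log(p_prior)
        for p_cond, p_prior in zip(p_with_spectrum, p_without_spectrum)
    ]


def peptide_level_score(cmi_values):
    """Sum of per-position values.

    By the chain rule this equals log p(y | spectrum) - log p(y)
    for the whole peptide under the two assumed models.
    """
    return sum(cmi_values)


# Illustrative, made-up probabilities for a two-residue peptide.
aa_cmi = amino_acid_cmi([0.5, 0.25], [0.25, 0.25])
peptide_score = peptide_level_score(aa_cmi)
```

Under this reading, a position where the spectrum raises the probability of the amino acid gets a positive value, and a position where the spectrum adds nothing gets a value near zero. How such values would enter the training objective is not stated in the abstract and is left open here.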
May-28-2025, 07:14:49 GMT
- Country:
- North America > United States > California (0.14)
- Genre:
- Research Report > Experimental Study (0.93)