Efficiently predicting high resolution mass spectra with graph neural networks

Murphy, Michael, Jegelka, Stefanie, Fraenkel, Ernest, Kind, Tobias, Healey, David, Butler, Thomas

arXiv.org Artificial Intelligence 

The identification of unknown small molecules in complex chemical mixtures is a primary challenge in many areas of chemical and biological science. The standard high-throughput approach to small molecule identification is tandem mass spectrometry (MS/MS), with diverse applications including metabolomics [1], drug discovery [2], clinical diagnostics [3], forensics [4], and environmental monitoring [5]. The key bottleneck in MS/MS is structural elucidation: given a mass spectrum, we must determine the 2D structure of the molecule it represents. This problem is far from solved, and it limits every area of science that uses MS/MS. Typically only 2 to 4% of spectra are identified in untargeted metabolomics experiments [6], and no entrant in a recent competition exceeded 30% accuracy [7]. Because MS/MS is a lossy measurement and existing training sets are small, direct prediction of structures from spectra is particularly challenging. The most common approach is therefore spectral library search, which casts the problem as information retrieval [8]: an observed spectrum is queried against a library of spectra with known structures. This provides an informative prior and is easy to interpret, because the entire space of candidate solutions is known.
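To make the retrieval framing concrete, the sketch below ranks library entries by their similarity to a query spectrum. It is a minimal illustration, not the authors' method: the binned cosine similarity score, the 0.01 m/z bin width, and all function names (bin_spectrum, cosine_similarity, library_search) are assumptions chosen for clarity.

```python
import math
from typing import Dict, List, Tuple

# A spectrum is a list of (m/z, intensity) peaks.
Spectrum = List[Tuple[float, float]]


def bin_spectrum(peaks: Spectrum, bin_width: float = 0.01) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (bin_width is an assumed value)."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(round(mz / bin_width))
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two sparse binned spectra (assumed scoring function)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def library_search(
    query: Spectrum, library: Dict[str, Spectrum], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank library entries (structure identifier -> reference spectrum) by similarity to the query."""
    q = bin_spectrum(query)
    scores = [(name, cosine_similarity(q, bin_spectrum(ref))) for name, ref in library.items()]
    # Highest similarity first: the top hit is the proposed structure.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

For example, with a hypothetical library {"candidate_A": [(91.054, 100.0), (119.049, 40.0)], "candidate_B": [(77.039, 100.0), (105.034, 60.0)]} and query [(91.054, 95.0), (119.049, 45.0)], library_search ranks candidate_A first, since it shares both peaks with the query while candidate_B shares none. Because every answer is an entry of the library, a search like this can only identify structures whose spectra have already been recorded.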
