AITopics | spectrometry

Collaborating Authors

spectrometry

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation

Neural Information Processing SystemsJun-13-2026, 14:46:17 GMT

Mass spectrometry (MS) plays a critical role in molecular identification, significantly advancing scientific discovery. However, structure elucidation from MS data remains challenging due to the scarcity of annotated spectra. While large-scale pretraining has proven effective in addressing data scarcity in other domains, applying this paradigm to mass spectrometry is hindered by the complexity and heterogeneity of raw spectral signals. To address this, we propose MS-BART, a unified modeling framework that maps mass spectra and molecular structures into a shared token vocabulary, enabling cross-modal learning through large-scale pretraining on reliably computed fingerprint-molecule datasets. Multi-task pretraining objectives further enhance MS-BART's generalization by jointly optimizing denoising and translation task.

artificial intelligence, name change, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.39)

Add feedback

MassSpecGym: A benchmark for the discovery and identification of molecules Roman Bushuiev

Neural Information Processing SystemsFeb-18-2026, 02:01:36 GMT

Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym - the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Czechia (0.04)
(15 more...)

Genre: Research Report (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

c6c31413d5c53b7d1c343c1498734b0f-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsOct-10-2025, 16:14:35 GMT

dataset, molecule, spectra, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Czechia (0.04)
(16 more...)

Genre: Research Report (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.93)
(3 more...)

Add feedback

Prompt-Efficient Fine-Tuning for GPT-like Deep Models to Reduce Hallucination and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimisation Techniques

Sulimov, Daniil

arXiv.org Artificial IntelligenceNov-10-2024

Large Language Models (LLMs) are increasingly adopted for complex scientific text generation tasks, yet they often suffer from limitations in accuracy, consistency, and hallucination control. This thesis introduces a Parameter-Efficient Fine-Tuning (PEFT) approach tailored for GPT-like models, aiming to mitigate hallucinations and enhance reproducibility, particularly in the computational domain of mass spectrometry. We implemented Low-Rank Adaptation (LoRA) adapters to refine GPT-2, termed MS-GPT, using a specialized corpus of mass spectrometry literature. Through novel evaluation methods applied to LLMs, including BLEU, ROUGE, and Perplexity scores, the fine-tuned MS-GPT model demonstrated superior text coherence and reproducibility compared to the baseline GPT-2, confirmed through statistical analysis with the Wilcoxon rank-sum test. Further, we propose a reproducibility metric based on cosine similarity of model outputs under controlled prompts, showcasing MS-GPT's enhanced stability. This research highlights PEFT's potential to optimize LLMs for scientific contexts, reducing computational costs while improving model reliability.

mass spectrometer, mass spectrometry, spectrometry, (13 more...)

arXiv.org Artificial Intelligence

2411.06445

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre:

Research Report > Promising Solution (0.46)
Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MassSpecGym: A benchmark for the discovery and identification of molecules

Bushuiev, Roman, Bushuiev, Anton, de Jonge, Niek F., Young, Adamo, Kretschmer, Fleming, Samusevich, Raman, Heirman, Janne, Wang, Fei, Zhang, Luke, Dührkop, Kai, Ludwig, Marcus, Haupt, Nils A., Kalia, Apurva, Brungs, Corinna, Schmid, Robin, Greiner, Russell, Wang, Bo, Wishart, David S., Liu, Li-Ping, Rousu, Juho, Bittremieux, Wout, Rost, Hannes, Mak, Tytus D., Hassoun, Soha, Huber, Florian, van der Hooft, Justin J. J., Stravs, Michael A., Böcker, Sebastian, Sivic, Josef, Pluskal, Tomáš

arXiv.org Artificial IntelligenceOct-30-2024

The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym -- the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality labeled MS/MS spectra and defines three MS/MS annotation challenges: \textit{de novo} molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, therefore standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at \url{https://github.com/pluskal-lab/MassSpecGym}.

dataset, molecule, spectra, (16 more...)

arXiv.org Artificial Intelligence

2410.23326

Country:

North America > Canada > Alberta (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Machine learning meets mass spectrometry: a focused perspective

Boiko, Daniil A., Ananikov, Valentine P.

arXiv.org Artificial IntelligenceJun-27-2024

Mass spectrometry is a widely used method to study molecules and processes in medicine, life sciences, chemistry, catalysis, and industrial product quality control, among many other applications. One of the main features of some mass spectrometry techniques is the extensive level of characterization (especially when coupled with chromatography and ion mobility methods, or a part of tandem mass spectrometry experiment) and a large amount of generated data per measurement. Terabyte scales can be easily reached with mass spectrometry studies. Consequently, mass spectrometry has faced the challenge of a high level of data disappearance. Researchers often neglect and then altogether lose access to the rich information mass spectrometry experiments could provide. With the development of machine learning methods, the opportunity arises to unlock the potential of these data, enabling previously inaccessible discoveries. The present perspective highlights reevaluation of mass spectrometry data analysis in the new generation of methods and describes significant challenges in the field, particularly related to problems involving the use of electrospray ionization. We argue that further applications of machine learning raise new requirements for instrumentation (increasing throughput and information density, decreasing pricing, and making more automation-friendly software), and once met, the field may experience significant transformation.

application, mass spectrometry, spectrometry, (15 more...)

arXiv.org Artificial Intelligence

2407.00117

Country:

North America > United States > New York > Richmond County > New York City (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Europe > Netherlands (0.04)
(2 more...)

Genre:

Overview (0.67)
Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Comprehensive Lipidomic Automation Workflow using Large Language Models

Beveridge, Connor, Iyer, Sanjay, Randolph, Caitlin E., Muhoberac, Matthew, Manchanda, Palak, Clingenpeel, Amy C., Tichy, Shane, Chopra, Gaurav

arXiv.org Artificial IntelligenceMar-22-2024

Lipidomics generates large data that makes manual annotation and interpretation challenging. Lipid chemical and structural diversity with structural isomers further complicates annotation. Although, several commercial and open-source software for targeted lipid identification exists, it lacks automated method generation workflows and integration with statistical and bioinformatics tools. We have developed the Comprehensive Lipidomic Automated Workflow (CLAW) platform with integrated workflow for parsing, detailed statistical analysis and lipid annotations based on custom multiple reaction monitoring (MRM) precursor and product ion pair transitions. CLAW contains several modules including identification of carbon-carbon double bond position(s) in unsaturated lipids when combined with ozone electrospray ionization (OzESI)-MRM methodology. To demonstrate the utility of the automated workflow in CLAW, large-scale lipidomics data was collected with traditional and OzESI-MRM profiling on biological and non-biological samples. Specifically, a total of 1497 transitions organized into 10 MRM-based mass spectrometry methods were used to profile lipid droplets isolated from different brain regions of 18-24 month-old Alzheimer's disease mice and age-matched wild-type controls. Additionally, triacyclglycerols (TGs) profiles with carbon-carbon double bond specificity were generated from canola oil samples using OzESI-MRM profiling. We also developed an integrated language user interface with large language models using artificially intelligent (AI) agents that permits users to interact with the CLAW platform using a chatbot terminal to perform statistical and bioinformatic analyses. We envision CLAW pipeline to be used in high-throughput lipid structural identification tasks aiding users to generate automated lipidomics workflows ranging from data acquisition to AI agent-based bioinformatic analysis.

brain region, experiment, lipid, (17 more...)

arXiv.org Artificial Intelligence

2403.15076

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)
(4 more...)

Genre:

Workflow (1.00)
Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)

Add feedback

De-novo Identification of Small Molecules from Their GC-EI-MS Spectra

Hájek, Adam, Starý, Michal, Jozefov, Filip, Hecht, Helge, Price, Elliott, Křenek, Aleš

arXiv.org Artificial IntelligenceApr-4-2023

Identification of experimentally acquired mass spectra of unknown compounds presents a~particular challenge because reliable spectral databases do not cover the potential chemical space with sufficient density. Therefore machine learning based \emph{de-novo} methods, which derive molecular structure directly from its mass spectrum gained attention recently. We present a~novel method in this family, addressing a~specific usecase of GC-EI-MS spectra, which is particularly hard due to lack of additional information from the first stage of MS/MS experiments, on which the previously published methods rely. We analyze strengths and drawbacks or our approach and discuss future directions.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2304.01634

Country:

North America > United States (0.04)
Europe > Czechia (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Add feedback

Pushing the Limits of Protein Detection: The Role of Machine Learning - CBIRT

#artificialintelligenceMar-16-2023, 05:52:33 GMT

The detection of biomolecules at the nanoscale is of great significance in fundamental biology research, just as it is for biomedical investigations. The evolution of techniques for the detection and characterization of biomolecules has resulted in remarkable scale and resolution in terms of the size and mass of the molecules. Scientists from Germany have achieved the remarkable feat of pushing the sensitivity limits of interferometric scattering (iSCAT) microscopy, a label-free optical technique for the detection of proteins. The authors accomplish this using an unsupervised machine-learning algorithm and are able to detect proteins with a mass as low as 10 kDa, which is four times smaller than proteins being detected using earlier techniques. The detection and characterization of nanoscale matter are of utmost importance in the understanding of fundamental biological mechanisms involved in physiological processes as well as in diseases.

detection, protein, small protein, (14 more...)

#artificialintelligence

Country:

Europe > Germany (0.25)
North America > Canada > Ontario (0.05)
Europe > Sweden (0.05)
Asia > India > Karnataka > Bengaluru (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Scientists use machine learning to get an unprecedented view of small molecules

#artificialintelligenceDec-19-2022, 16:45:31 GMT

A new machine learning model will help scientists identify small molecules, with applications in medicine, drug discovery and environmental chemistry. Developed by researchers at Aalto University and the University of Luxembourg, the model was trained with data from dozens of laboratories to become one of the most accurate tools for identifying small molecules. Thousands of different small molecules, known as metabolites, transport energy and transmit cellular information throughout the human body. Because they are so small, metabolites are difficult to distinguish from each other in a blood sample analysis – but identifying these molecules is important to understand how exercise, nutrition, alcohol use and metabolic disorders affect wellbeing. Metabolites are normally identified by analysing their mass and retention time with a separation technique called liquid chromatography followed by mass spectrometry.

metabolite, molecule, small molecule, (11 more...)

#artificialintelligence

Industry:

Health & Medicine > Therapeutic Area (0.38)
Health & Medicine > Pharmaceuticals & Biotechnology (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback