AITopics | Butler, Thomas

Collaborating Authors

Butler, Thomas

Multi-scale Sinusoidal Embeddings Enable Learning on High Resolution Mass Spectrometry Data

Voronov, Gennady, Lightheart, Rose, Davison, Joe, Krettler, Christoph A., Healey, David, Butler, Thomas

arXiv.org Artificial Intelligence, May-5-2023

Small molecules in biological samples are studied to provide information about disease states, environmental toxins, natural product drug discovery, and many other applications. The primary window into the composition of small molecule mixtures is tandem mass spectrometry (MS2), which produces high-sensitivity data with part-per-million resolution. We adopt multi-scale sinusoidal embeddings of the mass data in MS2, designed to meet the challenge of learning from the full resolution of MS2 data. Using these embeddings, we provide a new state-of-the-art model for spectral library search, the standard task for initial evaluation of MS2 data. We vary the resolution of the input spectra directly by using different floating point representations of the MS2 data, and show that the resulting sinusoidal embeddings are able to learn from the high-resolution portion of the input MS2 data. We apply dimensionality reduction to the embeddings that result from different resolution input masses to show the essential role multi-scale sinusoidal embeddings play in learning from MS2 data. Metabolomics is the study of the small molecule (<1,000 Daltons) contents of complex biological samples. Tandem Mass Spectrometry (MS/MS), in conjunction with chromatography, is one of the most commonly used tools in metabolomics.
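The abstract above does not give the embedding formula, so the following is only a minimal sketch of one common way to build a multi-scale sinusoidal embedding of m/z values: sine and cosine channels at log-spaced wavelengths. The function name `sinusoidal_embedding`, the wavelength range, and the dimension are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sinusoidal_embedding(mz, dim=8, min_wavelength=1e-3, max_wavelength=1e3):
    """Embed m/z values as sin/cos pairs at log-spaced wavelengths.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even (one sin and one cos per wavelength)")
    # Column vector of masses, promoted to float64 for the trigonometry.
    mz = np.asarray(mz, dtype=np.float64).reshape(-1, 1)
    # Wavelengths spaced geometrically from fine (mDa) to coarse (kDa) scales.
    wavelengths = np.geomspace(min_wavelength, max_wavelength, dim // 2)
    angles = 2.0 * np.pi * mz / wavelengths
    # Shape: (number of masses, dim).
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Two masses 1 mDa apart stay distinguishable at full precision.
full = sinusoidal_embedding([100.0, 100.001])
# Stored as float16, both masses round to 100.0, so the embeddings coincide.
low = sinusoidal_embedding(np.array([100.0, 100.001], dtype=np.float16))
```

The short wavelengths are what let nearby masses map to different vectors. Lowering the floating point precision of the input, as the abstract describes, removes exactly the information those channels encode.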

artificial intelligence, machine learning, spectra, (17 more...)

2207.0298

Country: North America > United States (0.48)

Genre: Research Report (0.84)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Efficiently predicting high resolution mass spectra with graph neural networks

Murphy, Michael, Jegelka, Stefanie, Fraenkel, Ernest, Kind, Tobias, Healey, David, Butler, Thomas

arXiv.org Artificial Intelligence, Jan-26-2023

The identification of unknown small molecules in complex chemical mixtures is a primary challenge in many areas of chemical and biological science. The standard high-throughput approach to small molecule identification is tandem mass spectrometry (MS/MS), with diverse applications including metabolomics [1], drug discovery [2], clinical diagnostics [3], forensics [4], and environmental monitoring [5]. The key bottleneck in MS/MS is structural elucidation: given a mass spectrum, we must determine the 2D structure of the molecule it represents. This problem is far from solved, and adversely impacts all areas of science that use MS/MS. Typically only 2–4% of spectra are identified in untargeted metabolomics experiments [6], and a recent competition saw no more than 30% accuracy [7]. Because MS/MS is a lossy measurement, and existing training sets are small, direct prediction of structures from spectra is particularly challenging. Therefore the most common approach is spectral library search, which casts the problem as information retrieval [8]: an observed spectrum is queried against a library of spectra with known structures. This provides an informative prior, and has the advantage of easy interpretability as the entire space of solutions is known.
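Spectral library search as retrieval can be sketched with a generic baseline: bin each spectrum into a fixed-length vector and rank library entries by cosine similarity to the query. This is a textbook illustration, not the method of the paper above. The function names, bin width, mass range, and the toy peaks used with them are all hypothetical.

```python
import numpy as np

def bin_spectrum(peaks, bin_width=1.0, max_mz=1000.0):
    """Turn (m/z, intensity) pairs into a unit-length binned vector."""
    n_bins = int(np.ceil(max_mz / bin_width))
    vec = np.zeros(n_bins)
    for mz, intensity in peaks:
        idx = int(mz / bin_width)  # bin index for a non-negative m/z
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def library_search(query_peaks, library):
    """Rank library entries by cosine similarity to the query spectrum."""
    q = bin_spectrum(query_peaks)
    scores = {name: float(q @ bin_spectrum(p)) for name, p in library.items()}
    # Highest similarity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical two-entry library and a query resembling the first entry.
library = {
    "compound_A": [(100.1, 1.0), (150.2, 0.5)],
    "compound_B": [(200.3, 1.0), (250.4, 0.8)],
}
ranking = library_search([(100.1, 0.9), (150.2, 0.6)], library)
```

Because every candidate comes from the library, the full solution space is known in advance, which is the interpretability advantage the abstract mentions. The cost is that a molecule missing from the library can never be returned.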

artificial intelligence, machine learning, spectra, (16 more...)

2301.11419

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Hidden Biases in Unreliable News Detection Datasets

Zhou, Xiang, Elfardy, Heba, Christodoulopoulos, Christos, Butler, Thomas, Bansal, Mohit

arXiv.org Artificial Intelligence, Apr-20-2021

Automatic unreliable news detection is a research problem with great potential impact. Recently, several papers have shown promising results on large-scale news datasets with models that only use the article itself without resorting to any fact-checking mechanism or retrieving any supporting evidence. In this work, we take a closer look at these datasets. While they all provide valuable resources for future research, we observe a number of problems that may lead to results that do not generalize in more realistic settings. Specifically, we show that selection bias during data collection leads to undesired artifacts in the datasets. In addition, while most systems train and predict at the level of individual articles, overlapping article sources in the training and evaluation data can provide a strong confounding factor that models can exploit. In the presence of this confounding factor, the models can achieve good performance by directly memorizing the site-label mapping instead of modeling the real task of unreliable news detection. We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap. Using the observations and experimental results, we provide practical suggestions on how to create more reliable datasets for the unreliable news detection task. We suggest future dataset creation include a simple model as a difficulty/bias probe and future model development use a clean non-overlapping site and date split.
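The clean split recommended above can be sketched by assigning whole sources, rather than individual articles, to either the training or the evaluation side. The sketch below covers only the site dimension, not the date split the abstract also suggests. The function name `split_by_source` and the `"source"` field are assumptions for illustration, not part of the authors' code.

```python
import random

def split_by_source(articles, test_fraction=0.2, seed=0):
    """Split articles so that no source appears in both train and test.

    Each article is assumed to be a dict with a "source" key
    (a hypothetical schema chosen for this sketch).
    """
    sources = sorted({a["source"] for a in articles})
    rng = random.Random(seed)
    rng.shuffle(sources)
    # Hold out at least one whole source for evaluation.
    n_test = max(1, int(len(sources) * test_fraction))
    test_sources = set(sources[:n_test])
    train = [a for a in articles if a["source"] not in test_sources]
    test = [a for a in articles if a["source"] in test_sources]
    return train, test
```

With a split like this, a model cannot score well by memorizing the site-label mapping, because every evaluation article comes from a site it has never seen.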

artificial intelligence, dataset, social media, (18 more...)

2104.1013

Country:

North America > United States > New York (0.14)
North America > United States > New Mexico (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
