Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release
Babuji, Yadu, Blaiszik, Ben, Brettin, Tom, Chard, Kyle, Chard, Ryan, Clyde, Austin, Foster, Ian, Hong, Zhi, Jha, Shantenu, Li, Zhuozhao, Liu, Xuefeng, Ramanathan, Arvind, Ren, Yi, Saint, Nicholaus, Schwarting, Marcus, Stevens, Rick, van Dam, Hubertus, Wagner, Rick
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort, we are aggregating numerous small molecules from a variety of sources, using high-performance computing (HPC) to compute diverse properties of those molecules, using the computed properties to train ML/AI models, and then using the resulting models for screening. In this first data release, we make available 23 datasets collected from community sources, representing over 4.2 B molecules enriched with pre-computed: 1) molecular fingerprints to aid similarity searches, 2) 2D images of molecules to enable exploration and application of image-based deep learning methods, and 3) 2D and 3D molecular descriptors to speed development of machine learning models. This data release encompasses structural information on the 4.2 B molecules and 60 TB of pre-computed data. Future releases will expand the data to include more detailed molecular simulations, computed models, and other products.
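As an illustration of the similarity searches these pre-computed fingerprints are meant to support, the minimal sketch below computes Morgan (ECFP-like) fingerprints and Tanimoto similarities with RDKit. The library choice, fingerprint parameters, and example molecules are illustrative assumptions, not details of the data release itself.

    # Minimal sketch: fingerprint-based similarity search (assumes RDKit is installed).
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, as an example query
    library = [Chem.MolFromSmiles(s) for s in ("c1ccccc1O", "CC(=O)Nc1ccc(O)cc1")]

    # Morgan fingerprints, radius 2, 2048 bits (parameters are illustrative)
    fp_query = AllChem.GetMorganFingerprintAsBitVect(query, 2, nBits=2048)
    for mol in library:
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
        print(Chem.MolToSmiles(mol), DataStructs.TanimotoSimilarity(fp_query, fp))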
IRNet: A General Purpose Deep Residual Regression Framework for Materials Discovery
Jha, Dipendra, Ward, Logan, Yang, Zijiang, Wolverton, Christopher, Foster, Ian, Liao, Wei-keng, Choudhary, Alok, Agrawal, Ankit
Materials discovery is crucial for making scientific advances in many domains. Collections of data from experiments and first-principles computations have spurred interest in applying machine learning methods to create predictive models capable of mapping from composition and crystal structures to materials properties. Generally, these are regression problems with the input being a 1D vector composed of numerical attributes representing the material composition and/or crystal structure. While neural networks consisting of fully connected layers have been applied to such problems, their performance often suffers from the vanishing gradient problem when network depth is increased. In this paper, we study and propose design principles for building deep regression networks composed of fully connected layers with numerical vectors as input. We introduce a novel deep regression network with individual residual learning, IRNet, that places shortcut connections after each layer so that each layer learns the residual mapping between its output and input. We use the problem of learning properties of inorganic materials from numerical attributes derived from material composition and/or crystal structure to compare IRNet's performance against that of other machine learning techniques. Using multiple datasets from the Open Quantum Materials Database (OQMD) and Materials Project for training and evaluation, we show that IRNet provides significantly better prediction performance than the state-of-the-art machine learning approaches currently used by domain scientists. We also show that IRNet's use of individual residual learning leads to better convergence during the training phase than when shortcut connections are placed between multi-layer stacks, while maintaining the same number of parameters.
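The abstract's key idea, a shortcut connection after every fully connected layer rather than between multi-layer stacks, can be sketched as follows. This is not the authors' implementation; the layer widths, depth, and the batch normalization and ReLU choices are illustrative assumptions, written here in PyTorch.

    import torch
    import torch.nn as nn

    class IndividualResidualBlock(nn.Module):
        """One fully connected layer with its own shortcut: out = x + F(x)."""
        def __init__(self, dim):
            super().__init__()
            self.fc = nn.Linear(dim, dim)
            self.bn = nn.BatchNorm1d(dim)
            self.act = nn.ReLU()

        def forward(self, x):
            # Each layer learns only the residual between its input and output.
            return x + self.act(self.bn(self.fc(x)))

    class TinyIRNet(nn.Module):
        def __init__(self, in_dim, depth=8, width=256):
            super().__init__()
            self.stem = nn.Linear(in_dim, width)
            self.blocks = nn.Sequential(*[IndividualResidualBlock(width) for _ in range(depth)])
            self.head = nn.Linear(width, 1)  # scalar regression target, e.g. formation enthalpy

        def forward(self, x):
            return self.head(self.blocks(self.stem(x)))

    model = TinyIRNet(in_dim=145)    # e.g. composition-derived numerical attributes
    y = model(torch.randn(4, 145))   # batch of 4 materials -> 4 predictions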
Machine Learning Prediction of Accurate Atomization Energies of Organic Molecules from Low-Fidelity Quantum Chemical Calculations
Ward, Logan, Blaiszik, Ben, Foster, Ian, Assary, Rajeev S., Narayanan, Badri, Curtiss, Larry
Recent studies illustrate how machine learning (ML) can be used to bypass a core challenge of molecular modeling: the tradeoff between accuracy and computational cost. Here, we assess multiple ML approaches for predicting the atomization energy of organic molecules. Our resulting models learn the difference between low-fidelity (B3LYP) and high-accuracy (G4MP2) atomization energies, and predict the G4MP2 atomization energy to within 0.005 eV (mean absolute error) for molecules with fewer than 9 heavy atoms and 0.012 eV for a small set of molecules with between 10 and 14 heavy atoms. Our two best models, which have different accuracy/speed tradeoffs, enable the efficient prediction of G4MP2-level energies for large molecules and are available through a simple web interface.
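The Δ-learning setup described above can be sketched in a few lines: a regressor is fit to the difference between G4MP2 and B3LYP energies, and the high-accuracy value is recovered as the B3LYP energy plus the predicted correction. The features, data, and kernel ridge model below are random stand-ins and assumptions, not the paper's representations or models.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 32))       # molecular features (stand-in)
    e_b3lyp = rng.normal(size=100)       # low-fidelity energies (stand-in)
    e_g4mp2 = e_b3lyp + 0.01 * X[:, 0]   # high-accuracy energies (stand-in)

    model = KernelRidge(kernel="rbf", alpha=1e-3)
    model.fit(X, e_g4mp2 - e_b3lyp)      # learn only the B3LYP -> G4MP2 correction

    e_pred = e_b3lyp + model.predict(X)  # predicted G4MP2-level energies
    print(np.abs(e_pred - e_g4mp2).mean())  # mean absolute error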
DLHub: Model and Data Serving for Science
Chard, Ryan, Li, Zhuozhao, Chard, Kyle, Ward, Logan, Babuji, Yadu, Woodard, Anna, Tuecke, Steve, Blaiszik, Ben, Franklin, Michael J., Foster, Ian
While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the "learning systems" needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications. First, its self-service model repository allows users to share, publish, verify, reproduce, and reuse models, and addresses concerns related to model reproducibility by packaging and distributing models and all constituent components. Second, it implements scalable and low-latency serving capabilities that can leverage parallel and distributed computing resources to democratize access to published models through a simple web interface. Unlike other model serving frameworks, DLHub can store and serve any Python 3-compatible model or processing function, plus multiple-function pipelines. We show that relative to other model serving systems, including TensorFlow Serving, SageMaker, and Clipper, DLHub provides greater capabilities, comparable performance without memoization and batching, and significantly better performance when the latter two techniques can be employed. We also describe early uses of DLHub for scientific applications.
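A minimal sketch of invoking a published servable through DLHub's serving interface, assuming the dlhub_sdk client package; the servable name and input payload are hypothetical placeholders, and instantiating the client normally triggers a Globus Auth login.

    from dlhub_sdk.client import DLHubClient

    client = DLHubClient()  # prompts for Globus Auth on first use

    # Servable names take the form "owner_username/model_name" (hypothetical here)
    result = client.run("owner_username/example_model", inputs=[[0.1, 0.2, 0.3]])
    print(result)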