AITopics | Foster, Ian

Foster, Ian

Causal Discovery over High-Dimensional Structured Hypothesis Spaces with Causal Graph Partitioning

Shah, Ashka, DePavia, Adela, Hudson, Nathaniel, Foster, Ian, Stevens, Rick

arXiv.org Artificial Intelligence, Jun-10-2024

The aim in many sciences is to understand the mechanisms that underlie the observed distribution of variables, starting from a set of initial hypotheses. Causal discovery allows us to infer mechanisms as sets of cause and effect relationships in a generalized way -- without necessarily tailoring to a specific domain. Causal discovery algorithms search over a structured hypothesis space, defined by the set of directed acyclic graphs, to find the graph that best explains the data. For high-dimensional problems, however, this search becomes intractable, and scalable algorithms for causal discovery are needed to bridge the gap. In this paper, we define a novel causal graph partition that allows for divide-and-conquer causal discovery with theoretical guarantees. We leverage the idea of a superstructure -- a set of learned or existing candidate hypotheses -- to partition the search space. We prove under certain assumptions that learning with a causal graph partition always yields the Markov Equivalence Class of the true causal graph. We show our algorithm achieves comparable accuracy and a faster time to solution for biologically-tuned synthetic networks and networks of up to $10^4$ variables. This makes our method applicable to gene regulatory network inference and other domains with high-dimensional structured hypothesis spaces.
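
As a rough illustration of the divide-and-conquer idea in the abstract above, the sketch below splits a superstructure into overlapping node sets, runs a local learner on each, and merges the results. It is only a scaffold under stated assumptions, not the paper's algorithm: `expand_partition`, `divide_and_conquer`, the neighbor-based overlap rule, the plain set-union merge, and the oracle `local_learner` are all hypothetical stand-ins.

```python
def expand_partition(core, superstructure):
    """Add superstructure neighbors of the core nodes (assumed overlap rule)."""
    expanded = set(core)
    for u, v in superstructure:
        if u in core or v in core:
            expanded.update((u, v))
    return expanded


def divide_and_conquer(partitions, superstructure, local_learner):
    """Learn directed edges per partition, then merge by simple union."""
    edges = set()
    for core in partitions:
        nodes = expand_partition(core, superstructure)
        # Only candidate edges inside this subproblem are passed to the learner.
        candidates = {(u, v) for u, v in superstructure
                      if u in nodes and v in nodes}
        edges |= local_learner(nodes, candidates)
    return edges


# Hypothetical toy problem: a known DAG and an undirected superstructure
# that contains every true edge plus one spurious candidate ("a", "c").
true_dag = {("a", "b"), ("b", "c"), ("c", "d")}
superstructure = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]


def oracle_learner(nodes, candidates):
    """Stand-in for a real causal discovery routine: keeps the true edges
    that appear (in either direction) among the candidates."""
    return {(u, v) for u, v in true_dag
            if (u, v) in candidates or (v, u) in candidates}


learned = divide_and_conquer([{"a", "b"}, {"c", "d"}],
                             superstructure, oracle_learner)
```

In practice `oracle_learner` would be replaced by an actual structure-learning routine run on the data columns for `nodes`; the merge step in particular would need more care than a union.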

artificial intelligence, data mining, machine learning, (16 more...)

2406.06348

Country: North America > United States > Illinois (0.15)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.69)

Oil & Water? Diffusion of AI Within and Across Scientific Fields

Duede, Eamon, Dolan, William, Bauer, André, Foster, Ian, Lakhani, Karim

arXiv.org Artificial Intelligence, May-23-2024

This study empirically investigates claims of the increasing ubiquity of artificial intelligence (AI) within roughly 80 million research publications across 20 diverse scientific fields, by examining the change in scholarly engagement with AI from 1985 through 2022. We observe exponential growth, with AI-engaged publications increasing approximately thirteenfold (13x) across all fields, suggesting a dramatic shift from niche to mainstream. Moreover, we provide the first empirical examination of the distribution of AI-engaged publications across publication venues within individual fields, with results that reveal a broadening of AI engagement within disciplines. While this broadening engagement suggests a move toward greater disciplinary integration in every field, increased ubiquity is associated with a semantic tension between AI-engaged research and more traditional disciplinary research. Through an analysis of tens of millions of document embeddings, we observe a complex interplay between AI-engaged and non-AI-engaged research within and across fields, suggesting that increasing ubiquity is something of an oil-and-water phenomenon -- AI-engaged work is spreading out over fields, but not mixing well with non-AI-engaged work.

artificial intelligence, machine learning, natural language, (15 more...)

2405.15828

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government (0.46)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Twins in rotational spectroscopy: Does a rotational spectrum uniquely identify a molecule?

Schwarting, Marcus, Seifert, Nathan A., Davis, Michael J., Blaiszik, Ben, Foster, Ian, Prozument, Kirill

arXiv.org Artificial Intelligence, Apr-5-2024

Rotational spectroscopy is the most accurate method for determining structures of molecules in the gas phase. It is often assumed that a rotational spectrum is a unique "fingerprint" of a molecule. The availability of large molecular databases and the development of artificial intelligence methods for spectroscopy make the testing of this assumption timely. In this paper, we pose the determination of molecular structures from rotational spectra as an inverse problem. Within this framework, we adopt a funnel-based approach to search for molecular twins, that is, two or more molecules that have similar rotational spectra but distinctly different molecular structures. We demonstrate that there are twins within standard levels of computational accuracy by generating rotational constants for many molecules from several large molecular databases, indicating the inverse problem is ill-posed. However, some twins can be distinguished by increasing the accuracy of the theoretical methods or by performing additional experiments.
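
To make the notion of a molecular twin concrete, here is a minimal sketch that flags pairs of molecules whose rotational constants agree within a tolerance while their structure labels differ. It is a plain pairwise comparison, not the funnel-based search the abstract describes; the function name `find_twins`, the 1% relative tolerance, and the molecule records are all hypothetical.

```python
from itertools import combinations


def find_twins(molecules, rel_tol=0.01):
    """Return name pairs with different structures but similar (A, B, C).

    molecules maps a name to (structure_id, (A, B, C)), constants in MHz.
    rel_tol is an assumed relative tolerance standing in for the
    computational accuracy of the predicted constants.
    """
    twins = []
    for (n1, (s1, c1)), (n2, (s2, c2)) in combinations(molecules.items(), 2):
        if s1 == s2:
            continue  # same structure, so not a twin pair
        close = all(abs(a - b) <= rel_tol * max(abs(a), abs(b))
                    for a, b in zip(c1, c2))
        if close:
            twins.append((n1, n2))
    return twins


# Hypothetical records, invented for illustration only.
molecules = {
    "mol_a": ("S1", (5000.0, 2000.0, 1500.0)),
    "mol_b": ("S2", (5010.0, 1995.0, 1502.0)),
    "mol_c": ("S3", (9000.0, 3000.0, 2500.0)),
}
pairs = find_twins(molecules)
```

Tightening `rel_tol` mirrors the abstract's remark that some twins can be separated by more accurate theoretical methods: with a small enough tolerance, `mol_a` and `mol_b` no longer match.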

artificial intelligence, machine learning, molecule, (15 more...)

2404.04225

Country: North America > United States > Illinois > Cook County (0.14)

Genre: Research Report (0.82)

Industry:

Energy (0.93)
Health & Medicine (0.68)
Materials > Chemicals (0.46)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Combining Language and Graph Models for Semi-structured Information Extraction on the Web

Hong, Zhi, Chard, Kyle, Foster, Ian

arXiv.org Artificial Intelligence, Feb-21-2024

Relation extraction is an efficient way of mining the extraordinary wealth of human knowledge on the Web. Existing methods rely on domain-specific training data or produce noisy outputs. We focus here on extracting targeted relations from semi-structured web pages given only a short description of the relation. We present GraphScholarBERT, an open-domain information extraction method based on a joint graph and language model structure. GraphScholarBERT can generalize to previously unseen domains without additional data or training and produces only clean extraction results matched to the search keyword. Experiments show that GraphScholarBERT can improve extraction F1 scores by as much as 34.8% compared to previous work in a zero-shot domain and zero-shot website setting.

data mining, large language model, relation, (18 more...)

2402.14129

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Data Science > Data Mining > Text Mining (0.70)

Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision

Hudson, Nathaniel, Pauloski, J. Gregory, Baughman, Matt, Kamatar, Alok, Sakarvadia, Mansi, Ward, Logan, Chard, Ryan, Bauer, André, Levental, Maksim, Wang, Wenyi, Engler, Will, Skelly, Owen Price, Blaiszik, Ben, Stevens, Rick, Chard, Kyle, Foster, Ian

arXiv.org Artificial Intelligence, Feb-5-2024

Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters -- such as Huawei's PanGu-Σ. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.

large language model, machine learning, natural language, (18 more...)

2402.03480

Country: North America > United States (0.94)

Genre:

Research Report (0.84)
Overview (0.68)

Industry:

Information Technology > Security & Privacy (0.46)
Information Technology > Services (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Comprehensive Exploration of Synthetic Data Generation: A Survey

Bauer, André, Trapp, Simon, Stenger, Michael, Leppich, Robert, Kounev, Samuel, Leznik, Mark, Chard, Kyle, Foster, Ian

arXiv.org Artificial Intelligence, Feb-1-2024

Recent years have witnessed a surge in the popularity of Machine Learning (ML), applied across diverse domains. However, progress is impeded by the scarcity of training data due to expensive acquisition and privacy legislation. Synthetic data emerges as a solution, but the abundance of released models and limited overview literature pose challenges for decision-making. This work surveys 417 Synthetic Data Generation (SDG) models over the last decade, providing a comprehensive overview of model types, functionality, and improvements. Common attributes are identified, leading to a classification and trend analysis. The findings reveal increased model performance and complexity, with neural network-based approaches prevailing, except for privacy-preserving data generation. Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete. Implications from our performance evaluation highlight the scarcity of common metrics and datasets, making comparisons challenging. Additionally, the neglect of training and computational costs in literature necessitates attention in future research. This work serves as a guide for SDG model selection and identifies crucial areas for future exploration.

evolutionary algorithm, machine learning, reinforcement learning, (26 more...)

2401.02524

Country:

North America > United States > Illinois > Cook County (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.13)
Asia > Japan > Honshū (0.13)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology > Security & Privacy (1.00)
(4 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(8 more...)

WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data

Weber, Maurice, Siebenschuh, Carlo, Butler, Rory, Alexandrov, Anton, Thanner, Valdemar, Tsolakis, Georgios, Jabbar, Haris, Foster, Ian, Li, Bo, Stevens, Rick, Zhang, Ce

arXiv.org Artificial Intelligence, Dec-15-2023

We introduce WordScape, a novel pipeline for the creation of cross-disciplinary, multilingual corpora comprising millions of pages with annotations for document layout detection. Relating visual and textual items on document pages has gained further significance with the advent of multimodal models. Various approaches proved effective for visual question answering or layout segmentation. However, the interplay of text, tables, and visuals remains challenging for a variety of document understanding tasks. In particular, many models fail to generalize well to diverse domains and new languages due to insufficient availability of training data. WordScape addresses these limitations. Our automatic annotation pipeline parses the Open XML structure of Word documents obtained from the web, jointly providing layout-annotated document images and their textual representations. In turn, WordScape offers unique properties as it (1) leverages the ubiquity of the Word file format on the internet, (2) is readily accessible through the Common Crawl web corpus, (3) is adaptive to domain-specific documents, and (4) offers culturally and linguistically diverse document pages with natural semantic structure and high-quality text. Together with the pipeline, we will additionally release 9.5M URLs to Word documents, which can be processed using WordScape to create a dataset of over 40M pages. Finally, we investigate the quality of text and layout annotations extracted by WordScape, assess the impact on document understanding benchmarks, and demonstrate that manual labeling costs can be substantially reduced.

artificial intelligence, machine learning, natural language, (21 more...)

2312.10188

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology (0.93)
Energy (0.67)
Government > Regional Government (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Scaling transformer neural networks for skillful and reliable medium-range weather forecasting

Nguyen, Tung, Shah, Rohan, Bansal, Hritik, Arcomano, Troy, Madireddy, Sandeep, Maulik, Romit, Kotamarthi, Veerabhadra, Foster, Ian, Grover, Aditya

arXiv.org Artificial Intelligence, Dec-6-2023

Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecast, and pressure-weighted loss. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. On WeatherBench 2, Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days, while requiring orders-of-magnitude less training data and compute. Additionally, we demonstrate Stormer's favorable scaling properties, showing consistent improvements in forecast accuracy with increases in model size and training tokens. Code and checkpoints will be made publicly available.
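
The inference-time idea in the abstract above, producing several forecasts for one lead time and combining them, can be sketched as follows. This is a toy illustration under stated assumptions rather than Stormer itself: `toy_step` is a hypothetical stand-in for a trained model, the interval set (6, 12, 24 hours) is assumed, only rollouts that repeat a single interval are used, and the combination rule is a plain mean.

```python
import numpy as np


def toy_step(state, dt_hours):
    """Hypothetical one-step model: decay whose error depends on the step size."""
    return state * (1.0 - 0.01 * dt_hours)


def rollout(state, lead_hours, dt_hours, step=toy_step):
    """Apply the model repeatedly with a fixed interval until the lead time."""
    for _ in range(lead_hours // dt_hours):
        state = step(state, dt_hours)
    return state


def ensemble_forecast(state, lead_hours, intervals=(6, 12, 24), step=toy_step):
    """Average the rollouts of every interval that divides the lead time."""
    members = [rollout(state, lead_hours, dt, step)
               for dt in intervals if lead_hours % dt == 0]
    if not members:
        raise ValueError("no interval divides the requested lead time")
    return np.mean(members, axis=0), len(members)


initial_state = np.array([1.0])
forecast_24h, n_members = ensemble_forecast(initial_state, 24)
```

For a 24-hour lead time all three assumed intervals apply, so three rollouts are averaged; for an 18-hour lead time only the 6-hour interval divides evenly and the ensemble has a single member.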

artificial intelligence, machine learning, modeling & simulation, (14 more...)

2312.03876

Country: North America > United States (0.68)

Genre: Research Report > Promising Solution (0.46)

Industry: Energy (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Accelerating Electronic Stopping Power Predictions by 10 Million Times with a Combination of Time-Dependent Density Functional Theory and Machine Learning

Ward, Logan, Blaiszik, Ben, Lee, Cheng-Wei, Martin, Troy, Foster, Ian, Schleife, André

arXiv.org Artificial Intelligence, Nov-1-2023

Knowing the rate at which particle radiation releases energy in a material, the stopping power, is key to designing nuclear reactors, medical treatments, semiconductor and quantum materials, and many other technologies. While the nuclear contribution to stopping power, i.e., elastic scattering between atoms, is well understood in the literature, the route for gathering data on the electronic contribution has for decades remained costly and reliant on many simplifying assumptions, including that materials are isotropic. We establish a method that combines time-dependent density functional theory (TDDFT) and machine learning to reduce the time to assess new materials to mere hours on a supercomputer and provides valuable data on how atomic details influence electronic stopping. Our approach uses TDDFT to compute the electronic stopping contributions to stopping power from first principles in several directions and then machine learning to interpolate to other directions at rates 10 million times higher. We demonstrate the combined approach in a study of proton irradiation in aluminum and employ it to predict how the depth of maximum energy deposition, the "Bragg Peak," varies depending on incident angle -- a quantity otherwise inaccessible to modelers. Because it requires no experimental information, our method is applicable to most materials, and its speed makes it a prime candidate for enabling quantum-to-continuum models of radiation damage. The prospect of reusing valuable TDDFT data for training the model makes our approach appealing for applications in the age of materials data science.

accelerating electronic stopping power prediction, artificial intelligence, time-dependent density functional theory, (3 more...)

2311.00787

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism

Sakarvadia, Mansi, Khan, Arham, Ajith, Aswathy, Grzenda, Daniel, Hudson, Nathaniel, Bauer, André, Chard, Kyle, Foster, Ian

arXiv.org Artificial Intelligence, Oct-24-2023

Transformer-based Large Language Models (LLMs) are the state of the art for natural language tasks. Recent work has attempted to decode, by reverse engineering the role of linear layers, the internal mechanisms by which LLMs arrive at their final predictions for text completion tasks. Yet little is known about the specific role of attention heads in producing the final token prediction. We propose Attention Lens, a tool that enables researchers to translate the outputs of attention heads into vocabulary tokens via learned attention-head-specific transformations called lenses. Preliminary findings from our trained lenses indicate that attention heads play highly specialized roles in language models. The code for Attention Lens is available at github.com/msakarvadia/AttentionLens.
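
A lens, as described in the abstract above, maps one attention head's output to vocabulary tokens. The sketch below shows that mapping in its simplest form, assuming the lens is a single linear transformation followed by a top-k readout. The function `apply_lens`, the toy vocabulary, and the identity weights are hypothetical and do not reflect the AttentionLens repository's API or trained parameters.

```python
import numpy as np


def apply_lens(head_output, lens_weight, lens_bias, vocab, top_k=3):
    """Map one attention head's output vector to its top-k vocabulary tokens."""
    logits = head_output @ lens_weight + lens_bias  # shape: (vocab_size,)
    top_ids = np.argsort(logits)[::-1][:top_k]      # highest logit first
    return [vocab[i] for i in top_ids]


# Toy, untrained example: every value below is invented for illustration.
vocab = ["the", "cat", "sat", "on", "mat"]
lens_weight = np.eye(5)   # identity stands in for a learned, head-specific lens
lens_bias = np.zeros(5)
head_output = np.array([0.1, 0.9, 0.3, 0.0, 0.5])

top_tokens = apply_lens(head_output, lens_weight, lens_bias, vocab)
```

With a trained lens, `lens_weight` would have shape (head output dimension, vocabulary size) and would be learned separately for each head, which is what lets the readout differ from head to head.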

attention head information retrieval mechanism, large language model, natural language, (4 more...)

2310.16270

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
