AITopics

2312.17174

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Nuclear Medicine (0.46)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine LearningDec-28-2023

Function-Space Regularization in Neural Networks: A Probabilistic Perspective

Rudner, Tim G. J., Kapoor, Sanyam, Qiu, Shikai, Wilson, Andrew Gordon

Parameter-space regularization in neural network optimization is a fundamental tool for improving generalization. However, standard parameter-space regularization methods make it challenging to encode explicit preferences about desired predictive functions into neural network training. In this work, we approach regularization in neural networks from a probabilistic perspective and show that by viewing parameter-space regularization as specifying an empirical prior distribution over the model parameters, we can derive a probabilistically well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training. This method -- which we refer to as function-space empirical Bayes (FSEB) -- includes both parameter- and function-space regularization, is mathematically simple, easy to implement, and incurs only minimal computational overhead compared to standard regularization techniques. We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection, highly-calibrated predictive uncertainty estimates, successful task adaption from pre-trained models, and improved generalization under covariate shift.

artificial intelligence, bayesian inference, machine learning, (14 more...)

2312.17162

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceDec-18-2023

Perspectives on the State and Future of Deep Learning - 2023

Goldblum, Micah, Anandkumar, Anima, Baraniuk, Richard, Goldstein, Tom, Cho, Kyunghyun, Lipton, Zachary C, Mitchell, Melanie, Nakkiran, Preetum, Welling, Max, Wilson, Andrew Gordon

The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time. The plan is to host this survey periodically until the AI singularity paperclip-frenzy-driven doomsday, keeping an updated list of topical questions and interviewing new community members for each edition.

large language model, machine learning, natural language, (16 more...)

2312.09323

Country: North America > United States (0.68)

Genre:

Research Report > Experimental Study (0.68)
Research Report > Strength High (0.46)

Industry:

Health & Medicine (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningDec-12-2023

Bayesian Optimization with Conformal Prediction Sets

Stanton, Samuel, Maddox, Wesley, Wilson, Andrew Gordon

Bayesian optimization is a coherent, ubiquitous approach to decision-making under uncertainty, with applications including multi-arm bandits, active learning, and black-box optimization. Bayesian optimization selects decisions (i.e. objective function queries) with maximal expected utility with respect to the posterior distribution of a Bayesian model, which quantifies reducible, epistemic uncertainty about query outcomes. In practice, subjectively implausible outcomes can occur regularly for two reasons: 1) model misspecification and 2) covariate shift. Conformal prediction is an uncertainty quantification method with coverage guarantees even for misspecified models and a simple mechanism to correct for covariate shift. We propose conformal Bayesian optimization, which directs queries towards regions of search space where the model predictions have guaranteed validity, and investigate its behavior on a suite of black-box optimization tasks and tabular ranking tasks. In many cases we find that query coverage can be significantly improved without harming sample-efficiency.

artificial intelligence, machine learning, prediction, (17 more...)

2210.12496

Country:

Europe (0.28)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial IntelligenceDec-12-2023

Protein Design with Guided Discrete Diffusion

Gruver, Nate, Stanton, Samuel, Frey, Nathan C., Rudner, Tim G. J., Hotzel, Isidro, Lafrance-Vanasse, Julien, Rajpal, Arvind, Cho, Kyunghyun, Wilson, Andrew Gordon

A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to several therapeutic targets under locality and developability constraints, attaining a 99% expression rate and 40% binding rate in exploratory in vitro experiments.

artificial intelligence, machine learning, natural language, (18 more...)

2305.20009

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.92)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceDec-7-2023

Understanding the Detrimental Class-level Effects of Data Augmentation

Kirichenko, Polina, Ibrahim, Mark, Balestriero, Randall, Bouchacourt, Diane, Vedantam, Ramakrishna, Firooz, Hamed, Wilson, Andrew Gordon

Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. There has been little progress in resolving class-level accuracy drops due to a limited understanding of these effects. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, co-occur, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely related classes. While many of the previously reported performance drops are explained by multi-label annotations, our analysis of class confusions reveals other sources of accuracy degradation. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes.

artificial intelligence, machine learning, natural language, (17 more...)

2401.01764

Country: Europe (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.67)
Transportation > Ground (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

arXiv.org Artificial IntelligenceDec-5-2023

Materials Expert-Artificial Intelligence for Materials Discovery

Liu, Yanjun, Jovanovic, Milena, Mallayya, Krishnanand, Maddox, Wesley J., Wilson, Andrew Gordon, Klemenz, Sebastian, Schoop, Leslie M., Kim, Eun-Ah

The advent of material databases provides an unprecedented opportunity to uncover predictive descriptors for emergent material properties from vast data space. However, common reliance on high-throughput ab initio data necessarily inherits limitations of such data: mismatch with experiments. On the other hand, experimental decisions are often guided by an expert's intuition honed from experiences that are rarely articulated. We propose using machine learning to "bottle" such operational intuition into quantifiable descriptors using expertly curated measurement-based data. We introduce "Materials Expert-Artificial Intelligence" (ME-AI) to encapsulate and articulate this human intuition. As a first step towards such a program, we focus on the topological semimetal (TSM) among square-net materials as the property inspired by the expert-identified descriptor based on structural information: the tolerance factor. We start by curating a dataset encompassing 12 primary features of 879 square-net materials, using experimental data whenever possible. We then use Dirichlet-based Gaussian process regression using a specialized kernel to reveal composite descriptors for square-net topological semimetals. The ME-AI learned descriptors independently reproduce expert intuition and expand upon it. Specifically, new descriptors point to hypervalency as a critical chemical feature predicting TSM within square-net compounds. Our success with a carefully defined problem points to the "machine bottling human insight" approach as promising for machine learning-aided material discovery.

artificial intelligence, descriptor, machine learning, (18 more...)

2312.02796

Country:

Europe (0.67)
North America > United States > New York (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Artificial IntelligenceDec-5-2023

Simplifying Neural Network Training Under Class Imbalance

Shwartz-Ziv, Ravid, Goldblum, Micah, Li, Yucen Lily, Bruss, C. Bayan, Wilson, Andrew Gordon

Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models. The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures. Notably, we demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, optimizer, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods. We also provide key prescriptions and considerations for training under class imbalance, and an understanding of why imbalance methods succeed or fail.

artificial intelligence, deep learning, machine learning, (15 more...)

2312.02517

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningNov-29-2023

CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

Potapczynski, Andres, Finzi, Marc, Pleiss, Geoff, Wilson, Andrew Gordon

Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA (Compositional Linear Algebra). By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms. Moreover, CoLA provides memory efficient automatic differentiation, low precision computation, and GPU acceleration in both JAX and PyTorch, while also accommodating new objects, operations, and rules in downstream packages via multiple dispatch. CoLA can accelerate many algebraic operations, while making it easy to prototype matrix structures and algorithms, providing an appealing drop-in tool for virtually any computational effort that requires linear algebra. We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.

artificial intelligence, machine learning, operator, (20 more...)

2309.0306

Country:

North America > United States > Louisiana (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Machine LearningNov-27-2023

Should We Learn Most Likely Functions or Parameters?

Qiu, Shikai, Rudner, Tim G. J., Kapoor, Sanyam, Wilson, Andrew Gordon

Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insomuch as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior. In fact, we can re-parametrize a model such that any setting of parameters can maximize the parameter posterior. As an alternative, we investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data. We show that this procedure leads to pathological solutions when using neural networks and prove conditions under which the procedure is well-behaved, as well as a scalable approximation. Under these conditions, we find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.

artificial intelligence, machine learning, objective, (16 more...)

2311.1599

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)