AITopics | Wilson, Andrew Gordon

Plotting

Wilson, Andrew Gordon

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

Stanton, Samuel, Maddox, Wesley, Gruver, Nate, Maffettone, Phillip, Delaney, Emily, Greenside, Peyton, Wilson, Andrew Gordon

arXiv.org Machine LearningJul-12-2022

Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing \emph{in silico} and \emph{in vitro} properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.

artificial intelligence, machine learning, sequence, (14 more...)

arXiv.org Machine Learning

2203.12742

Country: North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Low-Precision Stochastic Gradient Langevin Dynamics

Zhang, Ruqi, Wilson, Andrew Gordon, De Sa, Christopher

arXiv.org Machine LearningJun-20-2022

While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. We prove that the convergence of low-precision SGLD with full-precision gradient accumulators is less affected by the quantization error than its SGD counterpart in the strongly convex setting. To further enable low-precision gradient accumulators, we develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks.

artificial intelligence, machine learning, sgld, (17 more...)

arXiv.org Machine Learning

2206.09909

Country:

North America > United States > Texas (0.14)
North America > United States > Maryland (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.87)

Add feedback

Deconstructing the Inductive Biases of Hamiltonian Neural Networks

Gruver, Nate, Finzi, Marc, Stanton, Samuel, Wilson, Andrew Gordon

arXiv.org Machine LearningFeb-11-2022

Physics-inspired neural networks (NNs), such as Hamiltonian or Lagrangian NNs, dramatically outperform other learned dynamics models by leveraging strong inductive biases. These models, however, are challenging to apply to many real world systems, such as those that don't conserve energy or contain contacts, a common setting for robotics and reinforcement learning. In this paper, we examine the inductive biases that make physics-inspired models successful in practice. We show that, contrary to conventional wisdom, the improved generalization of HNNs is the result of modeling acceleration directly and avoiding artificial complexity from the coordinate system, rather than symplectic structure or energy conservation. We show that by relaxing the inductive biases of these models, we can match or exceed performance on energy-conserving systems while dramatically improving performance on practical, non-conservative systems. We extend this approach to constructing transition models for common Mujoco environments, showing that our model can appropriately balance inductive biases with the flexibility required for model-based control.

artificial intelligence, hamiltonian neural network, machine learning, (2 more...)

arXiv.org Machine Learning

2202.04836

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Add feedback

When are Iterative Gaussian Processes Reliably Accurate?

Maddox, Wesley J., Kapoor, Sanyam, Wilson, Andrew Gordon

arXiv.org Machine LearningDec-30-2021

While recent work on conjugate gradient methods and Lanczos decompositions have achieved scalable Gaussian process inference with highly accurate point predictions, in several implementations these iterative methods appear to struggle with numerical instabilities in learning kernel hyperparameters, and poor test likelihoods. By investigating CG tolerance, preconditioner rank, and Lanczos decomposition rank, we provide a particularly simple prescription to correct these issues: we recommend that one should use a small CG tolerance ($\epsilon \leq 0.01$) and a large root decomposition size ($r \geq 5000$). Moreover, we show that L-BFGS-B is a compelling optimizer for Iterative GPs, achieving convergence with fewer gradient updates.

artificial intelligence, machine learning, tolerance, (16 more...)

arXiv.org Machine Learning

2112.15246

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Residual Pathway Priors for Soft Equivariance Constraints

Finzi, Marc, Benton, Gregory, Wilson, Andrew Gordon

arXiv.org Machine LearningDec-2-2021

There is often a trade-off between building deep learning systems that are expressive enough to capture the nuances of the reality, and having the right inductive biases for efficient learning. We introduce Residual Pathway Priors (RPPs) as a method for converting hard architectural constraints into soft priors, guiding models towards structured solutions, while retaining the ability to capture additional complexity. Using RPPs, we construct neural network priors with inductive biases for equivariances, but without limiting flexibility. We show that RPPs are resilient to approximate or misspecified symmetries, and are as effective as fully constrained models even when symmetries are exact. We showcase the broad applicability of RPPs with dynamical systems, tabular data, and reinforcement learning. In Mujoco locomotion tasks, where contact forces and directional rewards violate strict equivariance assumptions, the RPP outperforms baseline model-free RL agents, and also improves the learned transition models for model-based RL.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2112.01388

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Conditioning Sparse Variational Gaussian Processes for Online Decision-making

Maddox, Wesley J., Stanton, Samuel, Wilson, Andrew Gordon

arXiv.org Machine LearningOct-28-2021

With a principled representation of uncertainty and closed form posterior updates, Gaussian processes (GPs) are a natural choice for online decision making. However, Gaussian processes typically require at least $\mathcal{O}(n^2)$ computations for $n$ training points, limiting their general applicability. Stochastic variational Gaussian processes (SVGPs) can provide scalable inference for a dataset of fixed size, but are difficult to efficiently condition on new data. We propose online variational conditioning (OVC), a procedure for efficiently conditioning SVGPs in an online setting that does not require re-training through the evidence lower bound with the addition of new data. OVC enables the pairing of SVGPs with advanced look-ahead acquisition functions for black-box optimization, even with non-Gaussian likelihoods. We show OVC provides compelling performance in a range of applications including active learning of malaria incidence, and reinforcement learning on MuJoCo simulated robotic control tasks.

artificial intelligence, filter flatedecode length 135, machine learning, (4 more...)

arXiv.org Machine Learning

2110.15172

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Robots (0.53)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

Add feedback

Task-agnostic Continual Learning with Hybrid Probabilistic Models

Kirichenko, Polina, Farajtabar, Mehrdad, Rao, Dushyant, Lakshminarayanan, Balaji, Levine, Nir, Li, Ang, Hu, Huiyi, Wilson, Andrew Gordon, Pascanu, Razvan

arXiv.org Machine LearningJun-24-2021

Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting, all leveraging the invertibility and exact likelihood which are uniquely enabled by the normalizing flow model. We use the generative capabilities of the flow to avoid catastrophic forgetting through generative replay and a novel functional regularization technique. For task identification, we use state-of-the-art anomaly detection techniques based on measuring the typicality of the model's statistics. We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.

continual learning, deep learning, neural network, (19 more...)

arXiv.org Machine Learning

2106.12772

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bayesian Optimization with High-Dimensional Outputs

Maddox, Wesley J., Balandat, Maximilian, Wilson, Andrew Gordon, Bakshy, Eytan

arXiv.org Machine LearningJun-24-2021

Bayesian Optimization is a sample-efficient black-box optimization procedure that is typically applied to problems with a small number of independent objectives. However, in practice we often wish to optimize objectives defined over many correlated outcomes (or ``tasks"). For example, scientists may want to optimize the coverage of a cell tower network across a dense grid of locations. Similarly, engineers may seek to balance the performance of a robot across dozens of different environments via constrained or robust optimization. However, the Gaussian Process (GP) models typically used as probabilistic surrogates for multi-task Bayesian Optimization scale poorly with the number of outcomes, greatly limiting applicability. We devise an efficient technique for exact multi-task GP sampling that combines exploiting Kronecker structure in the covariance matrices with Matheron's identity, allowing us to perform Bayesian Optimization using exact multi-task GP models with tens of thousands of correlated outputs. In doing so, we achieve substantial improvements in sample efficiency compared to existing approaches that only model aggregate functions of the outcomes. We demonstrate how this unlocks a new class of applications for Bayesian Optimization across a range of tasks in science and engineering, including optimizing interference patterns of an optical interferometer with more than 65,000 outputs.

artificial intelligence, bayesian optimization, high-dimensional output

arXiv.org Machine Learning

2106.12997

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (0.53)

Add feedback

Dangers of Bayesian Model Averaging under Covariate Shift

Izmailov, Pavel, Nicholson, Patrick, Lotfi, Sanae, Wilson, Andrew Gordon

arXiv.org Machine LearningJun-22-2021

Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this surprising result, showing how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction. We additionally show why the same issue does not affect many approximate inference procedures, or classical maximum a-posteriori (MAP) training. Finally, we propose novel priors that improve the robustness of BNNs to many sources of covariate shift.

artificial intelligence, bayesian inference, machine learning, (3 more...)

arXiv.org Machine Learning

2106.11905

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)

Add feedback

SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes

Kapoor, Sanyam, Finzi, Marc, Wang, Ke Alexander, Wilson, Andrew Gordon

arXiv.org Machine LearningJun-12-2021

State-of-the-art methods for scalable Gaussian processes use iterative algorithms, requiring fast matrix vector multiplies (MVMs) with the covariance kernel. The Structured Kernel Interpolation (SKI) framework accelerates these MVMs by performing efficient MVMs on a grid and interpolating back to the original space. In this work, we develop a connection between SKI and the permutohedral lattice used for high-dimensional fast bilateral filtering. Using a sparse simplicial grid instead of a dense rectangular one, we can perform GP inference exponentially faster in the dimension than SKI. Our approach, Simplex-GP, enables scaling SKI to high dimensions, while maintaining strong predictive performance. We additionally provide a CUDA implementation of Simplex-GP, which enables significant GPU acceleration of MVM based inference.

artificial intelligence, lattice, machine learning, (14 more...)

arXiv.org Machine Learning

2106.06695

Country: North America > United States > California (0.28)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Sports > Skiing (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback