AITopics | Stanton, Samuel

Stanton, Samuel

Concept Bottleneck Language Models for Protein Design

Ismail, Aya Abdelsalam, Oikarinen, Tuomas, Wang, Amy, Adebayo, Julius, Stanton, Samuel, Joren, Taylor, Kleinhenz, Joseph, Goodman, Allen, Bravo, Héctor Corrada, Cho, Kyunghyun, Frey, Nathan C.

arXiv.org Artificial Intelligence, Dec-11-2024

We introduce Concept Bottleneck Protein Language Models (CB-pLM), a generative masked language model with a layer where each neuron corresponds to an interpretable concept. Our architecture offers three key benefits: i) Control: We can intervene on concept values to precisely control the properties of generated proteins, achieving a 3 times larger change in desired concept values compared to baselines. ii) Interpretability: A linear mapping between concept values and predicted tokens allows transparent analysis of the model's decision-making process. iii) Debugging: This transparency facilitates easy debugging of trained models. Our models achieve pre-training perplexity and downstream task performance comparable to traditional masked protein language models, demonstrating that interpretability does not compromise performance. While adaptable to any language model, we focus on masked protein language models due to their importance in drug discovery and the ability to validate our model's capabilities through real-world experiments and expert knowledge. We scale our CB-pLM from 24 million to 3 billion parameters, making them the largest Concept Bottleneck Models trained and the first capable of generative language modeling.
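The interpretability claim above rests on a linear mapping from concept values to predicted tokens. The following is a minimal toy sketch of that idea only, not the CB-pLM architecture: the function names, the 2-concept / 3-token sizes, and the weight values are all hypothetical. It shows that with a linear head, intervening on one concept shifts the token logits by exactly the matching weight column times the size of the change.

```python
from typing import List


def token_logits(concepts: List[float],
                 weights: List[List[float]],
                 bias: List[float]) -> List[float]:
    """Linear head: logits = W @ c + b, one row of W per token."""
    return [sum(w * c for w, c in zip(row, concepts)) + b
            for row, b in zip(weights, bias)]


def intervene(concepts: List[float], index: int, value: float) -> List[float]:
    """Return a copy of the concept vector with one concept overwritten."""
    edited = list(concepts)
    edited[index] = value
    return edited


# Hypothetical toy setup: 2 concepts, 3 tokens (values chosen for illustration).
W = [[1.0, 0.0],
     [0.0, 2.0],
     [-1.0, 1.0]]
b = [0.0, 0.0, 0.0]
c = [0.5, 0.25]

before = token_logits(c, W, b)
after = token_logits(intervene(c, index=1, value=1.25), W, b)

# Raising concept 1 by 1.0 shifts each logit by column 1 of W times 1.0.
shift = [a - p for a, p in zip(after, before)]
print(shift)  # [0.0, 2.0, 1.0]
```

Because the head is linear, the effect of an intervention can be read directly from the weights, which is the kind of transparency the abstract describes.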

artificial intelligence, machine learning, natural language, (18 more...)

2411.0609

Country:

North America > United States > California (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them)

Prinster, Drew, Stanton, Samuel, Liu, Anqi, Saria, Suchi

arXiv.org Machine Learning, Jun-5-2024

As artificial intelligence (AI) / machine learning (ML) gain widespread adoption, practitioners are increasingly seeking means to quantify and control the risk these systems incur. This challenge is especially salient when such systems have autonomy to collect their own data, such as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the data distribution. Conformal prediction is a promising approach to uncertainty and risk quantification, but prior variants' validity guarantees have assumed some form of "quasi-exchangeability" on the data distribution, thereby excluding many types of sequential shifts. In this paper we prove that conformal prediction can theoretically be extended to any joint data distribution, not just exchangeable or quasi-exchangeable ones. Although the most general case is exceedingly impractical to compute, for concrete practical applications we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of AI/ML-agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks.
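For context, the exchangeable baseline that this abstract generalizes can be sketched in a few lines. The code below is standard split conformal prediction with absolute-residual scores, which assumes exchangeable data; it is not the paper's procedure for arbitrary distributions. The calibration numbers and function names are hypothetical.

```python
import math
from typing import List, Tuple


def conformal_quantile(scores: List[float], alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the only valid threshold is infinity.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_interval(prediction: float, q: float) -> Tuple[float, float]:
    """Symmetric interval around a point prediction."""
    return (prediction - q, prediction + q)


# Hypothetical calibration set: model predictions and observed targets.
cal_preds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_targets = [1.5, 2.0, 2.0, 4.5, 5.0, 8.0, 7.0, 8.5, 9.0]

# Nonconformity score: absolute residual on held-out calibration data.
scores = [abs(y - p) for p, y in zip(cal_preds, cal_targets)]

q = conformal_quantile(scores, alpha=0.2)
print(q)                             # 1.0
print(prediction_interval(10.0, q))  # (9.0, 11.0)
```

Under exchangeability, intervals built this way cover the true target with probability at least 1 - alpha. The feedback-loop shifts described above break that assumption, which is the gap the paper addresses.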

artificial intelligence, conformal validity guarantee exist, machine learning, (17 more...)

2405.06627

Country:

North America > United States > New York (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.45)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Bayesian Optimization with Conformal Prediction Sets

Stanton, Samuel, Maddox, Wesley, Wilson, Andrew Gordon

arXiv.org Machine Learning, Dec-12-2023

Bayesian optimization is a coherent, ubiquitous approach to decision-making under uncertainty, with applications including multi-arm bandits, active learning, and black-box optimization. Bayesian optimization selects decisions (i.e. objective function queries) with maximal expected utility with respect to the posterior distribution of a Bayesian model, which quantifies reducible, epistemic uncertainty about query outcomes. In practice, subjectively implausible outcomes can occur regularly for two reasons: 1) model misspecification and 2) covariate shift. Conformal prediction is an uncertainty quantification method with coverage guarantees even for misspecified models and a simple mechanism to correct for covariate shift. We propose conformal Bayesian optimization, which directs queries towards regions of search space where the model predictions have guaranteed validity, and investigate its behavior on a suite of black-box optimization tasks and tabular ranking tasks. In many cases we find that query coverage can be significantly improved without harming sample-efficiency.
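As a rough illustration of "selecting the query with maximal expected utility", the sketch below scores candidates with the closed-form expected improvement under a Gaussian posterior. Expected improvement is one common utility, chosen here as an assumption; the abstract does not say which acquisition function the paper uses, and the candidate names, means, and standard deviations are hypothetical. The conformal correction the paper proposes is not shown.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best: float) -> float:
    """E[max(f - best, 0)] for f ~ Normal(mean, std^2), maximization setting."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical posterior summaries (mean, std) at three candidate queries.
candidates = {"A": (1.0, 0.0), "B": (0.5, 1.0), "C": (0.9, 0.1)}
best_observed = 1.0

ei = {name: expected_improvement(m, s, best_observed)
      for name, (m, s) in candidates.items()}
chosen = max(ei, key=ei.get)
print(chosen)  # B
```

Candidate B wins despite its lower mean because its larger posterior uncertainty leaves more room for improvement, which is the explore-exploit behavior a posterior-based utility is meant to capture.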

artificial intelligence, machine learning, prediction, (17 more...)

2210.12496

Country:

Europe (0.28)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Protein Design with Guided Discrete Diffusion

Gruver, Nate, Stanton, Samuel, Frey, Nathan C., Rudner, Tim G. J., Hotzel, Isidro, Lafrance-Vanasse, Julien, Rajpal, Arvind, Cho, Kyunghyun, Wilson, Andrew Gordon

arXiv.org Artificial Intelligence, Dec-12-2023

A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to several therapeutic targets under locality and developability constraints, attaining a 99% expression rate and 40% binding rate in exploratory in vitro experiments.

artificial intelligence, machine learning, natural language, (18 more...)

2305.20009

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.92)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GAUCHE: A Library for Gaussian Processes in Chemistry

Griffiths, Ryan-Rhys, Klarner, Leo, Moss, Henry B., Ravuri, Aditya, Truong, Sang, Stanton, Samuel, Tom, Gary, Rankovic, Bojana, Du, Yuanqi, Jamasb, Arian, Deshwal, Aryan, Schwartz, Julius, Tripp, Austin, Kell, Gregory, Frieder, Simon, Bourached, Anthony, Chan, Alex, Moss, Jacob, Guo, Chengzhi, Durholt, Johannes, Chaurasia, Saudamini, Strieth-Kalthoff, Felix, Lee, Alpha A., Cheng, Bingqing, Aspuru-Guzik, Alán, Schwaller, Philippe, Tang, Jian

arXiv.org Artificial Intelligence, Feb-21-2023

We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels in GAUCHE, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered in experimental chemistry, we showcase applications for GAUCHE in molecular discovery and chemical reaction optimisation. The codebase is made available at https://github.com/leojklarner/gauche

artificial intelligence, machine learning, modeling & simulation, (18 more...)

2212.0445

Country:

Europe (0.68)
North America > Canada (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Materials > Chemicals (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

Stanton, Samuel, Maddox, Wesley, Gruver, Nate, Maffettone, Phillip, Delaney, Emily, Greenside, Peyton, Wilson, Andrew Gordon

arXiv.org Machine Learning, Jul-12-2022

Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing in silico and in vitro properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.
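The Pareto frontier mentioned above is the set of designs that no other design beats on every objective at once. A minimal sketch of extracting that set from scored candidates follows; it assumes both objectives are maximized, the objective values are hypothetical, and it illustrates only the definition, not LaMBO's acquisition function.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (quadratic-time scan)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective scores for five candidate sequences.
scored = [(1.0, 5.0), (2.0, 4.0), (2.0, 2.0), (3.0, 1.0), (0.5, 0.5)]
front = pareto_front(scored)
print(front)  # [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]
```

Each point on the front represents a different tradeoff between the two objectives, so optimizing at many points along it yields a diverse set of candidate designs.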

artificial intelligence, machine learning, sequence, (14 more...)

2203.12742

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Deconstructing the Inductive Biases of Hamiltonian Neural Networks

Gruver, Nate, Finzi, Marc, Stanton, Samuel, Wilson, Andrew Gordon

arXiv.org Machine Learning, Feb-11-2022

Physics-inspired neural networks (NNs), such as Hamiltonian or Lagrangian NNs, dramatically outperform other learned dynamics models by leveraging strong inductive biases. These models, however, are challenging to apply to many real-world systems, such as those that don't conserve energy or contain contacts, a common setting for robotics and reinforcement learning. In this paper, we examine the inductive biases that make physics-inspired models successful in practice. We show that, contrary to conventional wisdom, the improved generalization of HNNs is the result of modeling acceleration directly and avoiding artificial complexity from the coordinate system, rather than symplectic structure or energy conservation. We show that by relaxing the inductive biases of these models, we can match or exceed performance on energy-conserving systems while dramatically improving performance on practical, non-conservative systems. We extend this approach to constructing transition models for common MuJoCo environments, showing that our model can appropriately balance inductive biases with the flexibility required for model-based control.

artificial intelligence, hamiltonian neural network, machine learning, (2 more...)

2202.04836

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Conditioning Sparse Variational Gaussian Processes for Online Decision-making

Maddox, Wesley J., Stanton, Samuel, Wilson, Andrew Gordon

arXiv.org Machine Learning, Oct-28-2021

With a principled representation of uncertainty and closed form posterior updates, Gaussian processes (GPs) are a natural choice for online decision making. However, Gaussian processes typically require at least $\mathcal{O}(n^2)$ computations for $n$ training points, limiting their general applicability. Stochastic variational Gaussian processes (SVGPs) can provide scalable inference for a dataset of fixed size, but are difficult to efficiently condition on new data. We propose online variational conditioning (OVC), a procedure for efficiently conditioning SVGPs in an online setting that does not require re-training through the evidence lower bound with the addition of new data. OVC enables the pairing of SVGPs with advanced look-ahead acquisition functions for black-box optimization, even with non-Gaussian likelihoods. We show OVC provides compelling performance in a range of applications including active learning of malaria incidence, and reinforcement learning on MuJoCo simulated robotic control tasks.

artificial intelligence, machine learning, (4 more...)

2110.15172

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Robots (0.53)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

Does Knowledge Distillation Really Work?

Stanton, Samuel, Izmailov, Pavel, Kirichenko, Polina, Alemi, Alexander A., Wilson, Andrew Gordon

arXiv.org Machine Learning, Jun-10-2021

Large, deep networks can learn representations that generalize well. While smaller, more efficient networks lack the inductive biases to find these representations from training data alone, they may have the capacity to represent these solutions [e.g., 1, 16, 27, 39]. Influential work on knowledge distillation [19] argues that Bucilă et al. [4] "demonstrate convincingly that the knowledge acquired by a large ensemble of models [the teacher] can be transferred to a single small model [the student]". Indeed this quote encapsulates the conventional narrative of knowledge distillation: a student model learns a high-fidelity representation of a larger teacher, enabled by the teacher's soft labels. Conversely, in Figure 1 we show that with modern architectures knowledge distillation can lead to students with very different predictions from their teachers, even when the student has the capacity to perfectly match the teacher.
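To make "soft labels" and student-teacher mismatch concrete, here is a minimal sketch assuming the common formulation in which the student is trained with a cross-entropy loss against the teacher's temperature-softened probabilities. Top-1 agreement is used as one simple proxy for fidelity. The logits are hypothetical, and the paper's exact losses and metrics may differ.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 1.0) -> float:
    """Cross-entropy of the student against the teacher's soft labels."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))


def top1_agreement(student_batch: List[List[float]],
                   teacher_batch: List[List[float]]) -> float:
    """Fraction of inputs where student and teacher predict the same class."""
    matches = sum(s.index(max(s)) == t.index(max(t))
                  for s, t in zip(student_batch, teacher_batch))
    return matches / len(student_batch)


# Hypothetical logits for three inputs and three classes.
student = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0], [1.0, 3.0, 0.0]]
teacher = [[3.0, 0.0, 0.0], [0.0, 2.0, 1.0], [0.0, 5.0, 1.0]]

agreement = top1_agreement(student, teacher)  # 2 of 3 inputs agree
loss = distillation_loss(student[0], teacher[0], temperature=2.0)
```

The loss is minimized when the student's softened distribution equals the teacher's, so a large gap between the two after training indicates low fidelity even if the student's accuracy is acceptable.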

deep learning, neural network, student, (19 more...)

2106.05945

Country: North America > United States (0.92)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Kernel Interpolation for Scalable Online Gaussian Processes

Stanton, Samuel, Maddox, Wesley J., Delbridge, Ian, Wilson, Andrew Gordon

arXiv.org Machine Learning, Mar-1-2021

Gaussian processes (GPs) provide a gold standard for performance in online settings, such as sample-efficient control and black box optimization, where we need to update a posterior distribution as we acquire data in a sequential fashion. However, updating a GP posterior to accommodate even a single new observation after having observed $n$ points incurs at least $O(n)$ computations in the exact setting. We show how to use structured kernel interpolation to efficiently recycle computations for constant-time $O(1)$ online updates with respect to the number of points $n$, while retaining exact inference. We demonstrate the promise of our approach in a range of online regression and classification settings, Bayesian optimization, and active sampling to reduce error in malaria incidence forecasting. Code is available at https://github.com/wjmaddox/online_gp.

health & medicine, immunology, o-svgp, (20 more...)

2103.01454

Country:

Europe (0.67)
North America > United States > California > Los Angeles County (0.14)

Genre: Research Report (0.81)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.67)
