AITopics | Teh, Yee Whye

Collaborating Authors

Teh, Yee Whye

Generative Models as Distributions of Functions

Dupont, Emilien, Teh, Yee Whye, Doucet, Arnaud

arXiv.org Machine Learning, Feb-9-2021

Generative models are typically trained on grid-like data such as images. As a result, the size of these models usually scales directly with the underlying grid resolution. In this paper, we abandon discretized grids and instead parameterize individual data points by continuous functions. We then build generative models by learning distributions over such functions. By treating data points as functions, we can abstract away from the specific type of data we train on and construct models that scale independently of signal resolution and dimension. To train our model, we use an adversarial approach with a discriminator that acts directly on continuous signals. Through experiments on both images and 3D shapes, we demonstrate that our model can learn rich distributions of functions independently of data type and resolution.
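As a rough illustration of treating a data point as a function rather than a grid, the sketch below represents one "image" by a three-number parameter vector and samples it at any resolution. The names `make_signal` and `render` and the closed-form function are hypothetical stand-ins chosen for illustration; they are not the paper's parameterization or training procedure.

```python
import math

def make_signal(params):
    """Return a continuous function f(x, y) defined by a small parameter vector.

    Assumption: a closed-form sinusoid stands in for whatever function
    class a real model would learn.
    """
    a, b, c = params

    def f(x, y):
        return math.sin(a * x) * math.cos(b * y) + c

    return f

def render(f, resolution):
    """Sample f at pixel centres of a resolution x resolution grid on [0, 1]^2."""
    coords = [(i + 0.5) / resolution for i in range(resolution)]
    return [[f(x, y) for x in coords] for y in coords]

# The same three parameters describe the signal at every resolution.
signal = make_signal((3.0, 2.0, 0.5))
low_res = render(signal, 4)
high_res = render(signal, 64)
```

The size of the representation (here, three numbers) does not change when the sampling grid is refined, which is the scaling property the abstract refers to.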

deep learning, neural network, representation

arXiv:2102.04776

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

LieTransformer: Equivariant self-attention for Lie Groups

Hutchinson, Michael, Le Lan, Charline, Zaidi, Sheheryar, Dupont, Emilien, Teh, Yee Whye, Kim, Hyunjik

arXiv.org Machine Learning, Dec-20-2020

Group equivariant neural networks are used as building blocks of group invariant neural networks, which have been shown to improve generalisation performance and data efficiency through principled parameter sharing. Such works have mostly focused on group equivariant convolutions, building on the result that group equivariant linear maps are necessarily convolutions. In this work, we extend the scope of the literature to non-linear neural network modules, namely self-attention, which is emerging as a prominent building block of deep learning models. We propose the LieTransformer, an architecture composed of LieSelfAttention layers that are equivariant to arbitrary Lie groups and their discrete subgroups. We demonstrate the generality of our approach by showing experimental results that are competitive with baseline methods on a wide range of tasks: shape counting on point clouds, molecular property regression and modelling particle trajectories under Hamiltonian dynamics.

deep learning, lietransformer, neural network

arXiv:2012.10885

Country:

Europe (0.28)
North America > United States (0.14)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Equivariant Conditional Neural Processes

Holderrieth, Peter, Hutchinson, Michael, Teh, Yee Whye

arXiv.org Machine Learning, Nov-25-2020

We introduce Equivariant Conditional Neural Processes (EquivCNPs), a new member of the Neural Process family that models vector-valued data in an equivariant manner with respect to isometries of $\mathbb{R}^n$. In addition, we look at multi-dimensional Gaussian Processes (GPs) under the perspective of equivariance and find the sufficient and necessary constraints to ensure a GP over $\mathbb{R}^n$ is equivariant. We test EquivCNPs on the inference of vector fields using Gaussian process samples and real-world weather data. We observe that our model significantly improves the performance of previous models. By imposing equivariance as constraints, the parameter and data efficiency of these models are increased. Moreover, we find that EquivCNPs are more robust against overfitting to local conditions of the training data.
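To make the notion of equivariance concrete, here is a minimal numerical check, restricted to rotations of the plane about the origin (only one kind of isometry). A vector field $F$ is rotation-equivariant when $F(Rx) = R\,F(x)$ for every rotation $R$. The two example fields and the helper `equivariance_gap` are hypothetical and are not taken from the paper.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def radial_field(v):
    """F(x) = exp(-|x|^2) x: the scale depends only on |x|, so F commutes with rotations."""
    x, y = v
    scale = math.exp(-(x * x + y * y))
    return (scale * x, scale * y)

def shear_field(v):
    """F(x, y) = (y, 0): a field that is not rotation-equivariant."""
    x, y = v
    return (y, 0.0)

def equivariance_gap(field, v, theta):
    """Distance between F(Rv) and R F(v); zero for an equivariant field."""
    lhs = field(rotate(v, theta))
    rhs = rotate(field(v), theta)
    return math.hypot(lhs[0] - rhs[0], lhs[1] - rhs[1])

point, angle = (0.3, -1.2), 0.7
gap_radial = equivariance_gap(radial_field, point, angle)  # numerically zero
gap_shear = equivariance_gap(shear_field, point, angle)    # clearly non-zero
```

A model constrained to be equivariant would behave like `radial_field` by construction, which is what allows it to share information across rotated copies of the same data.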

deep learning, neural network, representation

arXiv:2011.12916

Country:

North America > United States > Tennessee (0.14)
North America > United States > Arizona (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Attentive Clustering Processes

Pakman, Ari, Wang, Yueqi, Lee, Yoonho, Basu, Pallab, Lee, Juho, Teh, Yee Whye, Paninski, Liam

arXiv.org Machine Learning, Oct-29-2020

Amortized approaches to clustering have recently received renewed attention thanks to novel objective functions that exploit the expressiveness of deep learning models. In this work we revisit a recent proposal for fast amortized probabilistic clustering, the Clusterwise Clustering Process (CCP), which yields samples from the posterior distribution of cluster labels for sets of arbitrary size using only O(K) forward network evaluations, where K is an arbitrary number of clusters. While adequate in simple datasets, we show that the model can severely underfit complex datasets, and hypothesize that this limitation can be traced back to the implicit assumption that the probability of a point joining a cluster is equally sensitive to all the points available to join the same cluster. We propose an improved model, the Attentive Clustering Process (ACP), that selectively pays more attention to relevant points while preserving the invariance properties of the generative model. We illustrate the advantages of the new model in applications to spike-sorting in multi-electrode arrays and community discovery in networks. The latter case combines the ACP model with graph convolutional networks, and to our knowledge is the first deep learning model that handles an arbitrary number of communities.

dataset, deep learning, neural network

arXiv:2010.15727

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bootstrapping Neural Processes

Lee, Juho, Lee, Yoonho, Kim, Jungtaek, Yang, Eunho, Hwang, Sung Ju, Teh, Yee Whye

arXiv.org Machine Learning, Oct-27-2020

Unlike traditional statistical modeling, in which a user typically hand-specifies a prior, Neural Processes (NPs) implicitly define a broad class of stochastic processes with neural networks. Given a data stream, an NP learns a stochastic process that best describes the data. While this "data-driven" way of learning stochastic processes has proven to handle various types of data, NPs still rely on an assumption that uncertainty in stochastic processes is modeled by a single latent variable, which potentially limits the flexibility. To this end, we propose the Bootstrapping Neural Process (BNP), a novel extension of the NP family using the bootstrap. The bootstrap is a classical data-driven technique for estimating uncertainty, which allows BNP to learn the stochasticity in NPs without assuming a particular form. We demonstrate the efficacy of BNP on various types of data and its robustness in the presence of model-data mismatch.
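For readers unfamiliar with the bootstrap itself, the sketch below shows the classical version: resample the data with replacement, recompute a statistic on each resample, and read the spread of those estimates as a measure of uncertainty. This illustrates the classical technique only, not the BNP model; the function name `bootstrap_std_error` and the sample values are made up for the example.

```python
import random
import statistics

def bootstrap_std_error(sample, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        # Each resample has the same size as the data and may repeat points.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

# Illustrative measurements (not from the paper).
sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 3.0, 2.2]
se_of_mean = bootstrap_std_error(sample, statistics.mean)
```

No distributional form is assumed for the data: the uncertainty estimate comes entirely from the resampling, which is the property the abstract highlights.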

banp, deep learning, neural network

arXiv:2008.02956

Country:

Asia > South Korea (0.46)
North America (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)

Behavior Priors for Efficient Reinforcement Learning

Tirumala, Dhruva, Galashov, Alexandre, Noh, Hyeonwoo, Hasenclever, Leonard, Pascanu, Razvan, Schwarz, Jonathan, Desjardins, Guillaume, Czarnecki, Wojciech Marian, Ahuja, Arun, Teh, Yee Whye, Heess, Nicolas

arXiv.org Artificial Intelligence, Oct-27-2020

As we deploy reinforcement learning agents to solve increasingly challenging problems, methods that allow us to inject prior knowledge about the structure of the world and effective solution strategies become increasingly important. In this work we consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors that capture the common movement and interaction patterns that are shared across a set of related tasks or contexts. For example, the day-to-day behavior of humans comprises distinctive locomotion and manipulation patterns that recur across many different situations and goals. We discuss how such behavior patterns can be captured using probabilistic trajectory models and how these can be integrated effectively into reinforcement learning schemes, e.g. to facilitate multi-task and transfer learning. We then extend these ideas to latent variable models and consider a formulation to learn hierarchical priors that capture different aspects of the behavior in reusable modules. We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives, thereby offering an alternative perspective on existing ideas. We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.

deep learning, information, neural network

arXiv:2010.14274

Country:

North America > United States > Massachusetts (0.27)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.92)

How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19?

Sharma, Mrinank, Mindermann, Sören, Brauner, Jan Markus, Leech, Gavin, Stephenson, Anna B., Gavenčiak, Tomáš, Kulveit, Jan, Teh, Yee Whye, Chindelevitch, Leonid, Gal, Yarin

arXiv.org Machine Learning, Oct-24-2020

To what extent are effectiveness estimates of nonpharmaceutical interventions (NPIs) against COVID-19 influenced by the assumptions our models make? To answer this question, we investigate 2 state-of-the-art NPI effectiveness models and propose 6 variants that make different structural assumptions. In particular, we investigate how well NPI effectiveness estimates generalise to unseen countries, and their sensitivity to unobserved factors. Models that account for noise in disease transmission compare favourably. We further evaluate how robust estimates are to different choices of epidemiological parameters and data. Focusing on models that assume transmission noise, we find that previously published results are robust across these choices and across different models. Finally, we mathematically ground the interpretation of NPI effectiveness estimates when certain common assumptions do not hold.

assumption, health & medicine, immunology

arXiv:2007.13454

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Bayesian Deep Ensembles via the Neural Tangent Kernel

He, Bobby, Lakshminarayanan, Balaji, Teh, Yee Whye

arXiv.org Machine Learning, Oct-24-2020

We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation to a deep ensemble trained with squared error loss. We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member, that enables a posterior interpretation in the infinite width limit. When ensembled together, our trained NNs give an approximation to a posterior predictive distribution, and we prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit. Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks.

ensemble, neural network, survey article

arXiv:2007.05864

Country:

Europe > United Kingdom > England (0.14)
North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Importance Weighted Policy Learning and Adaptation

Galashov, Alexandre, Sygnowski, Jakub, Desjardins, Guillaume, Humplik, Jan, Hasenclever, Leonard, Jeong, Rae, Teh, Yee Whye, Heess, Nicolas

arXiv.org Artificial Intelligence, Sep-10-2020

The ability to exploit prior experience to solve novel problems rapidly is a hallmark of biological learning systems and of great practical importance for artificial ones. In the meta reinforcement learning literature much recent work has focused on the problem of optimizing the learning process itself. In this paper we study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning. The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior, or default behavior that constrains the space of solutions and serves as a bias for exploration; as well as a representation for the value function, both of which are easily learned from a number of training tasks in a multi-task scenario. Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.

adaptation, artificial intelligence, reinforcement learning

arXiv:2009.04875

Country:

Oceania > Australia (0.14)
Europe > Sweden (0.14)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Lottery Tickets in Linear Models: An Analysis of Iterative Magnitude Pruning

Elesedy, Bryn, Kanade, Varun, Teh, Yee Whye

arXiv.org Machine Learning, Aug-6-2020

The lottery ticket hypothesis [Frankle and Carbin, 2019] asserts that a randomly initialised, densely connected feed-forward neural network contains a sparse sub-network that, when trained in isolation, attains equal or higher accuracy than the full network. The method used to find these sub-networks is iterative magnitude pruning (IMP). A network is given a random initialisation, trained by some form of gradient descent for a specified number of iterations and a proportion of its smallest weights (by absolute magnitude) are deleted. The remaining weights are then reset to their initialised values and the network is retrained. This procedure can be performed multiple times, resulting in a sequence of sparse yet trainable sub-networks.
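The train, prune, reset, retrain loop described above can be sketched for a small linear model as follows. This is a minimal sketch under stated assumptions: squared-error loss, plain full-batch gradient descent, a fixed pruning fraction per round, and synthetic noiseless data. The function names and hyperparameters are hypothetical and are not taken from the paper.

```python
import random

def train(init, mask, data, lr=0.05, steps=200):
    """Gradient descent on mean squared error, starting from `init`.

    Weights whose mask entry is 0 are held at zero throughout.
    """
    w = [wi * mi for wi, mi in zip(init, mask)]
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [(wi - lr * gj) * mj for wi, gj, mj in zip(w, grad, mask)]
    return w

def prune(weights, mask, fraction):
    """Remove the given fraction of surviving weights with smallest |w|."""
    alive = [j for j, m in enumerate(mask) if m == 1]
    k = int(len(alive) * fraction)
    new_mask = list(mask)
    for j in sorted(alive, key=lambda j: abs(weights[j]))[:k]:
        new_mask[j] = 0
    return new_mask

def iterative_magnitude_pruning(init, data, rounds=2, fraction=0.5):
    """Repeat: train from the initial values, prune, reset, then retrain."""
    mask = [1] * len(init)
    masks = []
    for _ in range(rounds):
        trained = train(init, mask, data)  # survivors restart at their initial values
        mask = prune(trained, mask, fraction)
        masks.append(mask)
    return train(init, mask, data), masks

# Synthetic regression problem: only weights 0 and 2 matter (an assumption).
rng = random.Random(0)
true_w = [3.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
data = []
for _ in range(64):
    x = [rng.gauss(0, 1) for _ in true_w]
    data.append((x, sum(t * xi for t, xi in zip(true_w, x))))
init = [rng.gauss(0, 0.1) for _ in true_w]
final_weights, masks = iterative_magnitude_pruning(init, data)
```

Each entry of `masks` records which weights survive a round, so the list traces the sequence of increasingly sparse sub-networks mentioned in the abstract.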

artificial intelligence, iteration, neural network

arXiv:2007.08243

Country: Europe > United Kingdom > England (0.14)

Genre:

Research Report (0.64)
Contests & Prizes (0.63)

Industry: Leisure & Entertainment > Gambling (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
