Teh, Yee Whye
Hybrid Models with Deep and Invertible Features
Nalisnick, Eric, Matsukawa, Akihiro, Teh, Yee Whye, Gorur, Dilan, Lakshminarayanan, Balaji
We propose a neural hybrid model consisting of a linear model defined on a set of features computed by a deep, invertible transformation (i.e., a normalizing flow). An attractive property of our model is that both p(features), the features' density, and p(targets | features), the predictive distribution, can be computed exactly in a single feed-forward pass. We show that our hybrid model, despite the invertibility constraints, achieves accuracy similar to that of purely predictive models, while the generative component remains a good model of the input features despite the hybrid optimization objective. This enables additional capabilities such as detecting out-of-distribution inputs and performing semi-supervised learning. The availability of the exact joint density p(targets, features) also lets us compute many quantities readily, making our hybrid model a useful building block for downstream applications of probabilistic deep learning.
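As an illustration of the single-pass factorization (a minimal sketch, not the authors' implementation), take an affine invertible feature map z = Ax + b with a standard-normal base density and a softmax head; the change-of-variables formula then gives log p(y, x) = log p(y | z) + log N(z; 0, I) + log |det A|:

    import numpy as np

    rng = np.random.default_rng(0)
    D, K = 2, 3                                   # input dimension, number of classes

    # Invertible affine "flow" z = A x + b, a stand-in for a deep normalizing flow.
    A = rng.normal(size=(D, D)) + 2 * np.eye(D)   # almost surely invertible
    b = rng.normal(size=D)
    W, c = rng.normal(size=(K, D)), np.zeros(K)   # linear predictive head

    def hybrid_log_densities(x):
        """Return log p(x) and log p(y | x) from one feed-forward pass."""
        z = A @ x + b
        # Change of variables under a standard-normal base density:
        log_px = (-0.5 * z @ z - 0.5 * D * np.log(2 * np.pi)
                  + np.log(abs(np.linalg.det(A))))
        logits = W @ z + c
        log_py_given_x = logits - np.logaddexp.reduce(logits)   # log-softmax
        return log_px, log_py_given_x

    log_px, log_py = hybrid_log_densities(rng.normal(size=D))
    log_joint = log_px + log_py[1]                # exact log p(y = 1, x)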
Functional Regularisation for Continual Learning using Gaussian Processes
Titsias, Michalis K., Schwarz, Jonathan, Matthews, Alexander G. de G., Pascanu, Razvan, Teh, Yee Whye
We introduce a novel approach for supervised continual learning based on approximate Bayesian inference over function space rather than the parameters of a deep neural network. We use a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. Functional regularisation for continual learning arises naturally by applying the variational sparse GP inference method sequentially as new tasks are encountered. At each step, a summary is constructed for the current task, consisting of (i) inducing inputs and (ii) a posterior distribution over the function values at these inputs. This summary then regularises the learning of future tasks, through Kullback-Leibler regularisation terms that appear in the variational lower bound, and reduces the effects of catastrophic forgetting. We fully develop the theory of the method and demonstrate its effectiveness on classification datasets such as Split-MNIST, Permuted-MNIST and Omniglot.
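Schematically, the per-task objective takes the familiar sparse variational GP form with extra KL terms that pin down earlier tasks' summaries (our notation, not verbatim from the paper; u_j denotes the function values at task j's inducing inputs and \tilde{q}_j the stored posterior summary):

    \mathcal{L}_k = \mathbb{E}_{q(f)}\big[\log p(\mathcal{D}_k \mid f)\big]
                    - \mathrm{KL}\big(q(u_k) \,\|\, p(u_k)\big)
                    - \sum_{j<k} \mathrm{KL}\big(q(u_j) \,\|\, \tilde{q}_j(u_j)\big)

Maximising \mathcal{L}_k fits the current task while the KL penalties keep the function values at earlier tasks' inducing inputs close to their stored summaries, which is what counteracts catastrophic forgetting.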
Hierarchical Representations with Poincaré Variational Auto-Encoders
Mathieu, Emile, Le Lan, Charline, Maddison, Chris J., Tomioka, Ryota, Teh, Yee Whye
The Variational Auto-Encoder (VAE) has become widely popular as a way to simultaneously learn a generative model and embeddings for observations living in a high-dimensional space. In the real world, many such observations can be assumed to be hierarchically structured, such as data on living organisms, which are related through the evolutionary tree. It has also been shown, both theoretically and empirically, that data with hierarchical structure can be embedded efficiently in hyperbolic spaces. We therefore endow the VAE with a hyperbolic geometry and show empirically that it generalises better to unseen data than its Euclidean counterpart, and can qualitatively recover the hierarchical structure.
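For intuition, the Poincaré-ball geodesic distance that replaces the Euclidean one is easy to compute (a minimal NumPy sketch; the distance formula is standard, the example points are ours):

    import numpy as np

    def poincare_distance(u, v, eps=1e-9):
        """Geodesic distance on the Poincare ball (inputs must have norm < 1)."""
        duv = np.sum((u - v) ** 2)
        denom = (1 - u @ u) * (1 - v @ v) + eps
        return np.arccosh(1 + 2 * duv / denom)

    # Points near the boundary are far apart even when Euclidean-close, which is
    # what lets tree-structured data embed with low distortion.
    a, b = np.array([0.0, 0.0]), np.array([0.95, 0.0])
    print(poincare_distance(a, b))   # ~3.7, far larger than the Euclidean 0.95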
Probabilistic symmetry and invariant neural networks
Bloem-Reddy, Benjamin, Teh, Yee Whye
In an effort to improve the performance of deep neural networks in data-scarce, non-i.i.d., or unsupervised settings, much recent research has been devoted to encoding invariance under symmetry transformations into neural network architectures. We treat the neural network input and output as random variables, and consider group invariance from the perspective of probabilistic symmetry. Drawing on tools from probability and statistics, we establish a link between functional and probabilistic symmetry, and obtain generative functional representations of joint and conditional probability distributions that are invariant or equivariant under the action of a compact group. Those representations completely characterize the structure of neural networks that can be used to model such distributions and yield a general program for constructing invariant stochastic or deterministic neural networks. We develop the details of the general program for exchangeable sequences and arrays, recovering a number of recent examples as special cases.
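The best-known deterministic instance covered by this program is sum-decomposition for exchangeable sequences (as in Deep Sets); a minimal sketch with hypothetical feature maps phi and rho:

    import numpy as np

    def phi(x):                        # per-element embedding (a hypothetical choice)
        return np.stack([x, x ** 2, np.tanh(x)], axis=-1)

    def rho(s):                        # readout applied to the pooled summary
        return np.tanh(s).sum()

    def invariant_f(xs):
        """f(x_1, ..., x_n) = rho(sum_i phi(x_i)) is permutation-invariant."""
        return rho(phi(xs).sum(axis=0))

    xs = np.array([0.3, -1.2, 2.0])
    assert np.isclose(invariant_f(xs), invariant_f(xs[::-1]))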
Attentive Neural Processes
Kim, Hyunjik, Mnih, Andriy, Schwarz, Jonathan, Garnelo, Marta, Eslami, Ali, Rosenbaum, Dan, Vinyals, Oriol, Teh, Yee Whye
Neural Processes (NPs) (Garnelo et al., 2018a,b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently, with complexity linear in the number of context input-output pairs, and can learn a wide family of conditional distributions: they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer from a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the observed data they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the context points relevant for its prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.
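A minimal sketch of the mechanism (scaled dot-product cross-attention over context points, with our own toy dimensions; the paper also uses learned self-attention over the context, omitted here):

    import numpy as np

    def softmax(a, axis=-1):
        a = a - a.max(axis=axis, keepdims=True)
        e = np.exp(a)
        return e / e.sum(axis=axis, keepdims=True)

    def cross_attend(x_target, x_context, r_context):
        """Each target input attends to per-context representations r_i instead of
        receiving the NP's single mean-pooled context vector."""
        d = x_target.shape[-1]
        scores = x_target @ x_context.T / np.sqrt(d)   # (T, C) similarity scores
        return softmax(scores) @ r_context             # (T, d_r) attended summaries

    rng = np.random.default_rng(1)
    T, C, d, d_r = 4, 10, 2, 8
    r = cross_attend(rng.normal(size=(T, d)), rng.normal(size=(C, d)),
                     rng.normal(size=(C, d_r)))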
Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects
Kosiorek, Adam, Kim, Hyunjik, Teh, Yee Whye, Posner, Ingmar
We present Sequential Attend, Infer, Repeat (SQAIR), an interpretable deep generative model for image sequences. It can reliably discover and track objects through the sequence; it can also conditionally generate future frames, thereby simulating the expected motion of objects. This is achieved by explicitly encoding object numbers, locations and appearances in the latent variables of the model. SQAIR retains all strengths of its predecessor, Attend, Infer, Repeat (AIR; Eslami et al., 2016), including unsupervised learning, made possible by inductive biases present in the model structure. We use a moving multi-MNIST dataset to show the limitations of AIR in detecting overlapping or partially occluded objects, and show how SQAIR overcomes them by leveraging the temporal consistency of objects. Finally, we apply SQAIR to real-world pedestrian CCTV data, where it learns to reliably detect, track and generate walking pedestrians with no supervision.
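Structurally, each frame is explained by first propagating objects from the previous frame and then discovering new ones; a schematic skeleton of that loop (our simplification, with hypothetical propagate and discover callables standing in for the model's inference networks):

    from dataclasses import dataclass

    @dataclass
    class ObjectLatent:
        where: tuple       # location/scale of the object's glimpse in the frame
        what: tuple        # appearance code for the object
        present: bool      # whether the object still exists in this frame

    def step(prev_objects, frame, propagate, discover):
        """One frame of inference: keep propagated objects that persist, then
        discover new objects to explain the remaining pixels."""
        propagated = [propagate(z, frame) for z in prev_objects]
        objects = [z for z in propagated if z.present]   # temporal consistency
        objects += discover(frame, objects)
        return objects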
Causal Inference via Kernel Deviance Measures
Mitrovic, Jovana, Sejdinovic, Dino, Teh, Yee Whye
Discovering the causal structure among a set of variables is a fundamental problem in many areas of science. In this paper, we propose Kernel Conditional Deviance for Causal Inference (KCDC), a fully nonparametric causal discovery method based on purely observational data. From a novel interpretation of the notion of asymmetry between cause and effect, we derive a corresponding asymmetry measure using the framework of reproducing kernel Hilbert spaces. Based on this, we propose three decision rules for causal discovery. We demonstrate the wide applicability and robustness of our method across a range of diverse synthetic datasets. Furthermore, we test our method on real-world time series data and the real-world benchmark dataset Tübingen Cause-Effect Pairs, where we outperform state-of-the-art approaches.
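A rough numerical sketch of this style of asymmetry score (our simplification, not the paper's exact estimator): embed each conditional p(Y | X = x_i) with a regularised conditional mean embedding, score a direction by how much the embeddings' RKHS norms vary across i, and prefer the direction with the smaller deviance:

    import numpy as np

    def rbf_gram(a, sigma=1.0):
        d2 = (a[:, None] - a[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma ** 2))

    def deviance(x, y, lam=1e-3):
        """Variability of the RKHS norms of conditional mean embeddings of p(y|x)."""
        n = len(x)
        Kx, Ky = rbf_gram(x), rbf_gram(y)
        B = np.linalg.solve(Kx + lam * n * np.eye(n), Kx)    # column i: weights for x_i
        norms = np.sqrt(np.einsum('ji,jk,ki->i', B, Ky, B))  # ||mu_{Y|x_i}||_H
        return norms.std()

    rng = np.random.default_rng(2)
    x = rng.normal(size=200)
    y = np.tanh(x) + 0.1 * rng.normal(size=200)          # ground truth: x causes y
    # Smaller deviance in the causal direction is the expected outcome on this toy
    # pair, though the finite-sample estimate is noisy.
    print(deviance(x, y) < deviance(y, x))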
Stochastic Expectation Maximization with Variance Reduction
Chen, Jianfei, Zhu, Jun, Teh, Yee Whye, Zhang, Tong
Expectation-Maximization (EM) is a popular tool for learning latent variable models, but vanilla batch EM does not scale to large data sets because the whole data set is needed at every E-step. Stochastic Expectation Maximization (sEM) reduces the cost of the E-step by stochastic approximation. However, sEM has a slower asymptotic convergence rate than batch EM, and requires a decreasing sequence of step sizes, which is difficult to tune. In this paper, we propose a variance-reduced stochastic EM (sEM-vr) algorithm inspired by variance-reduced stochastic gradient descent algorithms. We show that sEM-vr has the same exponential asymptotic convergence rate as batch EM. Moreover, sEM-vr only requires a constant step size to achieve this rate, which alleviates the burden of parameter tuning. We compare sEM-vr with batch EM, sEM and other algorithms on Gaussian mixture models and probabilistic latent semantic analysis, and find that sEM-vr converges significantly faster than these baselines.
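Schematically, the variance reduction mirrors SVRG applied to the expected sufficient statistics (our notation, not verbatim from the paper): with a full-data snapshot \tilde{s} = \frac{1}{n}\sum_i s_i(\tilde{\theta}) recomputed periodically at snapshot parameters \tilde{\theta}, the stochastic E-step becomes

    s^{(t+1)} = s^{(t)} + \rho \, \big( s_{i_t}(\theta^{(t)}) - s_{i_t}(\tilde{\theta}) + \tilde{s} - s^{(t)} \big),

followed by the usual M-step \theta^{(t+1)} = \arg\max_{\theta} \ell(\theta; s^{(t+1)}). The correction term drives the update's variance towards zero as \theta^{(t)} approaches \tilde{\theta}, which is why a constant step size \rho can suffice.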
Modelling sparsity, heterogeneity, reciprocity and community structure in temporal interaction data
Miscouridou, Xenia, Caron, Francois, Teh, Yee Whye
We propose a novel class of network models for temporal dyadic interaction data. Our objective is to capture important features often observed in social interactions: sparsity, degree heterogeneity, community structure and reciprocity. We use mutually-exciting Hawkes processes to model the interactions between each (directed) pair of individuals. The intensity of each process allows interactions to arise as responses to opposite interactions (reciprocity), or due to shared interests between individuals (community structure). For sparsity and degree heterogeneity, we build the time-independent part of the intensity function on compound random measures, following Todeschini et al. (2016). We conduct experiments on real-world temporal interaction data and show that the proposed model outperforms competing approaches for link prediction and yields interpretable parameters.
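Concretely, the reciprocal excitation can be written with an exponential kernel (our symbols and one common parameterisation, not necessarily the paper's exact one):

    \lambda_{ij}(t) = \mu_{ij} + \sum_{t_k \in \mathcal{T}_{ji},\; t_k < t} \eta \, e^{-\delta (t - t_k)},

so each past interaction from j to i transiently raises the rate at which i replies to j, while the time-independent base rate \mu_{ij}, built from compound random measures, carries the sparsity, degree heterogeneity and community structure.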