 Undirected Networks


The Emergence of Individuality in Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Individuality is essential in human society: it induces the division of labor and thus improves efficiency and productivity. Similarly, it should also be the key to multi-agent cooperation. Motivated by the idea that individuality means being an individual distinct from others, we propose a simple yet efficient method for the emergence of individuality (EOI) in multi-agent reinforcement learning (MARL). EOI learns a probabilistic classifier that predicts a probability distribution over agents given their observations and gives each agent an intrinsic reward for being correctly predicted by the classifier. The intrinsic reward encourages the agents to visit their own familiar observations, and training the classifier on such observations makes the intrinsic reward signals stronger and the agents more identifiable. To further enhance the intrinsic reward and promote the emergence of individuality, two regularizers are proposed to increase the discriminability of the classifier. We implement EOI on top of popular MARL algorithms. Empirically, we show that EOI significantly outperforms existing methods in a variety of multi-agent cooperative scenarios.
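
As a rough illustration of the intrinsic reward described above, the sketch below turns per-agent classifier scores into a probability distribution and rewards an agent with the probability assigned to its own identity. The function names and the raw-score input are assumptions made for illustration, not the authors' implementation; in the paper the classifier is learned, and the two regularizers are not shown here.

```python
import math

def softmax(scores):
    """Convert raw per-agent scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def intrinsic_reward(scores, agent_id):
    """Probability that the (hypothetical) classifier assigns to the true agent.

    `scores` stands in for the classifier's output on one observation;
    `agent_id` is the index of the agent that produced the observation.
    """
    return softmax(scores)[agent_id]
```

With uninformative scores every agent receives the same reward; the reward grows as an observation becomes more characteristic of the agent that produced it.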


Deep generative models for musical audio synthesis

arXiv.org Machine Learning

Sound modelling is the process of developing algorithms that generate sound under parametric control. There are a few distinct approaches that have been developed historically including modelling the physics of sound production and propagation, assembling signal generating and processing elements to capture acoustic features, and manipulating collections of recorded audio samples. While each of these approaches has been able to achieve high-quality synthesis and interaction for specific applications, they are all labour-intensive and each comes with its own challenges for designing arbitrary control strategies. Recent generative deep learning systems for audio synthesis are able to learn models that can traverse arbitrary spaces of sound defined by the data they train on. Furthermore, machine learning systems are providing new techniques for designing control and navigation strategies for these models. This paper is a review of developments in deep learning that are changing the practice of sound modelling.


Higher-order interactions in statistical physics and machine learning: A non-parametric solution to the inverse problem

arXiv.org Machine Learning

We propose a model-independent definition of $n$-point interaction within a system of binary and categorical random variables from first principles, via the non-parametric framework of Targeted Learning, a subfield of mathematical statistics. This definition provides an interpretation for both magnitude and sign of $2$-point, $3$-point, and general $n$-point interactions. We show that the sign of an $n$-point interaction is interpretable relative to an $(n-1)$-point interaction obtained by fixing any one of the $n$ variables. The non-parametric definition of interaction is fundamentally unbiased and reduces to familiar notions of interaction in parametric statistical physics models. Moreover, by taking into account information on conditional independence and without any further assumptions, the accuracy of interactions estimated directly from data is substantially increased whilst the number of samples required and the computational run time are both reduced. We illustrate these concepts both analytically and numerically on (i) the $2$-dimensional Ising model, (ii) an Ising-like model with non-zero $2$-point, $3$-point, and $4$-point interactions, (iii) the Restricted Boltzmann Machine (RBM), and argue that the formulation applies to energy-based models more generally. The non-parametric formulation allows for the direct reconstruction of the Hamiltonian from the data it generated. Finally, we discuss novel applications of this work, namely estimating causal molecular interactions leading to physiological outcomes, in population biomedicine.
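
One familiar parametric notion of a 2-point interaction is the coupling in a pairwise energy-based model. For two binary variables coded as 0/1, that coupling can be read directly off the joint distribution as a log odds ratio. The sketch below checks this on a toy two-variable Boltzmann distribution; the parameter names `J`, `h1`, `h2` and both functions are hypothetical, and this is not the paper's estimator or its Targeted Learning machinery.

```python
import math
from itertools import product

def boltzmann(J, h1, h2):
    """Joint distribution p(x1, x2) proportional to exp(J*x1*x2 + h1*x1 + h2*x2), x in {0, 1}."""
    weights = {
        (x1, x2): math.exp(J * x1 * x2 + h1 * x1 + h2 * x2)
        for x1, x2 in product((0, 1), repeat=2)
    }
    Z = sum(weights.values())  # partition function
    return {x: w / Z for x, w in weights.items()}

def log_odds_ratio(p):
    """log[ p(1,1) p(0,0) / (p(1,0) p(0,1)) ]; the fields and Z cancel, leaving J."""
    return math.log(p[(1, 1)] * p[(0, 0)] / (p[(1, 0)] * p[(0, 1)]))
```

Because the local fields and the partition function cancel in the ratio, the quantity depends only on the coupling, which is why it serves as a model-free handle on the pairwise term in this toy setting.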


Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

arXiv.org Machine Learning

In the optimization of dynamical systems, the variables typically have constraints. Such problems can be modeled as a constrained Markov Decision Process (CMDP). This paper considers a model-free approach to the problem, where the transition probabilities are not known. In the presence of long-term (or average) constraints, the agent has to choose a policy that maximizes the long-term average reward while satisfying the average constraints in each episode. The key challenge with the long-term constraints is that the optimal policy is not deterministic in general, and thus standard Q-learning approaches cannot be directly used. This paper uses concepts from constrained optimization and Q-learning to propose an algorithm for CMDPs with long-term constraints. For any $\gamma\in(0,\frac{1}{2})$, the proposed algorithm is shown to achieve an $O(T^{1/2+\gamma})$ regret bound for the obtained reward and an $O(T^{1-\gamma/2})$ regret bound for the constraint violation, where $T$ is the total number of steps. We note that these are the first results on regret analysis for MDPs with long-term constraints where the transition probabilities are not known a priori.
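
A short worked calculation, using only the two rates quoted above, shows how $\gamma$ trades reward regret against constraint violation. The balancing point is derived here for illustration and is not a claim made in the abstract.

```latex
\[
\tfrac{1}{2} + \gamma < 1
\quad\text{and}\quad
1 - \tfrac{\gamma}{2} < 1
\qquad\text{for all } \gamma \in \left(0, \tfrac{1}{2}\right),
\]
so both bounds are sublinear in $T$. Equating the two exponents gives
\[
\tfrac{1}{2} + \gamma = 1 - \tfrac{\gamma}{2}
\;\Longleftrightarrow\;
\tfrac{3}{2}\gamma = \tfrac{1}{2}
\;\Longleftrightarrow\;
\gamma = \tfrac{1}{3},
\]
at which point both bounds scale as $O\!\left(T^{5/6}\right)$.
```

Choosing $\gamma$ below $\frac{1}{3}$ tightens the reward bound toward $O(T^{1/2})$ while the constraint-violation bound approaches linear growth; choosing it above $\frac{1}{3}$ does the opposite.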


Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

arXiv.org Machine Learning

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative model needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, which are mostly theoretical.


Recurrent Flow Networks: A Recurrent Latent Variable Model for Spatio-Temporal Density Modelling

arXiv.org Machine Learning

When modelling real-valued sequences, a typical approach in current RNN architectures is to use a Gaussian mixture model to describe the conditional output distribution. In this paper, we argue that mixture-based distributions could exhibit structural limitations when faced with highly complex data distributions such as spatial densities. To address this issue, we introduce recurrent flow networks, which combine deterministic and stochastic recurrent hidden states with conditional normalizing flows to form a probabilistic neural generative model capable of describing the kind of variability observed in highly structured spatio-temporal data. Inspired by the model's factorization, we further devise a structured variational inference network to approximate the intractable posterior distribution by exploiting a spatial representation of the data. We empirically evaluate our model against other generative models for sequential data on three real-world datasets for the task of spatio-temporal transportation demand modelling. Results show how the added flexibility allows our model to generate distributions matching potentially complex urban topologies.


Neural Physicist: Learning Physical Dynamics from Image Sequences

arXiv.org Artificial Intelligence

We present a novel architecture named Neural Physicist (NeurPhy) to learn physical dynamics directly from image sequences using deep neural networks. For any physical system, given the global system parameters, the time evolution of states is governed by the underlying physical laws. How to learn meaningful system representations in an end-to-end way and estimate accurate state transition dynamics that facilitate long-term prediction have been long-standing challenges. In this paper, by leveraging recent progress in representation learning and state space models (SSMs), we propose NeurPhy, which uses a variational auto-encoder (VAE) to extract the underlying Markovian dynamic state at each time step, a neural process (NP) to extract the global system parameters, and a non-linear non-recurrent stochastic state space model to learn the physical dynamic transition. We apply NeurPhy to two physical experimental environments, i.e., a damped pendulum and planetary orbital motion, and achieve promising results. Our model can not only extract physically meaningful state representations, but also learn state transition dynamics that enable long-term predictions for unseen image sequences. Furthermore, from the manifold dimension of the latent state space, we can easily identify the degrees of freedom (DoF) of the underlying physical systems.


Physically constrained short-term vehicle trajectory forecasting with naive semantic maps

arXiv.org Artificial Intelligence

Urban environments manifest a high level of complexity, and therefore it is of vital importance for safety systems embedded within autonomous vehicles (AVs) to be able to accurately predict the short-term future motion of nearby agents. This problem can be further understood as generating a sequence of future coordinates for a given agent based on its past motion data (e.g. position, velocity, acceleration), and whilst current approaches demonstrate plausible results, they have a propensity to neglect a scene's physical constraints. In this paper we propose a model based on a combination of CNN and LSTM encoder-decoder architectures that learns to extract relevant road features from semantic maps as well as the general motion of agents, and uses this learned representation to predict their short-term future trajectories. We train and validate the model on a publicly available dataset that provides data from urban areas, allowing us to examine it in challenging and uncertain scenarios. We show that our model is not only capable of anticipating future motion whilst taking into consideration road boundaries, but can also effectively and precisely predict trajectories for a longer time horizon than initially trained for.


Fitted Q-Learning for Relational Domains

arXiv.org Artificial Intelligence

Value function approximation in Reinforcement Learning (RL) has long been viewed using the lens of feature discovery (Parr et al. 2007). A set of classical approaches for this problem based on Approximate Dynamic Programming (ADP) is the fitted value iteration algorithm (Boyan and Moore 1995; Ernst, Geurts, and Wehenkel 2005; Riedmiller 2005), a batch mode approximation scheme that employs function approximators in each iteration to represent the value estimates. Another popular class of methods that address this problem is Bellman error based methods (Menache, Mannor, and Shimkin 2005; Keller, Mannor, and Precup … We take two specific approaches - first is to represent the lifted Q-value functions and the second is to represent the Bellman residuals - both using a set of relational regression trees (RRTs) (Blockeel and De Raedt 1998). A key aspect of our approach is that it is model-free, which most of the RMDP algorithms assume. The only exception is Fern et al. (2006), who directly learn in policy space. Our work differs from their work in that we directly learn value functions and eventually policies from them and adapt the most recently successful relational gradient-boosting (RFGB) (Natarajan et al. 2014), which has been shown to outperform learning relational rules one by one.
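
The batch-mode scheme mentioned above (refit an approximator to bootstrapped targets in each iteration) can be sketched in a few lines. This is a minimal illustration that assumes a tabular setting and swaps in a per-(state, action) mean as the "regressor"; it does not use relational regression trees or gradient boosting, and the function name and transition format are hypothetical.

```python
from collections import defaultdict

def fitted_q_iteration(batch, actions, gamma=0.9, iterations=10):
    """Fitted Q iteration over a fixed batch of transitions.

    batch: iterable of (state, action, reward, next_state, done) tuples.
    The fitting step here is a plain average of targets per (state, action),
    a stand-in for whatever function approximator one would really use.
    """
    q = defaultdict(float)  # Q-values start at zero
    for _ in range(iterations):
        targets = defaultdict(list)
        for s, a, r, s_next, done in batch:
            # Bootstrapped target computed from the previous iteration's Q
            future = 0.0 if done else max(q[(s_next, b)] for b in actions)
            targets[(s, a)].append(r + gamma * future)
        # "Fit" the approximator: average the targets for each pair
        q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return q
```

For example, on a three-state chain where `"right"` moves toward a rewarding terminal state and `"stay"` loops in place, the learned values decay geometrically with distance from the reward, so a greedy policy prefers `"right"`.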


Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image

arXiv.org Artificial Intelligence

In this paper, we propose a deep convolutional recurrent neural network that predicts action sequences for task and motion planning (TAMP) from an initial scene image. Typical TAMP problems are formalized by combining reasoning on a symbolic, discrete level (e.g. first-order logic) with continuous motion planning such as nonlinear trajectory optimization. Due to the great combinatorial complexity of possible discrete action sequences, a large number of optimization/motion planning problems have to be solved to find a solution, which limits the scalability of these approaches. To circumvent this combinatorial complexity, we develop a neural network which, based on an initial image of the scene, directly predicts promising discrete action sequences such that ideally only one motion planning problem has to be solved to find a solution to the overall TAMP problem. A key aspect is that our method generalizes to scenes with many objects and varying numbers of objects, despite being trained on only two objects at a time. This is possible by encoding the objects of the scene in images as input to the neural network, instead of a fixed feature vector. Results show runtime improvements of several orders of magnitude. Video: https://youtu.be/i8yyEbbvoEk