AITopics

Grover, Aditya, Al-Shedivat, Maruan, Gupta, Jayesh K., Burda, Yura, Edwards, Harrison

Learning Policy Representations in Multiagent Systems

arXiv.org Artificial IntelligenceJun-17-2018

Modeling agent behavior is central to understanding the emergence of complex phenomena in multiagent systems. Prior work in agent modeling has largely been task-specific and driven by hand-engineering domain-specific prior knowledge. We propose a general learning framework for modeling agent behavior in any multiagent system using only a handful of interaction data. Our framework casts agent modeling as a representation learning problem. Consequently, we construct a novel objective inspired by imitation learning and agent identification and design an algorithm for unsupervised learning of representations of agent policies. We demonstrate empirically the utility of the proposed framework in (i) a challenging high-dimensional competitive environment for continuous control and (ii) a cooperative environment for communication, on supervised predictive tasks, unsupervised clustering, and policy optimization using deep reinforcement learning.

artificial intelligence, deep learning, machine learning, (17 more...)

1806.06464

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.64)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Combes, Richard, Touati, Mikael

Computationally Efficient Estimation of the Spectral Gap of a Markov Chain

arXiv.org Machine LearningJun-15-2018

We consider the problem of estimating from sample paths the absolute spectral gap $\gamma_*$ of a reversible, irreducible and aperiodic Markov chain $(X_t)_{t \in \mathbb{N}}$ over a finite state $\Omega$. We propose the ${\tt UCPI}$ (Upper Confidence Power Iteration) algorithm for this problem, a low-complexity algorithm which estimates the spectral gap in time ${\cal O}(n)$ and memory space ${\cal O}((\ln n)^2)$ given $n$ samples. This is in stark contrast with most known methods which require at least memory space ${\cal O}(|\Omega|)$, so that they cannot be applied to large state spaces. Furthermore, ${\tt UCPI}$ is amenable to parallel implementation.

artificial intelligence, machine learning, sample path, (16 more...)

1806.06047

Country:

Europe > France (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
Europe > Hungary > Budapest > Budapest (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.64)

Papini, Matteo, Binaghi, Damiano, Canonaco, Giuseppe, Pirotta, Matteo, Restelli, Marcello

Stochastic Variance-Reduced Policy Gradient

arXiv.org Machine LearningJun-14-2018

In this paper, we propose a novel reinforcement- learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective func- tion; II) approximations in the full gradient com- putation; and III) a non-stationary sampling pro- cess. The result is SVRPG, a stochastic variance- reduced policy gradient algorithm that leverages on importance weights to preserve the unbiased- ness of the gradient estimate. Under standard as- sumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.

machine learning, reinforcement learning, variance, (16 more...)

1806.05618

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Italy > Lombardy > Milan (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.63)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Katt, Sammie, Oliehoek, Frans A., Amato, Christopher

Learning in POMDPs with Monte Carlo Tree Search

arXiv.org Artificial IntelligenceJun-14-2018

The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.

artificial intelligence, machine learning, simulation, (15 more...)

1806.05631

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Metelli, Alberto Maria, Mutti, Mirco, Restelli, Marcello

Configurable Markov Decision Processes

arXiv.org Artificial IntelligenceJun-14-2018

In many real-world problems, there is the possibility to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision Processes (Conf-MDPs), to model this new type of interaction with the environment. Furthermore, we provide a new learning algorithm, Safe Policy-Model Iteration (SPMI), to jointly and adaptively optimize the policy and the environment configuration. After having introduced our approach and derived some theoretical results, we present the experimental evaluation in two explicative problems to show the benefits of the environment configurability on the performance of the learned policy.

artificial intelligence, machine learning, optimization problem, (13 more...)

1806.05415

Country:

North America > United States > Massachusetts (0.04)
North America > United States > Iowa (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Majumdar, Anirudha, Goldstein, Maxwell

PAC-Bayes Control: Synthesizing Controllers that Provably Generalize to Novel Environments

arXiv.org Artificial IntelligenceJun-14-2018

Our goal is to synthesize controllers for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between robustness of controllers to novel environments and generalization of hypotheses in supervised learning. In particular, we utilize the Probably Approximately Correct (PAC)-Bayes framework, which allows us to obtain upper bounds (that hold with high probability) on the expected cost of (stochastic) controllers across novel environments. We propose control synthesis algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (Relative Entropy Programming in particular) in the setting where we are optimizing over a finite control policy space. In the more general setting of continuously parameterized controllers, we minimize this upper bound using stochastic gradient descent. We present examples of our approach in the context of obstacle avoidance control with depth measurements. Our simulated examples demonstrate the potential of our approach to provide strong generalization guarantees on controllers for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, and rich sensory inputs (e.g., depth measurements).

artificial intelligence, controller, machine learning, (16 more...)

1806.04225

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Aerospace & Defense (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
(4 more...)

arXiv.org Machine LearningJun-13-2018

Path-entropy maximized Markov chains for dimensionality reduction

Dixit, Purushottam D.

Stochastic kernel based dimensionality reduction methods have become popular in the last decade. The central component of these methods is a symmetric kernel that quantifies the vicinity of pairs of data points and a kernel-induced Markov chain. Typically, the Markov chain is fully specified by the kernel through row normalization. However, it may be desirable to impose user-specified stationary-state and dynamical constraints on the Markov chain. Notably, no systematic framework exists to prescribe user-defined constraints on Markov chains. Here, we use a path entropy maximization based approach to derive Markov chains on data using a kernel and additional user-defined constraints. We illustrate the usefulness of the path entropy normalization procedure with multiple real and artificial data sets. All scripts are available at: https://github.com/dixitpd/maxcaldiffmap

artificial intelligence, machine learning, markov chain, (16 more...)

1806.05096

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Gong, Wenbo, Li, Yingzhen, Hernández-Lobato, José Miguel

Meta-Learning for Stochastic Gradient MCMC

arXiv.org Machine LearningJun-12-2018

Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has become increasingly popular for simulating posterior samples in large-scale Bayesian modeling. However, existing SG-MCMC schemes are not tailored to any specific probabilistic model, even a simple modification of the underlying dynamical system requires significant physical intuition. This paper presents the first meta-learning algorithm that allows automated design for the underlying continuous dynamics of an SG-MCMC sampler. The learned sampler generalizes Hamiltonian dynamics with state-dependent drift and diffusion, enabling fast traversal and efficient exploration of neural network energy landscapes. Experiments validate the proposed approach on both Bayesian fully connected neural network and Bayesian recurrent neural network tasks, showing that the learned sampler out-performs generic, hand-designed SG-MCMC algorithms, and generalizes to different datasets and larger architectures.

artificial intelligence, machine learning, sampler, (15 more...)

1806.04522

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
(2 more...)

Casgrain, Philippe, Jaimungal, Sebastian

Trading algorithms with learning in latent alpha models

arXiv.org Machine LearningJun-12-2018

Alpha signals for statistical arbitrage strategies are often driven by latent factors. This paper analyses how to optimally trade with latent factors that cause prices to jump and diffuse. Moreover, we account for the effect of the trader's actions on quoted prices and the prices they receive from trading. Under fairly general assumptions, we demonstrate how the trader can learn the posterior distribution over the latent states, and explicitly solve the latent optimal trading problem. We provide a verification theorem, and a methodology for calibrating the model by deriving a variation of the expectation-maximization algorithm. To illustrate the efficacy of the optimal strategy, we demonstrate its performance through simulations and compare it to strategies which ignore learning in the latent factors. We also provide calibration results for a particular model using Intel Corporation stock as an example.

artificial intelligence, bayesian inference, machine learning, (19 more...)

1806.04472

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)