Markov Models
Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning
Long, Pinxin, Fan, Tingxiang, Liao, Xinyi, Liu, Wenxi, Zhang, Hao, Pan, Jia
Developing a safe and efficient collision avoidance policy for multiple robots is challenging in the decentralized scenarios where each robot generate its paths without observing other robots' states and intents. While other distributed multi-robot collision avoidance systems exist, they often require extracting agent-level features to plan a local collision-free action, which can be computationally prohibitive and not robust. More importantly, in practice the performance of these methods are much lower than their centralized counterparts. We present a decentralized sensor-level collision avoidance policy for multi-robot systems, which directly maps raw sensor measurements to an agent's steering commands in terms of movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to find an optimal policy which is trained over a large number of robots on rich, complex environments simultaneously using a policy gradient based reinforcement learning algorithm. We validate the learned sensor-level collision avoidance policy in a variety of simulated scenarios with thorough performance evaluations and show that the final learned policy is able to find time efficient, collision-free paths for a large-scale robot system. We also demonstrate that the learned policy can be well generalized to new scenarios that do not appear in the entire training period, including navigating a heterogeneous group of robots and a large-scale scenario with 100 robots. Videos are available at https://sites.google.com/view/drlmaca
Global-Locally Self-Attentive Dialogue State Tracker
Zhong, Victor, Xiong, Caiming, Socher, Richard
Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems. In this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules. Our model uses global modules to share parameters between estimators for different types (called slots) of dialogue states, and uses local modules to learn slot-specific features. We show that this significantly improves tracking of rare states and achieves state-of-the-art performance on the WoZ and DSTC2 state tracking tasks. GLAD obtains 88.1% joint goal accuracy and 97.1% request accuracy on WoZ, outperforming prior work by 3.7% and 5.5%. On DSTC2, our model obtains 74.5% joint goal accuracy and 97.5% request accuracy, outperforming prior work by 1.1% and 1.0%.
Trusted Neural Networks for Safety-Constrained Autonomous Control
Ghosh, Shalini, Mercier, Amaury, Pichapati, Dheeraj, Jha, Susmit, Yegneswaran, Vinod, Lincoln, Patrick
We propose Trusted Neural Network (TNN) models, which are deep neural network models that satisfy safety constraints critical to the application domain. We investigate different mechanisms for incorporating rule-based knowledge in the form of first-order logic constraints into a TNN model, where rules that encode safety are accompanied by weights indicating their relative importance. This framework allows the TNN model to learn from knowledge available in form of data as well as logical rules. We propose multiple approaches for solving this problem: (a) a multi-headed model structure that allows trade-off between satisfying logical constraints and fitting training data in a unified training framework, and (b) creating a constrained optimization problem and solving it in dual formulation by posing a new constrained loss function and using a proximal gradient descent algorithm. We demonstrate the efficacy of our TNN framework through experiments using the open-source TORCS~\cite{BernhardCAA15} 3D simulator for self-driving cars. Experiments using our first approach of a multi-headed TNN model, on a dataset generated by a customized version of TORCS, show that (1) adding safety constraints to a neural network model results in increased performance and safety, and (2) the improvement increases with increasing importance of the safety constraints. Experiments were also performed using the second approach of proximal algorithm for constrained optimization --- they demonstrate how the proposed method ensures that (1) the overall TNN model satisfies the constraints even when the training data violates some of the constraints, and (2) the proximal gradient descent algorithm on the constrained objective converges faster than the unconstrained version.
GumBolt: Extending Gumbel trick to Boltzmann priors
Khoshaman, Amir H., Amin, Mohammad H.
Boltzmann machines (BMs) are appealing candidates for powerful priors in variational autoencoders (VAEs), as they are capable of capturing nontrivial and multi-modal distributions over discrete variables. However, indifferentiability of the discrete units prohibits using the reparameterization trick, essential for low-noise back propagation. The Gumbel trick resolves this problem in a consistent way by relaxing the variables and distributions, but it is incompatible with BM priors. Here, we propose the GumBolt, a model that extends the Gumbel trick to BM priors in VAEs. GumBolt is significantly simpler than the recently proposed methods with BM prior and outperforms them by a considerable margin. It achieves state-of-the-art performance on permutation invariant MNIST and OMNIGLOT datasets in the scope of models with only discrete latent variables. Moreover, the performance can be further improved by allowing multi-sampled (importance-weighted) estimation of log-likelihood in training, which was not possible with previous models.
DVAE#: Discrete Variational Autoencoders with Relaxed Boltzmann Priors
Vahdat, Arash, Andriyash, Evgeny, Macready, William G.
Boltzmann machines are powerful distributions that have been shown to be an effective prior over binary latent variables in variational autoencoders (VAEs). However, previous methods for training discrete VAEs have used the evidence lower bound and not the tighter importance-weighted bound. We propose two approaches for relaxing Boltzmann machines to continuous distributions that permit training with importance-weighted bounds. These relaxations are based on generalized overlapping transformations and the Gaussian integral trick. Experiments on the MNIST and OMNIGLOT datasets show that these relaxations outperform previous discrete VAEs with Boltzmann priors.
Accurate Kernel Learning for Linear Gaussian Markov Processes using a Scalable Likelihood Computation
We report an exact likelihood computation for Linear Gaussian Markov processes that is more scalable than existing algorithms for complex models and sparsely sampled signals. Better scaling is achieved through elimination of repeated computations in the Kalman likelihood, and by using the diagonalized form of the state transition equation. Using this efficient computation, we study the accuracy of kernel learning using maximum likelihood and the posterior mean in a simulation experiment. The posterior mean with a reference prior is more accurate for complex models and sparse sampling. Because of its lower computation load, the maximum likelihood estimator is an attractive option for more densely sampled signals and lower order models. We confirm estimator behavior in experimental data through their application to speleothem data.
Multitaper Spectral Estimation HDP-HMMs for EEG Sleep Inference
Chlon, Leon, Song, Andrew, Subramanian, Sandya, Soulat, Hugo, Tauber, John, Ba, Demba, Prerau, Michael
Electroencephalographic (EEG) monitoring of neural activity is widely used for sleep disorder diagnostics and research. The standard of care is to manually classify 30-second epochs of EEG time-domain traces into 5 discrete sleep stages. Unfortunately, this scoring process is subjective and time-consuming, and the defined stages do not capture the heterogeneous landscape of healthy and clinical neural dynamics. This motivates the search for a data-driven and principled way to identify the number and composition of salient, reoccurring brain states present during sleep. To this end, we propose a Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM), combined with wide-sense stationary (WSS) time series spectral estimation to construct a generative model for personalized subject sleep states. In addition, we employ multitaper spectral estimation to further reduce the large variance of the spectral estimates inherent to finite-length EEG measurements. By applying our method to both simulated and human sleep data, we arrive at three main results: 1) a Bayesian nonparametric automated algorithm that recovers general temporal dynamics of sleep, 2) identification of subject-specific "microstates" within canonical sleep stages, and 3) discovery of stage-dependent sub-oscillations with shared spectral signatures across subjects.
Neural User Simulation for Corpus-based Policy Optimisation for Spoken Dialogue Systems
Kreyssig, Florian, Casanueva, Inigo, Budzianowski, Pawel, Gasic, Milica
User Simulators are one of the major tools that enable offline training of task-oriented dialogue systems. For this task the Agenda-Based User Simulator (ABUS) is often used. The ABUS is based on hand-crafted rules and its output is in semantic form. Issues arise from both properties such as limited diversity and the inability to interface a text-level belief tracker. This paper introduces the Neural User Simulator (NUS) whose behaviour is learned from a corpus and which generates natural language, hence needing a less labelled dataset than simulators generating a semantic output. In comparison to much of the past work on this topic, which evaluates user simulators on corpus-based metrics, we use the NUS to train the policy of a reinforcement learning based Spoken Dialogue System. The NUS is compared to the ABUS by evaluating the policies that were trained using the simulators. Cross-model evaluation is performed i.e. training on one simulator and testing on the other. Furthermore, the trained policies are tested on real users. In both evaluation tasks the NUS outperformed the ABUS.
An introduction to Policy Gradients with Cartpole and Doom
In the last two articles about Q-learning and Deep Q learning, we worked with value-based reinforcement learning algorithms. To choose which action to take given a state, we take the action with the highest Q-value (maximum expected future reward I will get at each state). As a consequence, in value-based learning, a policy exists only because of these action-value estimates. Today, we'll learn a policy-based reinforcement learning technique called Policy Gradients. The first will learn to keep the bar in balance.
Stochastic Approximation for Risk-aware Markov Decision Processes
Huang, Wenjie, Haskell, William B.
The analysis of complex systems such as inventory control, financial markets, waste-to-energy plants and computer networks is difficult because of the inherent uncertainties in these systems. Risk-aware optimization offers a possible remedy by giving stronger reliability guarantees than the risk-neutral case. Furthermore, it allows expression of the risk attitude of the decision maker. Risk awareness is especially important in sequential decision making because of the dynamic nature of the uncertainty. Markov decision processes (MDPs) introduced by Bellman in [10] provide a mathematical framework for modeling sequential decision making in situations where outcomes are partly random and partly under the control the decision maker. However, in many cases the exact model of the underlying Markov decision process is not known and one can only observe the trajectory of states, actions, and rewards/costs.