Undirected Networks
Stick-Breaking Policy Learning in Dec-POMDPs
Liu, Miao (Massachusetts Institute of Technology) | Amato, Christopher (University of New Hampshire) | Liao, Xuejun (Duke University) | Carin, Lawrence (Duke University) | How, Jonathan P. (Massachusetts Institute of Technology)
Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from the optimal value. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
Unsupervised Machine Condition Monitoring Using Segmental Hidden Markov Models
The task of machine condition monitoring is to detect machine failures at an early stage such that maintenance can be carried out in a timely manner. Most existing techniques are supervised approaches: they require user annotated training data to learn normal and faulty behaviors of a machine. However, such supervision can be difficult to acquire. In contrast, unsupervised methods don't need much human involvement, however, they face another challenge: how to model the generative (observation) process of sensor signals. We propose an unsupervised approach based on segmental hidden Markov models. Our method has a unifying observation model integrating three pieces of information that are complementary to each other. First, we model the signal as an explicit function over time, which describes its possible non-stationary trending patterns. Second, the stationary part of the signal is fit by an autoregressive model. Third, we introduce contextual information to break down the signal complexity such that the signal is modeled separately under different conditions. The advantages of the proposed model are demonstrated by tests on gas turbine, truck and honeybee datasets.
Increasingly Cautious Optimism for Practical PAC-MDP Exploration
Zhang, Liangpeng (University of Science and Technology of China) | Tang, Ke (University of Science and Technology of China) | Yao, Xin (University of Birmingham)
Exploration strategy is an essential part of learning agents in model-based Reinforcement Learning. R-MAX and V-MAX are PAC-MDP strategies proved to have polynomial sample complexity; yet, their exploration behavior tend to be overly cautious in practice. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and apply ICO to R-MAX and V-MAX, yielding two new strategies, namely Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PACMDP, and show that their improvement is guaranteed by a tighter sample complexity upper bound. Then, we demonstrate their significantly improved performance through empirical results.
Learning Behaviors in Agents Systems with Interactive Dynamic Influence Diagrams
Conroy, Ross (Teesside University) | Zeng, Yifeng (Teesside University) | Cavazza, Marc (Teesside University) | Chen, Yingke (University of Georgia)
Interactive dynamic influence diagrams(I-DIDs) are a well recognized decision model that explicitly considers how multiagent interaction affects individual decision making. To predict behavior of other agents, I-DIDs require models of the other agents to be known ahead of time and manually encoded. This becomes a barrier to I-DID applications in a human-agent interaction setting, such as development of intelligent non-player characters(NPCs) in real-time strategy(RTS) games, where models of other agents or human players are often inaccessible to domain experts. In this paper, we use automatic techniques for learning behavior of other agents from replay data in RTS games. We propose a learning algorithm with improvement over existing work by building a full profile of agent behavior. This is the first time that data-driven learning techniques are embedded into the I-DID decision making framework. We evaluate the performance of our approach on two test cases.
Dynamic Execution of Temporal Plans with Sensing Actions and Bounded Risk
Santana, Pedro Henrique (Massachusetts Institute of Technology) | Williams, Brian C. (Massachusetts Institute of Technology)
This thesis focuses on the problem of temporal planning under uncertainty with explicit safety guarantees, which are enforced by means of chance constraints. We aim at elevating the level in which operators interact with autonomous agents and specify their desired behavior, while retaining a keen sensitivity to risk. Instead of relying on unconditional sequences, our goal is to allow contingent plans to be dynamically scheduled and conditioned on observations of the world while remaining safe. Contingencies add flexibility by allowing goals to be achieved through different methods, while observations allow the agent to adapt to the environment. We demonstrate the usefulness of our chance-constrained temporal planning approaches in real-world applications, such as partially observable power supply restoration and collaborative human-robot manufacturing.
Efficient Methods for Multi-Objective Decision-Theoretic Planning
Roijers, Diederik Marijn (University of Amsterdam)
In decision-theoretic planning problems, such as (partially observable) Markov decision problems or coordination graphs, agents typically aim to optimize a scalar value function. However, in many real-world problems agents are faced with multiple possibly conflicting objectives. In such multi-objective problems, the value is a vector rather than a scalar, and we need methods that compute a coverage set, i.e., a set of solutions optimal for all possible trade-offs between the objectives. In this project propose new multi-objective planning methods that compute the so-called convex coverage set (CCS): the coverage set for when policies can be stochastic, or the preferences are linear. We show that the CCS has favorable mathematical properties, and is typically much easier to compute that the Pareto front, which is often axiomatically assumed as the solution set for multi-objective decision problems.
Exploiting Separability in Multiagent Planning with Continuous-State MDPs (Extended Abstract)
Dibangoye, Jilles Steeve (Inria - CITI and INSA - Universitรฉ de Lyon) | Amato, Christopher (University of New Hampshire) | Buffet, Olivier (Inria) | Charpillet, Franรงois (Inria - LORIA)
Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in cooperative decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we recently introduced a method for transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. However, scalability remains limited when the number of agents or problem variables becomes large. In this paper, we show that, under certain separability conditions of the optimal value function, the scalability of this approach can increase considerably. This separability is present when there is locality of interaction between agents, which can be exploited to improve performance. Unlike most previous methods, the novel continuous-state MDP algorithm retains optimality and convergence guarantees. Results show that the extension using separability can scale to a large number of agents and domain variables while maintaining optimality.
Inapproximability of Treewidth and Related Problems (Extended Abstract)
Wu, Yu (Facebook AI Research Lab) | Austrin, Per (KTH Royal Insititute of Technology) | Pitassi, Toniann (University of Toronto) | Liu, David (University of Toronto)
Graphical models, such as Bayesian Networks and Markov networks play an important role in artificial intelligence and machine learning. Inference is a central problem to be solved on these networks. This, and other problems on these graph models are often known to be hard to solve in general, but tractable on graphs with bounded Treewidth. Therefore, finding or approximating the Treewidth of a graph is a fundamental problem related to inference in graphical models. In this paper, we study the approximability of a number of graph problems: Treewidth and Pathwidth of graphs, Minimum Fill-In, and a variety of different graph layout problems such as Minimum Cut Linear Arrangement. We show that, assuming Small Set Expansion Conjecture, all of these problems are NP-hard to approx- imate to within any constant factor in polynomial time.
Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation
Lyu, Qi (Tsinghua University) | Wu, Zhiyong (Tsinghua University) | Zhu, Jun (Tsinghua University) | Meng, Helen (The Chinese University of Hong Kong)
We propose an automatic music generation demo based on artificial neural networks, which integrates the ability of Long Short-Term Memory (LSTM) in memorizing and retrieving useful history information, together with the advantage of Restricted Boltzmann Machine (RBM) in high dimensional data modelling. Our model can generalize to different musical styles and generate polyphonic music better than previous models.
Portable Option Discovery for Automated Learning Transfer in Object-Oriented Markov Decision Processes
Topin, Nicholay (University of Maryland, Baltimore County) | Haltmeyer, Nicholas (University of Maryland, Baltimore County) | Squire, Shawn (University of Maryland, Baltimore County) | Winder, John (University of Maryland, Baltimore County) | desJardins, Marie (University of Maryland, Baltimore County) | MacGlashan, James (Brown University)
We introduce a novel framework for option discovery and learning transfer in complex domains that are represented as object-oriented Markov decision processes (OO-MDPs) [Diuk et al., 2008]. Our framework, Portable Option Discovery (POD), extends existing option discovery methods, and enables transfer across related but different domains by providing an unsupervised method for finding a mapping between object-oriented domains with different state spaces. The framework also includes heuristic approaches for increasing the efficiency of the mapping process. We present the results of applying POD to Pickett and Barto's [2002] PolicyBlocks and MacGlashan's [2013] Option-Based Policy Transfer in two application domains. We show that our approach can discover options effectively, transfer options among different domains, and improve learning performance with low computational overhead.