Undirected Networks
An Introduction to Probabilistic Programming
van de Meent, Jan-Willem, Paige, Brooks, Yang, Hongseok, Wood, Frank
This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages. We start with a discussion of model-based reasoning and explain why conditioning as a foundational computation is central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a simple first-order probabilistic programming language (PPL) whose programs define static-computation-graph, finite-variable-cardinality models. In the context of this restricted PPL we introduce fundamental inference algorithms and describe how they can be implemented in the context of models denoted by probabilistic programs. In the second part of this document, we introduce a higher-order probabilistic programming language, with a functionality analogous to that of established programming languages. This affords the opportunity to define models with dynamic computation graphs, at the cost of requiring inference methods that generate samples by repeatedly executing the program. Foundational inference algorithms for this kind of probabilistic programming language are explained in the context of an interface between program executions and an inference controller. This document closes with a chapter on advanced topics which we believe to be, at the time of writing, interesting directions for probabilistic programming research; directions that point towards a tight integration with deep neural network research and the development of systems for next-generation artificial intelligence applications.
Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning
Sadeghianpourhamami, Nasrin, Deleu, Johannes, Develder, Chris
Initial DR studies mainly adopt model predictive control and thus require accurate models of the control problem (e.g., a customer behavior model), which are to a large extent uncertain for the EV scenario. Hence, model-free approaches, especially based on reinforcement learning (RL) are an attractive alternative. In this paper, we propose a new Markov decision process (MDP) formulation in the RL framework, to jointly coordinate a set of EV charging stations. State-of-the-art algorithms either focus on a single EV, or perform the control of an aggregate of EVs in multiple steps (e.g., aggregate load decisions in one step, then a step translating the aggregate decision to individual connected EVs). On the contrary, we propose an RL approach to jointly control the whole set of EVs at once. We contribute a new MDP formulation, with a scalable state representation that is independent of the number of EV charging stations. Further, we use a batch reinforcement learning algorithm, i.e., an instance of fitted Q-iteration, to learn the optimal charging policy. We analyze its performance using simulation experiments based on a real-world EV charging data. More specifically, we (i) explore the various settings in training the RL policy (e.g., duration of the period with training data), (ii) compare its performance to an oracle all-knowing benchmark (which provides an upper bound for performance, relying on information that is not available or at least imperfect in practice), (iii) analyze performance over time, over the course of a full year to evaluate possible performance fluctuations (e.g, across different seasons), and (iv) demonstrate the generalization capacity of a learned control policy to larger sets of charging stations.
Omega-Regular Objectives in Model-Free Reinforcement Learning
Hahn, Ernst Moritz, Perez, Mateo, Schewe, Sven, Somenzi, Fabio, Trivedi, Ashutosh, Wojtczak, Dominik
We provide the first solution for model-free reinforcement learning of {\omega}-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of {\omega}-regular objectives to an almost- sure reachability problem and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of {\omega}-regular properties into limit- deterministic Buechi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.
Learning Navigation Behaviors End to End
Chiang, Hao-Tien Lewis, Faust, Aleksandra, Fiser, Marek, Francis, Anthony
A longstanding goal of behavior-based robotics is to solve high-level navigation tasks using end to end navigation behaviors that directly map sensors to actions. Navigation behaviors, such as reaching a goal or following a path without collisions, can be learned from exploration and interaction with the environment, but are constrained by the type and quality of a robot's sensors, dynamics, and actuators. Traditional motion planning handles varied robot geometry and dynamics, but typically assumes high-quality observations. Modern vision-based navigation typically considers imperfect or partial observations, but simplifies the robot action space. With both approaches, the transition from simulation to reality can be difficult. Here, we learn two end to end navigation behaviors that avoid moving obstacles: point to point and path following. These policies receive noisy lidar observations and output robot linear and angular velocities. We train these policies in small, static environments with Shaped-DDPG, an adaptation of the Deep Deterministic Policy Gradient (DDPG) reinforcement learning method which optimizes reward and network architecture. Over 500 meters of on-robot experiments show , these policies generalize to new environments and moving obstacles, are robust to sensor, actuator, and localization noise, and can serve as robust building blocks for larger navigation tasks. The path following and point and point policies are 83% and 56% more successful than the baseline, respectively.
Supervised Neural Models Revitalize the Open Relation Extraction
Jia, Shengbin, Xiang, Yang, Chen, Xiaojun
Open relation extraction (ORE) remains a challenge to obtain a semantic representation by discovering arbitrary relation tuples from the un-structured text. However, perhaps due to limited data, previous extractors use unsupervised or semi-supervised methods based on pattern matching, which heavily depend on manual work or syntactic parsers and are inefficient or error-cascading. Their development has encountered bottlenecks. Although a few people try to use neural network based models to improve the ORE task performance recently, it is always intractable for ORE to produce supervised systems based on various neural architectures. We analyze and review the neural ORE methods. Further, we construct a large-scale automatically tagging training set and design a tagging scheme to frame ORE as a supervised sequence tagging task. A hybrid neural sequence tagging model (NST) is proposed which combines BiLSTM, CNN and CRF to capture the contextual temporal information, local spatial information, and sentence level tag information of the sequence by using the word and part-of-speech embeddings. Experiments on multiple datasets show that our method is better than most of the existing pattern-based methods and other neural networks based models.
Text Summarization as Tree Transduction by Top-Down TreeLSTM
Bacciu, Davide, Bruno, Antonio
Extractive compression is a challenging natural language processing problem. This work contributes by formulating neural extractive compression as a parse tree transduction problem, rather than a sequence transduction task. Motivated by this, we introduce a deep neural model for learning structure-to-substructure tree transductions by extending the standard Long Short-Term Memory, considering the parent-child relationships in the structural recursion. The proposed model can achieve state of the art performance on sentence compression benchmarks, both in terms of accuracy and compression rate.
Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process
Yoon, Hyung-Jin, Lee, Donghwan, Hovakimyan, Naira
Abstract-- The objective is to study an online Hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision process (POMDP) on finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function given the current action (Q-function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate the POMDP estimation into a HMM estimation problem and propose a recursive algorithm to estimate both the POMDP parameter and Q-function concurrently. Also, we show that the POMDP estimation converges to a set of stationary points for the maximum likelihood estimate, and the Q-function estimation converges to a fixed point that satisfies the Bellman optimality equation weighted on the invariant distribution of the state belief determined by the HMM estimation process.
Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning
Agarwal, Akshat, Kumar, Abhinau V, Dunovan, Kyle, Peterson, Erik, Verstynen, Timothy, Sycara, Katia
In the real world, agents often have to operate in situations with incomplete information, limited sensing capabilities, and inherently stochastic environments, making individual observations incomplete and unreliable. Moreover, in many situations it is preferable to delay a decision rather than run the risk of making a bad decision. In such situations it is necessary to aggregate information before taking an action; however, most state of the art reinforcement learning (RL) algorithms are biased towards taking actions \textit{at every time step}, even if the agent is not particularly confident in its chosen action. This lack of caution can lead the agent to make critical mistakes, regardless of prior experience and acclimation to the environment. Motivated by theories of dynamic resolution of uncertainty during decision making in biological brains, we propose a simple accumulator module which accumulates evidence in favor of each possible decision, encodes uncertainty as a dynamic competition between actions, and acts on the environment only when it is sufficiently confident in the chosen action. The agent makes no decision by default, and the burden of proof to make a decision falls on the policy to accrue evidence strongly in favor of a single decision. Our results show that this accumulator module achieves near-optimal performance on a simple guessing game, far outperforming deep recurrent networks using traditional, forced action selection policies.
Streaming dynamic and distributed inference of latent geometric structures
Yurochkin, Mikhail, Fan, Zhiwei, Guha, Aritra, Koutris, Paraschos, Nguyen, XuanLong
The topic or population polytope (Nguyen, 2015; Tang et al., 2014) is a fundamental geometric object that underlies the presence of latent topic variables in topic and admixture models (Blei et al., 2003; Pritchard et al., 2000). When data and the associated topics are indexed by time dimension, it is of interest to study the temporal dynamics of such latent geometric structures. In this paper, we will study the modeling and algorithms for learning the temporal dynamics of topic polytope that arises in the analysis of text corpora. The convex geometry of topic models provides the theoretical basis for posterior contraction analysis of latent topics (Nguyen, 2015; Tang et al., 2014). Furthermore, Yurochkin & Nguyen (2016); Yurochkin et al. (2017) exploited convex geometry to develop fast and quite accurate inference algorithms in a number of parametric and nonparametric settings.
Medical Knowledge Embedding Based on Recursive Neural Network for Multi-Disease Diagnosis
Jiang, Jingchi, Wang, Huanzheng, Xie, Jing, Guo, Xitong, Guan, Yi, Yu, Qiubin
The representation of knowledge based on first-order logic captures the richness of natural language and supports multiple probabilistic inference models. Although symbolic representation enables quantitative reasoning with statistical probability, it is difficult to utilize with machine learning models as they perform numerical operations. In contrast, knowledge embedding (i.e., high-dimensional and continuous vectors) is a feasible approach to complex reasoning that can not only retain the semantic information of knowledge but also establish the quantifiable relationship among them. In this paper, we propose recursive neural knowledge network (RNKN), which combines medical knowledge based on first-order logic with recursive neural network for multi-disease diagnosis. After RNKN is efficiently trained from manually annotated Chinese Electronic Medical Records (CEMRs), diagnosis-oriented knowledge embeddings and weight matrixes are learned. Experimental results verify that the diagnostic accuracy of RNKN is superior to that of some classical machine learning models and Markov logic network (MLN). The results also demonstrate that the more explicit the evidence extracted from CEMRs is, the better is the performance achieved. RNKN gradually exhibits the interpretation of knowledge embeddings as the number of training epochs increases.