
 Markov Models


Cautious Reinforcement Learning with Logical Constraints

arXiv.org Artificial Intelligence

This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesize optimal control policies while ensuring safety during the learning process. We express the safety requirements as a temporal logic formula. Forcing the RL agent to stay safe during learning might limit exploration in some safety-critical cases. However, we show that the proposed architecture is able to automatically handle the trade-off between efficient progress in exploration and ensuring strict safety. Theoretical guarantees are available on the convergence of the algorithm. Finally, experimental results are provided to showcase the performance of the proposed method.


Causal Machine Learning Workshop SEW-HSG University of St.Gallen

#artificialintelligence

Program:

Monday, Session I: Maximilian Kasy, "Adaptive treatment assignment in experiments for policy choice"; Bezirgen Veliyev, "Functional Sequential Treatment Allocation".

Keynote: Uri Shalit, "Machine learning and causal inference: a two-way road": "This talk will have two parts. In the first we will discuss a framework we developed for learning individualized treatment recommendations from observational health data, merging ideas from machine learning and causal inference. We will see examples of our framework applied to two crucial health problems using data from tens of thousands of patients, and discuss some important causal-inference challenges that come into focus in this setting. In the second part we will show how we use ideas from the causal inference literature to address long-standing problems in machine learning: off-policy evaluation in a partially observable Markov decision process (POMDP), and learning predictive models that are stable against distributional shifts."

"Heterogeneous effects of training programmes for unemployed in Belgium"; Daniel Jacob, "Does Tenure make you love your Job?"; Nicolaj Mühlbach, "Heterogeneous Treatment Effects of an Early Retirement Reform".

Tuesday, Session III: Dmitry Arkhangelsky, "Double-Robust Identification for Causal Panel Data Models"; Martin Spindler, "Uniform Inference in High-Dimensional Gaussian Graphical Models".

Keynote: Stefan Wager, "Designing Loss Functions for Causal Machine Learning": "Given advances in machine learning over the past decades, it is now possible to accurately solve difficult non-parametric prediction problems in a way that is routine and reproducible.


Markov Logic Networks with Complex Weights: Expressivity, Liftability and Fourier Transforms

arXiv.org Artificial Intelligence

Statistical Relational Learning [Getoor and Taskar, 2007] (SRL) is concerned with learning probabilistic models from relational data such as, for instance, knowledge graphs, biological or social networks, structures of molecules etc. Markov Logic Networks [Richardson and Domingos, 2006] (MLNs) are among the most prominent SRL systems and in this paper we are interested in their expressivity. Informally, expressivity measures the "amount" of distributions that can be modelled by a given class of probabilistic models. An MLN is given by a set of weighted first-order logic formulas and it defines a distribution on possible worlds over a given domain. Here we study expressivity of MLNs in a setting where we first fix the first-order logic formulas defining the MLN and then vary their weights. Since it is not even clear what expressivity should mean in this context, our first contribution in this paper is a formal framework for studying expressivity of MLNs. The main reason for studying expressivity of MLNs in the setting where one first fixes the formulas is computational complexity of inference because its complexity usually depends mostly on the formulas and not so much on their weights.
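As a small illustration of how a set of weighted formulas defines a distribution on possible worlds, the sketch below uses the standard MLN definition, in which a world's probability is proportional to exp(weight × number of true groundings). The single formula Smokes(x) ⇒ Cancer(x), the predicate names, and the function name `mln_distribution` are assumptions chosen for the example; they do not come from the paper.

```python
import itertools
import math


def mln_distribution(domain, weight):
    """Enumerate all possible worlds of a one-formula MLN (illustrative only).

    The hypothetical formula is Smokes(x) => Cancer(x) with the given weight.
    Returns a list of (world, probability) pairs, where a world maps each
    ground atom to True or False.
    """
    atoms = [(pred, c) for pred in ("Smokes", "Cancer") for c in domain]
    scored = []
    for values in itertools.product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        # Count the groundings of the formula that are true in this world.
        n_true = sum(
            1
            for c in domain
            if (not world[("Smokes", c)]) or world[("Cancer", c)]
        )
        scored.append((world, math.exp(weight * n_true)))
    z = sum(score for _, score in scored)  # partition function
    return [(world, score / z) for world, score in scored]


if __name__ == "__main__":
    for world, prob in mln_distribution(["a"], weight=1.0):
        print(world, round(prob, 4))
```

With the formulas fixed, changing `weight` is the only way to move probability mass between worlds, which is the setting in which the abstract studies expressivity. Brute-force enumeration is exponential in the number of ground atoms and is only meant for tiny domains.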


TensorLog: A Probabilistic Database Implemented Using Deep-Learning Infrastructure

Journal of Artificial Intelligence Research

We present an implementation of a probabilistic first-order logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neural-network infrastructure such as TensorFlow or Theano. This leads to a close integration of probabilistic logical reasoning with deep-learning infrastructure: in particular, it enables high-performance deep learning frameworks to be used for tuning the parameters of a probabilistic logic. The integration with these frameworks enables use of GPU-based parallel processors for inference and learning, making TensorLog the first highly parallelizable probabilistic logic. Experimental results show that TensorLog scales to problems involving hundreds of thousands of knowledge-base triples and tens of thousands of examples.


Generalized Bayesian Filtering via Sequential Monte Carlo

arXiv.org Machine Learning

We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of generalized Bayesian inference (GBI) to define generalized filtering recursions in HMMs that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for robust inference against observation contamination through the $\beta$-divergence. Operationalizing the proposed framework is made possible via sequential Monte Carlo methods (SMC). The standard particle methods, and their associated convergence results, are readily generalized to the new setting. We apply our approach to object tracking and Gaussian process regression problems, and observe improved performance over standard filtering algorithms.
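The idea of swapping the likelihood for a $\beta$-divergence loss inside a particle filter can be sketched as follows. This is a minimal bootstrap filter under stated assumptions, not the authors' implementation: the one-dimensional linear-Gaussian model, the parameter values, and the function names are hypothetical. The generalized log-weight is taken to be g(y|x)^β / β; the integral term of the β-divergence loss is omitted because, for a Gaussian observation density with fixed variance, it is the same for every particle and cancels when the weights are normalized.

```python
import math
import random


def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def beta_log_weight(y, x, sigma, beta):
    """Generalized log-weight g(y|x)^beta / beta (particle-independent term dropped)."""
    return gaussian_pdf(y, x, sigma) ** beta / beta


def robust_particle_filter(observations, n_particles=500, a=0.9, q=0.5,
                           sigma=1.0, beta=0.1, seed=0):
    """Bootstrap particle filter with beta-divergence weights.

    Assumed model: x_t = a * x_{t-1} + N(0, q^2), y_t = x_t + N(0, sigma^2).
    Returns the weighted filtering mean at each time step.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # Propagate particles through the assumed transition model.
        particles = [a * x + rng.gauss(0.0, q) for x in particles]
        # Weight with the generalized likelihood instead of the Gaussian one.
        log_w = [beta_log_weight(y, x, sigma, beta) for x in particles]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        means.append(sum(wi * x for wi, x in zip(w, particles)) / total)
        # Multinomial resampling.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means


if __name__ == "__main__":
    obs = [0.0] * 10 + [50.0] + [0.0] * 10  # one gross outlier
    print([round(m, 2) for m in robust_particle_filter(obs)])
```

The design point is that the generalized log-weight is bounded: for an observation far from every particle, g(y|x)^β is close to zero for all of them, so the weights stay nearly uniform and the outlier is effectively ignored, whereas the Gaussian log-likelihood would favor whichever particle happens to lie closest to the outlier.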


Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations

arXiv.org Artificial Intelligence

Deep reinforcement learning is successful in decision making for sophisticated games, such as Atari, Go, etc. However, real-world decision making often requires reasoning with partial information extracted from complex visual observations. This paper presents Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFRL encodes a differentiable particle filter in the neural network policy for explicit reasoning with partial observations over time. The particle filter maintains a belief using a learned discriminative update, which is trained end-to-end for decision making. We show that using the discriminative update instead of standard generative models results in significantly improved performance, especially for tasks with complex visual observations, because it circumvents the difficulty of modeling complex observations that are irrelevant to decision making. In addition, to extract features from the particle belief, we propose a new type of belief feature based on the moment generating function. DPFRL outperforms state-of-the-art POMDP RL models in Flickering Atari Games, an existing POMDP RL benchmark, and in Natural Flickering Atari Games, a new, more challenging POMDP RL benchmark introduced in this paper. Further, DPFRL performs well for visual navigation with real-world data in the Habitat environment.


Periodic Q-Learning

arXiv.org Machine Learning

The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning algorithm (PQ-learning for short), which resembles the technique used in deep Q-learning for solving infinite-horizon discounted Markov decision processes (DMDP) in the tabular setting. PQ-learning maintains two separate Q-value estimates: the online estimate and the target estimate. The online estimate follows the standard Q-learning update, while the target estimate is updated periodically. In contrast to standard Q-learning, PQ-learning enjoys a simple finite-time analysis and achieves better sample complexity for finding an epsilon-optimal policy. Our result provides a preliminary justification of the effectiveness of utilizing target estimates or networks in Q-learning algorithms.
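The two-estimate scheme described above can be sketched in the tabular setting as follows. This is an illustrative reading of the abstract, not the paper's exact algorithm: it assumes the online update bootstraps from the frozen target estimate (as in deep Q-learning), that state-action pairs are sampled uniformly from a simulator, and that the step size is constant. The function names, the toy MDP, and all default parameters are hypothetical.

```python
import random


def periodic_q_learning(n_states, n_actions, step, gamma=0.9, lr=0.5,
                        period=20, n_outer=200, seed=0):
    """Tabular sketch with an online and a periodically synced target estimate.

    `step(s, a, rng)` is an assumed simulator returning (next_state, reward).
    """
    rng = random.Random(seed)
    q_online = [[0.0] * n_actions for _ in range(n_states)]
    q_target = [row[:] for row in q_online]
    for _ in range(n_outer):
        for _ in range(period):
            s = rng.randrange(n_states)
            a = rng.randrange(n_actions)
            s_next, reward = step(s, a, rng)
            # Q-learning update, bootstrapping from the frozen target estimate.
            td_target = reward + gamma * max(q_target[s_next])
            q_online[s][a] += lr * (td_target - q_online[s][a])
        # Periodic update: copy the online estimate into the target estimate.
        q_target = [row[:] for row in q_online]
    return q_target


def toy_step(s, a, rng):
    """Toy 2-state MDP: action 0 stays (reward 0); action 1 switches state
    and pays reward 1 only when leaving state 0."""
    if a == 0:
        return s, 0.0
    return 1 - s, (1.0 if s == 0 else 0.0)


if __name__ == "__main__":
    q = periodic_q_learning(2, 2, toy_step)
    print([[round(v, 3) for v in row] for row in q])
```

Between synchronizations the inner loop is a regression toward a fixed target, which is what makes each period easy to analyze on its own; the outer loop then behaves like an approximate value-iteration step.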


Stochastic Gradient MCMC with Repulsive Forces

arXiv.org Machine Learning

We propose a unifying view of two different Bayesian inference algorithms, Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) and Stein Variational Gradient Descent (SVGD), leading to improved and efficient novel sampling schemes. We show that SVGD combined with a noise term can be framed as a multiple chain SG-MCMC method. Instead of treating each parallel chain independently from others, our proposed algorithm implements a repulsive force between particles, avoiding collapse and facilitating a better exploration of the parameter space. We also show how the addition of this noise term is necessary to obtain a valid SG-MCMC sampler, a significant difference from SVGD. Experiments with both synthetic distributions and real datasets illustrate the benefits of the proposed scheme.


Data Freshness and Energy-Efficient UAV Navigation Optimization: A Deep Reinforcement Learning Approach

arXiv.org Machine Learning

In this paper, we design a navigation policy for multiple unmanned aerial vehicles (UAVs) where mobile base stations (BSs) are deployed to improve the data freshness and connectivity to the Internet of Things (IoT) devices. First, we formulate an energy-efficient trajectory optimization problem in which the objective is to maximize the energy efficiency by optimizing the UAV-BS trajectory policy. We also incorporate different contextual information such as energy and age of information (AoI) constraints to ensure the data freshness at the ground BS. Second, we propose an agile deep reinforcement learning model with experience replay to solve the formulated problem concerning the contextual constraints for the UAV-BS navigation. Moreover, the proposed approach is well-suited for solving the problem, since the state space of the problem is extremely large and finding the best trajectory policy with useful contextual features is too complex for the UAV-BSs. By applying the proposed trained model, an effective real-time trajectory policy for the UAV-BSs captures the observable network states over time. Finally, the simulation results show that the proposed approach is 3.6% and 3.13% more energy efficient than the greedy and baseline deep Q-network (DQN) approaches, respectively.


Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

arXiv.org Machine Learning

This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history generated by unknown behavioral policies. We study a regression-based fitted Q iteration method, and show that it is equivalent to a model-based method that estimates a conditional mean embedding of the transition operator. We prove that this method is information-theoretically optimal and has nearly minimal estimation error. In particular, by leveraging the contraction property of Markov processes and martingale concentration, we establish a finite-sample instance-dependent error upper bound and a nearly-matching minimax lower bound. The policy evaluation error depends sharply on a restricted $\chi^2$-divergence over the function class between the long-term distribution of the target policy and the distribution of past data. This restricted $\chi^2$-divergence is both instance-dependent and function-class-dependent. It characterizes the statistical limit of off-policy evaluation. Further, we provide an easily computable confidence bound for the policy evaluator, which may be useful for optimistic planning and safe policy improvement.
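A regression-based fitted Q iteration for policy evaluation with linear features might look like the sketch below. It is a generic illustration under stated assumptions rather than the estimator analyzed in the paper: the small ridge term, the hand-written linear solver, the one-hot feature map, the deterministic target policy, and the toy logged data are all hypothetical choices made so the example runs with the standard library alone.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fitted_q_evaluation(data, phi, policy, gamma=0.9, n_iters=200, ridge=1e-8):
    """Evaluate `policy` from logged (s, a, r, s_next) tuples.

    Each iteration regresses r + gamma * Q(s_next, policy(s_next)) onto
    phi(s, a) by ridge-regularized least squares; returns the weights w
    such that Q(s, a) is approximated by the dot product of phi(s, a) and w.
    """
    feats = [phi(s, a) for s, a, _, _ in data]
    next_feats = [phi(s2, policy(s2)) for _, _, _, s2 in data]
    d = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    w = [0.0] * d
    for _ in range(n_iters):
        targets = [r + gamma * sum(x * wi for x, wi in zip(nf, w))
                   for (_, _, r, _), nf in zip(data, next_feats)]
        rhs = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(d)]
        w = solve(gram, rhs)
    return w


def one_hot(s, a):
    """Indicator features for a 2-state, 2-action problem."""
    vec = [0.0] * 4
    vec[2 * s + a] = 1.0
    return vec


def always_switch(s):
    """Target policy to evaluate: always take action 1."""
    return 1


if __name__ == "__main__":
    # Logged transitions from some unknown behavior policy (toy data).
    logged = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 0.0, 1), (1, 1, 0.0, 0)]
    print([round(v, 3) for v in fitted_q_evaluation(logged, one_hot, always_switch)])
```

Only the regression targets change between iterations, so the Gram matrix of the logged features is computed once and reused.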