Undirected Networks
Provably Efficient Model-Free Algorithms for Non-stationary CMDPs
Wei, Honghao, Ghosh, Arnob, Shroff, Ness, Ying, Lei, Zhou, Xingyu
We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov Decision Processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost). In the non-stationary environment, reward, utility functions, and transition kernels can vary arbitrarily over time as long as the cumulative variations do not exceed certain variation budgets. We propose the first model-free, simulator-free RL algorithms with sublinear regret and zero constraint violation for non-stationary CMDPs in both tabular and linear function approximation settings with provable performance guarantees. Our results on regret bound and constraint violation for the tabular case match the corresponding best results for stationary CMDPs when the total budget is known. Additionally, we present a general framework for addressing the well-known challenges associated with analyzing non-stationary CMDPs, without requiring prior knowledge of the variation budget. We apply the approach for both tabular and linear approximation settings.
Product Jacobi-Theta Boltzmann machines with score matching
Pasquale, Andrea, Krefl, Daniel, Carrazza, Stefano, Nielsen, Frank
The estimation of probability density functions is a non trivial task that over the last years has been tackled with machine learning techniques. Successful applications can be obtained using models inspired by the Boltzmann machine (BM) architecture. In this manuscript, the product Jacobi-Theta Boltzmann machine (pJTBM) is introduced as a restricted version of the Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection matrix. We show that score matching, based on the Fisher divergence, can be used to fit probability densities with the pJTBM more efficiently than with the original RTBM.
Decision-Making Under Uncertainty: Beyond Probabilities
Badings, Thom, Simão, Thiago D., Suilen, Marnix, Jansen, Nils
This position paper reflects on the state-of-the-art in decision-making under uncertainty. A classical assumption is that probabilities can sufficiently capture all uncertainty in a system. In this paper, the focus is on the uncertainty that goes beyond this classical interpretation, particularly by employing a clear distinction between aleatoric and epistemic uncertainty. The paper features an overview of Markov decision processes (MDPs) and extensions to account for partial observability and adversarial behavior. These models sufficiently capture aleatoric uncertainty but fail to account for epistemic uncertainty robustly. Consequently, we present a thorough overview of so-called uncertainty models that exhibit uncertainty in a more robust interpretation. We show several solution techniques for both discrete and continuous models, ranging from formal verification, over control-based abstractions, to reinforcement learning. As an integral part of this paper, we list and discuss several key challenges that arise when dealing with rich types of uncertainty in a model-based fashion.
How Deep Q Network works part1(Machine Learning)
Abstract: Reinforcement Learning (RL) is currently one of the most commonly used techniques for traffic signal control (TSC), which can adaptively adjusted traffic signal phase and duration according to real-time traffic data. However, a fully centralized RL approach is beset with difficulties in a multi-network scenario because of exponential growth in state-action space with increasing intersections. Multi-agent reinforcement learning (MARL) can overcome the high-dimension problem by employing the global control of each local RL agent, but it also brings new challenges, such as the failure of convergence caused by the non-stationary Markov Decision Process (MDP). In this paper, we introduce an off-policy nash deep Q-Network (OPNDQN) algorithm, which mitigates the weakness of both fully centralized and MARL approaches. The OPNDQN algorithm solves the problem that traditional algorithms cannot be used in large state-action space traffic models by utilizing a fictitious game approach at each iteration to find the nash equilibrium among neighboring intersections, from which no intersection has incentive to unilaterally deviate.
Efficient Recovery Learning using Model Predictive Meta-Reasoning
Vats, Shivam, Likhachev, Maxim, Kroemer, Oliver
Operating under real world conditions is challenging due to the possibility of a wide range of failures induced by execution errors and state uncertainty. In relatively benign settings, such failures can be overcome by retrying or executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, like opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm called Meta-Reasoning for Skill Learning (MetaReSkill) that monitors the progress of all recovery policies during training and allocates training resources to recoveries that are likely to improve the task performance the most. We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning. Compared to open-loop execution, our experiments show that even a limited amount of recovery learning improves task success substantially from 71% to 92.4% in simulation and from 75% to 90% on a real robot.
Combining Contention-Based Spectrum Access and Adaptive Modulation using Deep Reinforcement Learning
Doshi, Akash, Andrews, Jeffrey G.
The use of unlicensed spectrum for cellular systems to mitigate spectrum scarcity has led to the development of intelligent adaptive approaches to spectrum access that improve upon traditional carrier sensing and listen-before-talk methods. We study decentralized contention-based medium access for base stations (BSs) of a single Radio Access Technology (RAT) operating on unlicensed shared spectrum. We devise a distributed deep reinforcement learning-based algorithm for both contention and adaptive modulation, modelled on a two state Markov decision process, that attempts to maximize a network-wide downlink throughput objective. Empirically, we find the (proportional fairness) reward accumulated by a policy gradient approach to be significantly higher than even a genie-aided adaptive energy detection threshold. Our approaches are further validated by improved sum and peak throughput. The scalability of our approach to large networks is demonstrated via an improved cumulative reward earned on both indoor and outdoor layouts with a large number of BSs.
Designing Universal Causal Deep Learning Models: The Geometric (Hyper)Transformer
Acciaio, Beatrice, Kratsios, Anastasis, Pammer, Gudmund
Several problems in stochastic analysis are defined through their geometry, and preserving that geometric structure is essential to generating meaningful predictions. Nevertheless, how to design principled deep learning (DL) models capable of encoding these geometric structures remains largely unknown. We address this open problem by introducing a universal causal geometric DL framework in which the user specifies a suitable pair of metric spaces $\mathscr{X}$ and $\mathscr{Y}$ and our framework returns a DL model capable of causally approximating any ``regular'' map sending time series in $\mathscr{X}^{\mathbb{Z}}$ to time series in $\mathscr{Y}^{\mathbb{Z}}$ while respecting their forward flow of information throughout time. Suitable geometries on $\mathscr{Y}$ include various (adapted) Wasserstein spaces arising in optimal stopping problems, a variety of statistical manifolds describing the conditional distribution of continuous-time finite state Markov chains, and all Fr\'{e}chet spaces admitting a Schauder basis, e.g. as in classical finance. Suitable spaces $\mathscr{X}$ are compact subsets of any Euclidean space. Our results all quantitatively express the number of parameters needed for our DL model to achieve a given approximation error as a function of the target map's regularity and the geometric structure both of $\mathscr{X}$ and of $\mathscr{Y}$. Even when omitting any temporal structure, our universal approximation theorems are the first guarantees that H\"{o}lder functions, defined between such $\mathscr{X}$ and $\mathscr{Y}$ can be approximated by DL models.
Learned Parameter Selection for Robotic Information Gathering
Denniston, Christopher E., Salhotra, Gautam, Kangaslahti, Akseli, Caron, David A., Sukhatme, Gaurav S.
When robots are deployed in the field for environmental monitoring they typically execute pre-programmed motions, such as lawnmower paths, instead of adaptive methods, such as informative path planning. One reason for this is that adaptive methods are dependent on parameter choices that are both critical to set correctly and difficult for the non-specialist to choose. Here, we show how to automatically configure a planner for informative path planning by training a reinforcement learning agent to select planner parameters at each iteration of informative path planning. We demonstrate our method with 37 instances of 3 distinct environments, and compare it against pure (end-to-end) reinforcement learning techniques, as well as approaches that do not use a learned model to change the planner parameters. Our method shows a 9.53% mean improvement in the cumulative reward across diverse environments when compared to end-to-end learning based methods; we also demonstrate via a field experiment how it can be readily used to facilitate high performance deployment of an information gathering robot.
Formally Verified Solution Methods for Infinite-Horizon Markov Decision Processes
Schäfeller, Maximilian, Abdulaziz, Mohammad
We formally verify executable algorithms for solving Markov decision processes (MDPs) in the interactive theorem prover Isabelle/HOL. We build on existing formalizations of probability theory to analyze the expected total reward criterion on infinite-horizon problems. Our developments formalize the Bellman equation and give conditions under which optimal policies exist. Based on this analysis, we verify dynamic programming algorithms to solve tabular MDPs. We evaluate the formally verified implementations experimentally on standard problems and show they are practical. Furthermore, we show that, combined with efficient unverified implementations, our system can compete with and even outperform state-of-the-art systems.
Disentangling representations in Restricted Boltzmann Machines without adversaries
Fernandez-de-Cossio-Diaz, Jorge, Cocco, Simona, Monasson, Remi
A goal of unsupervised machine learning is to build representations of complex high-dimensional data, with simple relations to their properties. Such disentangled representations make easier to interpret the significant latent factors of variation in the data, as well as to generate new data with desirable features. Methods for disentangling representations often rely on an adversarial scheme, in which representations are tuned to avoid discriminators from being able to reconstruct information about the data properties (labels). Unfortunately adversarial training is generally difficult to implement in practice. Here we propose a simple, effective way of disentangling representations without any need to train adversarial discriminators, and apply our approach to Restricted Boltzmann Machines (RBM), one of the simplest representation-based generative models. Our approach relies on the introduction of adequate constraints on the weights during training, which allows us to concentrate information about labels on a small subset of latent variables. The effectiveness of the approach is illustrated with four examples: the CelebA dataset of facial images, the two-dimensional Ising model, the MNIST dataset of handwritten digits, and the taxonomy of protein families. In addition, we show how our framework allows for analytically computing the cost, in terms of log-likelihood of the data, associated to the disentanglement of their representations.