Goto

Collaborating Authors

 Markov Models


Accelerated Intravascular Ultrasound Imaging using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

ABSTRACT Intravascular ultrasound (IVUS) offers a unique perspective in the treatment of vascular diseases by creating a sequence of ultrasound-slices acquired from within the vessel. However, unlike conventional hand-held ultrasound, the thin catheter only provides room for a small number of physical channels for signal transfer from a transducer-array at the tip. For continued improvement of image quality and frame rate, we present the use of deep reinforcement learning to deal with the current physical information bottleneck. V aluable inspiration has come from the field of magnetic resonance imaging (MRI), where learned acquisition schemes have brought significant acceleration in image acquisition at competing image quality. To efficiently accelerate IVUS imaging, we propose a framework that utilizes deep reinforcement learning for an optimal adaptive acquisition policy on a per-frame basis enabled by actor-critic methods and Gumbel top-K sampling. Index T erms-- Deep reinforcement learning, intravascu-lar ultrasound, compressed sensing 1. INTRODUCTION Minimally invasive vascular interventions are increasingly guided by phased array intravascular imaging.


Optimal Estimation and Computational Limit of Low-rank Gaussian Mixtures

arXiv.org Machine Learning

Structural matrix-variate observations routinely arise in diverse fields such as multi-layer network analysis and brain image clustering. While data of this type have been extensively investigated with fruitful outcomes being delivered, the fundamental questions like its statistical optimality and computational limit are largely under-explored. In this paper, we propose a low-rank Gaussian mixture model (LrMM) assuming each matrix-valued observation has a planted low-rank structure. Minimax lower bounds for estimating the underlying low-rank matrix are established allowing a whole range of sample sizes and signal strength. Under a minimal condition on signal strength, referred to as the information-theoretical limit or statistical limit, we prove the minimax optimality of a maximum likelihood estimator which, in general, is computationally infeasible. If the signal is stronger than a certain threshold, called the computational limit, we design a computationally fast estimator based on spectral aggregation and demonstrate its minimax optimality. Moreover, when the signal strength is smaller than the computational limit, we provide evidences based on the low-degree likelihood ratio framework to claim that no polynomial-time algorithm can consistently recover the underlying low-rank matrix. Our results reveal multiple phase transitions in the minimax error rates and the statistical-to-computational gap. Numerical experiments confirm our theoretical findings. We further showcase the merit of our spectral aggregation method on the worldwide food trading dataset.


Reinforcement Learning Your Way: Agent Characterization through Policy Regularization

arXiv.org Artificial Intelligence

Recent advances in reinforcement learning (RL) have increased complexity which, especially for deep RL, has brought forth challenges related to explainability [1]. The opacity of state-of-the-art RL algorithms has led to model developers having a limited understanding of their agents' policies and no influence over learned strategies [2]. While concerns surrounding explainability have been noted for AI in general, it is only more recently that attempts have been made to explain RL systems [3, 1, 4, 5]. These attempts have resulted in a wide suite of methods requiring various degrees of expert knowledge, either about the state-action domain or about the specific RL algorithm. Further, they typically rely on post-hoc analysis of learned policies which give only observational assurances of agents' behaviour. We instead propose an intrinsic method of regularizing agents' actions based on a given prior. While current methods for RL regularization aim to improve training performance - e.g., by maximizing the entropy of the action distribution [6], or by minimising the distance to a prior sub-optimal state-action distribution [7] - our aim is the characterization of our agents' behaviours. We also extend the current regularization techniques to accommodate multi-agent systems which allows intrinsic characterization of individual agents. We provide a formal argument for the rationale of our method and demonstrate its efficacy in a toy problem where agents learn to navigate to a destination on a grid by performing, e.g., only right turns (under the premise that right turns are


Under-Approximating Expected Total Rewards in POMDPs

arXiv.org Artificial Intelligence

We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this -- generally undecidable -- problem by computing under-approximations on these total expected rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.


Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning

arXiv.org Machine Learning

The ability to discover behaviours from past experience and transfer them to new tasks is a hallmark of intelligent agents acting sample-efficiently in the real world. Equipping embodied reinforcement learners with the same ability may be crucial for their successful deployment in robotics. While hierarchical and KL-regularized RL individually hold promise here, arguably a hybrid approach could combine their respective benefits. Key to these fields is the use of information asymmetry to bias which skills are learnt. While asymmetric choice has a large influence on transferability, prior works have explored a narrow range of asymmetries, primarily motivated by intuition. In this paper, we theoretically and empirically show the crucial trade-off, controlled by information asymmetry, between the expressivity and transferability of skills across sequential tasks. Given this insight, we provide a principled approach towards choosing asymmetry and apply our approach to a complex, robotic block stacking domain, unsolvable by baselines, demonstrating the effectiveness of hierarchical KL-regularized RL, coupled with correct asymmetric choice, for sample-efficient transfer learning.


Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming

arXiv.org Artificial Intelligence

Information sharing is key in building team cognition and enables coordination and cooperation. High-performing human teams also benefit from acting strategically with hierarchical levels of iterated communication and rationalizability, meaning a human agent can reason about the actions of their teammates in their decision-making. Yet, the majority of prior work in Multi-Agent Reinforcement Learning (MARL) does not support iterated rationalizability and only encourage inter-agent communication, resulting in a suboptimal equilibrium cooperation strategy. In this work, we show that reformulating an agent's policy to be conditional on the policies of its neighboring teammates inherently maximizes Mutual Information (MI) lower-bound when optimizing under Policy Gradient (PG). Building on the idea of decision-making under bounded rationality and cognitive hierarchy theory, we show that our modified PG approach not only maximizes local agent rewards but also implicitly reasons about MI between agents without the need for any explicit ad-hoc regularization terms. Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks. Our experiments validate the utility of InfoPG by achieving higher sample efficiency and significantly larger cumulative reward in several complex cooperative multi-agent domains.


Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

arXiv.org Machine Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort in obtaining instance-dependent guarantees and deriving instance-optimal algorithms for RL problems. This research has been carried out, however, primarily within the confines of theory, providing guarantees that explain \textit{ex post} the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice. We address the problem of obtaining sharp instance-dependent confidence regions for the policy evaluation problem and the optimal value estimation problem of an MDP, given access to an instance-optimal algorithm. As a consequence, we propose a data-dependent stopping rule for instance-optimal algorithms. The proposed stopping rule adapts to the instance-specific difficulty of the problem and allows for early termination for problems with favorable structure.


Optimal variance-reduced stochastic approximation in Banach spaces

arXiv.org Machine Learning

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory can be applied to problems like average reward policy evaluation problem in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and $Q$-learning for tabular Markov decision processes.


Reinforcement Learning Textbook

arXiv.org Artificial Intelligence

This textbook covers principles behind main modern deep reinforcement learning algorithms that achieved breakthrough results in many domains from game AI to robotics. All required theory is explained with proofs using unified notation and emphasize on the differences between different types of algorithms and the reasons why they are constructed the way they are.


Mixed Nondeterministic-Probabilistic Automata: Blending graphical probabilistic models with nondeterminism

arXiv.org Artificial Intelligence

Graphical models in probability and statistics are a core concept in the area of probabilistic reasoning and probabilistic programming-graphical models include Bayesian networks and factor graphs. In this paper we develop a new model of mixed (nondeterministic/probabilistic) automata that subsumes both nondeterministic automata and graphical probabilistic models. Mixed Automata are equipped with parallel composition, simulation relation, and support message passing algorithms inherited from graphical probabilistic models. Segala's Probabilistic Automatacan be mapped to Mixed Automata.