Markov Models
Budgeted Reinforcement Learning in Continuous State Space
So far, Budgeted Markov Decision Processes (BMDPs) could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs.
Fast Bidirectional Probability Estimation in Markov Models
Siddhartha Banerjee, Peter Lofgren
We develop a new bidirectional algorithm for estimating Markov chain multi-step transition probabilities: given a Markov chain, we want to estimate the probability of hitting a given target state in ℓ steps after starting from a given source distribution. Given the target state t, we use a (reverse) local power iteration to construct an 'expanded target distribution', which has the same mean as the quantity we want to estimate, but a smaller variance - this can then be sampled efficiently by a Monte Carlo algorithm. Our method extends to any Markov chain on a discrete (finite or countable) state-space, and can be extended to compute functions of multi-step transition probabilities such as PageRank, graph diffusions, hitting/return times, etc. Our main result is that in 'sparse' Markov chains - wherein the number of transitions between states is comparable to the number of states - the running time of our algorithm for a uniform-random target node is order-wise smaller than Monte Carlo and power iteration based algorithms; in particular, our method can estimate a probability p using only O(1/√p) running time.
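The two-phase idea in this abstract can be illustrated with a minimal Python sketch. It is a simplification, not the authors' algorithm: it runs a fixed number of exact reverse steps from the target instead of a thresholded local push, then averages the resulting values at the endpoints of shorter forward random walks. The function names, the dict-of-dicts chain representation, and the parameters `ell`, `reverse_steps`, and `n_walks` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def reverse_values(P, target, k):
    """Return sparse h with h[v] = Pr[X_k = target | X_0 = v].

    P is a dict of dicts: P[u][v] is the transition probability u -> v.
    """
    # Index incoming edges so each reverse step only touches predecessors.
    incoming = defaultdict(list)
    for u, nbrs in P.items():
        for v, w in nbrs.items():
            incoming[v].append((u, w))
    h = {target: 1.0}
    for _ in range(k):
        nxt = defaultdict(float)
        for v, val in h.items():
            for u, w in incoming[v]:
                nxt[u] += w * val  # h_{j+1}(u) = sum_v P[u][v] * h_j(v)
        h = dict(nxt)
    return h


def walk(P, start, steps, rng):
    """Simulate `steps` transitions of the chain from `start`."""
    state = start
    for _ in range(steps):
        nbrs = P[state]
        state = rng.choices(list(nbrs), weights=list(nbrs.values()))[0]
    return state


def bidirectional_estimate(P, source, target, ell, reverse_steps, n_walks, seed=0):
    """Estimate Pr[X_ell = target] when X_0 is drawn from `source`.

    Hypothetical simplified estimator: exact reverse iteration for
    `reverse_steps` steps, Monte Carlo for the remaining forward steps.
    """
    rng = random.Random(seed)
    h = reverse_values(P, target, reverse_steps)
    starts, weights = list(source), list(source.values())
    total = 0.0
    for _ in range(n_walks):
        s = rng.choices(starts, weights=weights)[0]
        end = walk(P, s, ell - reverse_steps, rng)
        # h[end] has the same mean as the indicator of hitting the target,
        # but it is spread over more states, so its variance is smaller.
        total += h.get(end, 0.0)
    return total / n_walks


# Toy three-state chain, purely illustrative.
P = {0: {1: 0.5, 2: 0.5}, 1: {0: 0.5, 2: 0.5}, 2: {0: 1.0}}
estimate = bidirectional_estimate(P, {0: 1.0}, target=2, ell=4,
                                  reverse_steps=2, n_walks=20000, seed=1)
```

Setting `reverse_steps` to 0 reduces the sketch to plain Monte Carlo, and setting it to `ell` reduces it to pure reverse power iteration; intermediate values trade reverse work against sampling variance.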
Adaptive Stochastic Optimization: From Sets to Paths
Zhan Wei Lim, David Hsu, Wee Sun Lee
Adaptive stochastic optimization (ASO) plays a crucial role in planning and learning under uncertainty, but is, unfortunately, computationally intractable in general. This paper introduces two conditions on the objective function, the marginal likelihood rate bound and the marginal likelihood bound, which, together with pointwise submodularity, enable efficient approximate solution of ASO. Several interesting classes of functions satisfy these conditions naturally, e.g., the version space reduction function for hypothesis learning. We describe Recursive Adaptive Coverage, a new ASO algorithm that exploits these conditions, and apply the algorithm to two robot planning tasks under uncertainty. In contrast to the earlier submodular optimization approach, our algorithm applies to ASO over both sets and paths.