Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Jin, Chi, Kakade, Sham M., Krishnamurthy, Akshay, Liu, Qinghua

arXiv.org Artificial Intelligence 

In many sequential decision-making settings, the agent lacks complete information about the underlying state of the system, a phenomenon known as partial observability. Partial observability significantly complicates the tasks of reinforcement learning and planning, because the non-Markovian nature of the observations forces the agent to maintain memory and reason about beliefs over the system state, all while exploring to collect information about the environment. For example, a robot may not be able to perceive all objects in the environment due to occlusions, and it must reason about how these objects may move to avoid collisions [Cassandra et al., 1996]. Similar reasoning problems arise in imperfect-information games [Brown and Sandholm, 2018], medical diagnosis [Hauskrecht and Fraser, 2000], and elsewhere [Rafferty et al., 2011]. Furthermore, from a theoretical perspective, well-known complexity-theoretic results show that learning and planning in partially observable environments are statistically and computationally intractable in general [Papadimitriou and Tsitsiklis, 1987, Mundhenk et al., 2000, Vlassis et al., 2012, Mossel and Roch, 2005]. The standard formulation for reinforcement learning with partial observability is the Partially Observable Markov Decision Process (POMDP), in which an agent operating on noisy observations makes decisions that influence the evolution of a latent state. These complexity barriers apply to this model, but they are of a worst-case nature, and they do not preclude efficient algorithms for interesting subclasses of POMDPs. Thus we ask: Can we develop efficient algorithms for reinforcement learning in large classes of POMDPs?
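To make the POMDP formulation above concrete, the sketch below gives a minimal, hypothetical tabular POMDP together with the Bayes-filter belief update an agent would use to "reason about beliefs over the system state." All names, shapes, and the toy numbers are illustrative assumptions; this is not the paper's algorithm.

```python
import numpy as np

class TabularPOMDP:
    """Illustrative tabular POMDP: latent states, actions, and noisy observations."""

    def __init__(self, T, O, R, b0):
        self.T = T    # T[a, s, s']: P(next state s' | state s, action a)
        self.O = O    # O[s, o]    : P(observation o | state s)
        self.R = R    # R[s, a]    : expected reward (unused in this sketch)
        self.b0 = b0  # initial belief (distribution over latent states)

    def belief_update(self, b, a, o):
        """Bayes filter: propagate the belief through the transition model,
        then reweight by the likelihood of the received observation."""
        predicted = b @ self.T[a]                 # predictive distribution over next states
        unnormalized = predicted * self.O[:, o]   # multiply by observation likelihood
        return unnormalized / unnormalized.sum()  # renormalize to a probability vector

# Tiny 2-state, 2-action, 2-observation example (all numbers are made up).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])
R = np.zeros((2, 2))
pomdp = TabularPOMDP(T, O, R, b0=np.array([0.5, 0.5]))

b1 = pomdp.belief_update(pomdp.b0, a=0, o=1)
print(b1)  # posterior belief over the latent state after action 0 and observation 1
```

Because the agent never sees the latent state directly, any policy must act on such beliefs (or on the observation history), which is the source of the statistical and computational difficulty the abstract refers to.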
