Goto

Collaborating Authors

 value function distribution


Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning

Singh, Jaskirat, Zheng, Liang

arXiv.org Artificial Intelligence

Training deep reinforcement learning agents on environments with multiple levels / scenes / conditions from the same task, has become essential for many applications aiming to achieve generalization and domain transfer from simulation to the real world [1,2]. While such a strategy is helpful with generalization, the use of multiple scenes significantly increases the variance of samples collected for policy gradient computations. Current methods continue to view this collection of scenes as a single Markov Decision Process (MDP) with a common value function; however, we argue that it is better to treat the collection as a single environment with multiple underlying MDPs. To this end, we propose a dynamic value estimation (DVE) technique for these multiple-MDP environments, motivated by the clustering effect observed in the value function distribution across different scenes. The resulting agent is able to learn a more accurate and scene-specific value function estimate (and hence the advantage function), leading to a lower sample variance. Our proposed approach is simple to accommodate with several existing implementations (like PPO, A3C) and results in consistent improvements for a range of ProcGen environments and the AI2-THOR framework based visual navigation task.


Inferential Induction: Joint Bayesian Estimation of MDPs and Value Functions

Dimitrakakis, Christos, Eriksson, Hannes, Jorge, Emilio, Grover, Divya, Basu, Debabrota

arXiv.org Machine Learning

Bayesian reinforcement learning (BRL) offers a decision-theoretic solution to the problem of reinforcement learning. However, typical model-based BRL algorithms have focused either on ma intaining a posterior distribution on models or value functions and combining this with approx imate dynamic programming or tree search. This paper describes a novel backwards induction pri nciple for performing joint Bayesian estimation of models and value functions, from which many new BRL algorithms can be obtained. We demonstrate this idea with algorithms and experiments in discrete state spaces.