DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
Shrestha, Aayam; Lee, Stefan; Tadepalli, Prasad; Fern, Alan
–arXiv.org Artificial Intelligence
We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience. This approach can be applied on top of any learned representation and has the potential to easily support multiple solution objectives as well as zero-shot adjustment to changing environments and goals. Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL. DAC-MDPs are nonparametric models that can leverage deep representations and account for limited data by introducing costs for exploiting under-represented parts of the model. On the theoretical side, we identify conditions under which the performance of DAC-MDP solutions can be lower-bounded. We also investigate the empirical behavior in a number of environments, including those with image-based observations. Overall, the experiments demonstrate that the framework can work in practice and scale to large, complex offline RL problems.
Research in automated planning and control has produced powerful algorithms for computing optimal, or near-optimal, decisions given accurate environment models. Examples include the classic value- and policy-iteration algorithms for tabular representations and more sophisticated symbolic variants for graphical model representations. In principle, these planners address many of the traditional challenges in reinforcement learning (RL): they can perform "zero-shot transfer" to new goals and changes to the environment model, accurately account for sparse rewards or low-probability events, and solve for different optimization objectives. Effectively leveraging these planners, however, requires an accurate model grounded in observations and expressed in the planner's representation. Model-based reinforcement learning (MBRL), on the other hand, aims to learn grounded models to improve RL's data efficiency.
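The abstract describes deriving a finite MDP from a static dataset, solving it exactly with a classic tabular planner, and adding costs that discourage relying on under-represented parts of the data. The following is a minimal sketch of what such a construction could look like. It assumes a k-nearest-neighbour averager over state representations, with a penalty proportional to neighbour distance; this is our illustrative reading of the general idea, not the authors' exact definition. All function names (`knn`, `build_dac_mdp`, `value_iteration`, `act`) and parameters (`k`, `cost`, `gamma`) are hypothetical, and terminal states are ignored for brevity.

```python
import math

# Hypothetical DAC-MDP-style sketch (assumed construction, not the paper's code).
# A transition is (s, a, r, s_next); states are tuples of floats, e.g. embeddings.

def knn(transitions, s, a, k):
    """Up to k (distance, index) pairs for transitions taken with action a,
    ordered by Euclidean distance from their start state to s."""
    cands = [(math.dist(s, t[0]), j) for j, t in enumerate(transitions) if t[1] == a]
    cands.sort()
    return cands[:k]

def build_dac_mdp(transitions, actions, k=1, cost=1.0):
    """Derive a finite MDP whose core states are the dataset next-states
    (core state j is transitions[j][3]). For core state i and action a, the
    reward averages r_j - cost * d_j over the k nearest transitions j, and the
    successor is core state j with probability 1 / (number of neighbours)."""
    rewards, succ = [], []
    for _, _, _, s_next in transitions:
        r_i, p_i = {}, {}
        for a in actions:
            nn = knn(transitions, s_next, a, k)  # assumes every action occurs in the data
            r_i[a] = sum(transitions[j][2] - cost * d for d, j in nn) / len(nn)
            p_i[a] = [j for _, j in nn]
        rewards.append(r_i)
        succ.append(p_i)
    return rewards, succ

def value_iteration(rewards, succ, actions, gamma=0.9, tol=1e-10):
    """Standard tabular value iteration on the derived finite MDP."""
    V = [0.0] * len(rewards)
    while True:
        new_V = [max(rewards[i][a] + gamma * sum(V[j] for j in succ[i][a]) / len(succ[i][a])
                     for a in actions)
                 for i in range(len(rewards))]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V

def act(transitions, V, s, actions, k=1, cost=1.0, gamma=0.9):
    """Greedy action at any query state s: one averager backup using the same
    neighbour and cost rule as build_dac_mdp (pass the same k, cost, gamma)."""
    def q(a):
        nn = knn(transitions, s, a, k)
        return sum(transitions[j][2] - cost * d + gamma * V[j] for d, j in nn) / len(nn)
    return max(actions, key=q)

# Toy 1-D dataset: "right" from 1.0 earns reward 1; no "right" data starts at 2.0.
data = [((0.0,), "right", 0.0, (1.0,)), ((1.0,), "right", 1.0, (2.0,)),
        ((1.0,), "left", 0.0, (0.0,)), ((2.0,), "left", 0.0, (1.0,))]
acts = ["left", "right"]
for c in (0.5, 3.0):
    rew, nxt = build_dac_mdp(data, acts, k=1, cost=c)
    V = value_iteration(rew, nxt, acts)
    print(c, act(data, V, (2.0,), acts, k=1, cost=c))  # 0.5 right, then 3.0 left
```

In this toy data the only "right" experience starts at 1.0, so choosing "right" at 2.0 means extrapolating from a neighbour one unit away. With `cost=0.5` that extrapolation still looks worthwhile, but with `cost=3.0` the greedy choice flips to the well-supported "left" action. That flip is the kind of pessimism toward under-represented parts of the model that the costs are meant to introduce.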
Oct-17-2020