 Reinforcement Learning







Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data

Neural Information Processing Systems

While this is a common approach in supervised learning, to our knowledge, this has not been discussed in detail in the offline RL setting.


A Appendix. A.1 Additional Method Justification. The key idea of Q

Neural Information Processing Systems

This problem has been studied in stochastic optimal control, particularly in REPS [Peters et al., 2010]. In our experiments, we use soft actor-critic [Haarnoja et al., 2018] as our base RL algorithm. The policy and critic networks are MLPs with 2 fully-connected hidden layers of size 256. Following [Sharma et al., 2021b], we use a biased TD update, where [...]. For all experiments using prior data collected through RL, the agent was initialized at test time with the pretrained policy and critic. The details for this environment are in [Sharma et al., 2021b].
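As a rough illustration of the network shapes described in this excerpt, the following NumPy sketch builds a policy and a critic MLP with two fully connected hidden layers of size 256. Only the layer count and width come from the text. The observation and action dimensions, the ReLU activations, the initialization, and the mean/log-std policy head (a common soft actor-critic convention) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

HIDDEN = 256  # hidden width stated in the excerpt


def init_mlp(rng, in_dim, out_dim, hidden=HIDDEN):
    """Create (weight, bias) pairs for an MLP with two hidden layers."""
    sizes = [in_dim, hidden, hidden, out_dim]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def mlp_forward(params, x):
    """ReLU on hidden layers (an assumption), linear output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b


# Hypothetical dimensions, chosen only for this example.
obs_dim, act_dim = 17, 6
rng = np.random.default_rng(0)

# Assumed SAC-style heads: the policy emits a mean and a log-std per action
# dimension; the critic maps (observation, action) to a scalar Q-value.
policy = init_mlp(rng, obs_dim, 2 * act_dim)
critic = init_mlp(rng, obs_dim + act_dim, 1)

obs = rng.normal(size=(4, obs_dim))            # batch of 4 observations
mean, log_std = np.split(mlp_forward(policy, obs), 2, axis=-1)
action = np.tanh(mean)                         # deterministic squashed action
q = mlp_forward(critic, np.concatenate([obs, action], axis=-1))
```

Here `mean` and `log_std` each have shape `(4, 6)` and `q` has shape `(4, 1)`. A trained agent would sample actions from the Gaussian defined by `mean` and `log_std` rather than using the mean directly.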


You Only Live Once: Single-Life Reinforcement Learning (Annie S. Chen)

Neural Information Processing Systems

For example, imagine a disaster relief robot tasked with retrieving an item from a fallen building, where it cannot get direct supervision from humans. It must retrieve this object within one test-time trial, and must do so while tackling unknown obstacles, though it may leverage knowledge it has of the building before the disaster.



Safe Reinforcement Learning by Imagining the Near Future

Neural Information Processing Systems

In this work, we focus on the setting where unsafe states can be avoided by planning ahead a short time into the future. In this setting, a model-based agent with a sufficiently accurate model can avoid unsafe states.
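The idea that a short lookahead with an accurate model is enough to avoid unsafe states can be illustrated with a toy example. The sketch below is not the paper's algorithm: it is a brute-force lookahead filter over a hypothetical one-dimensional environment with exactly known dynamics. The names `step`, `is_unsafe`, `survives`, and `safe_actions`, the wall position, and the action set are all assumptions made for illustration.

```python
ACTIONS = (-1, 0, 1)  # hypothetical accelerations
WALL = 10             # hypothetical: any position >= WALL is unsafe


def step(state, action):
    """Toy dynamics, assumed to be known exactly (a perfect model)."""
    pos, vel = state
    vel = vel + action
    return (pos + vel, vel)


def is_unsafe(state):
    return state[0] >= WALL


def survives(state, horizon):
    """True if some sequence of `horizon` actions avoids all unsafe states."""
    if is_unsafe(state):
        return False
    if horizon == 0:
        return True
    return any(survives(step(state, a), horizon - 1) for a in ACTIONS)


def safe_actions(state, horizon):
    """Actions whose successor stays safe for the rest of the horizon."""
    return [a for a in ACTIONS if survives(step(state, a), horizon - 1)]


state = (6, 2)  # position 6, moving toward the wall at velocity 2
one_step = safe_actions(state, horizon=1)
two_step = safe_actions(state, horizon=2)
```

With a one-step horizon every action looks acceptable (`one_step` is `[-1, 0, 1]`), because no single step reaches the wall. With a two-step horizon, accelerating is ruled out (`two_step` is `[-1, 0]`), since after accelerating no follow-up action can stop the agent before the wall. The exhaustive search grows exponentially with the horizon, so this only makes sense for the short horizons the excerpt has in mind.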