Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Benjamin Eysenbach, Sergey Levine

Neural Information Processing Systems 

Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically pose the question: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal?
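To make the relabeling question concrete, the following is a minimal sketch of one naive way to answer it: score a past trajectory under each candidate reward function and relabel it with the task under which it earns the highest return. The 1-D goal-reaching task family, the names `goal_reward`, `trajectory_return`, and `relabel`, and the argmax rule itself are all assumptions made for illustration; they are not taken from the paper and should not be read as its method.

```python
# Hypothetical illustration of hindsight relabeling.
# The task family, function names, and argmax rule are assumptions for this sketch.

def goal_reward(goal):
    """Build a reward function for an assumed 1-D goal-reaching task."""
    def reward(state):
        # Reward is the negative distance to the goal (0 is best).
        return -abs(state - goal)
    return reward


def trajectory_return(states, reward_fn):
    """Sum the reward of every state visited along the trajectory."""
    return sum(reward_fn(s) for s in states)


def relabel(states, tasks):
    """Return the name of the candidate task with the highest return on `states`."""
    return max(tasks, key=lambda name: trajectory_return(states, tasks[name]))


# Three candidate tasks, each defined by a different goal position.
tasks = {
    "goal_0": goal_reward(0.0),
    "goal_5": goal_reward(5.0),
    "goal_10": goal_reward(10.0),
}

# A past trajectory that drifted toward position 5 and stayed there.
trajectory = [1.0, 2.5, 4.0, 5.0, 5.0]

relabeled_task = relabel(trajectory, tasks)
print(relabeled_task)
```

In this toy setting the trajectory is relabeled with the task whose goal it ended up reaching, regardless of which task was originally commanded. Comparing raw returns across tasks is only meaningful here because the assumed reward functions share a common scale.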