Reviews: Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

Neural Information Processing Systems 

The paper proposes a method for improving the convergence rate of RL algorithms when one has access to a set of state-only expert demonstrations. The method modifies the given MDP so that the episode terminates whenever the agent leaves the set of states that have high probability under the expert demonstrations. The paper then proves an upper bound on the regret incurred by the resulting algorithm (relative to the expert) in terms of the regret of the RL algorithm used to solve the modified MDP. The paper presents a set of experiments showing that the proposed mechanism can effectively strike a tradeoff between convergence rate and optimality. The exposition is very clear, and the paper is easy to follow.
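To make the mechanism described above concrete, here is a minimal sketch. It is not the authors' implementation. It assumes discrete, hashable states and estimates the high-probability set by keeping states whose empirical visitation frequency in the demonstrations meets a threshold. The names `estimate_support`, `EStopEnv`, and `threshold` are hypothetical. The wrapper only forces termination on leaving the set and leaves the reward unchanged; how the paper handles reward at termination is not stated in this review.

```python
from collections import Counter
from typing import Any, Callable, Hashable, Iterable, Sequence, Set, Tuple

State = Hashable
StepFn = Callable[[State, Any], Tuple[State, float, bool]]


def estimate_support(demos: Iterable[Sequence[State]], threshold: float) -> Set[State]:
    """Hypothetical support estimate: states whose empirical frequency
    across all state-only demonstrations is at least `threshold`."""
    counts = Counter(s for traj in demos for s in traj)
    total = sum(counts.values())
    if total == 0:
        return set()  # no demonstrations, so no supported states
    return {s for s, c in counts.items() if c / total >= threshold}


class EStopEnv:
    """Wraps a step function so the episode ends ("emergency stop")
    as soon as the next state falls outside the estimated support."""

    def __init__(self, step_fn: StepFn, support: Set[State]) -> None:
        self.step_fn = step_fn
        self.support = support

    def step(self, state: State, action: Any) -> Tuple[State, float, bool]:
        next_state, reward, done = self.step_fn(state, action)
        if next_state not in self.support:
            done = True  # terminate; reward is left untouched in this sketch
        return next_state, reward, done


if __name__ == "__main__":
    # Toy 1-D chain: the state is an integer, the action is a step of +1 or -1.
    demos = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 4]]
    support = estimate_support(demos, threshold=0.1)  # state 4 seen 1/11 < 0.1

    def chain_step(s: int, a: int) -> Tuple[int, float, bool]:
        n = s + a
        return n, (1.0 if n == 3 else 0.0), n == 3

    env = EStopEnv(chain_step, support)
    print(sorted(support), env.step(1, 1), env.step(0, -1))
```

Raising `threshold` shrinks the support and stops the agent sooner. This mirrors, in toy form, the convergence-versus-optimality tradeoff the experiments explore.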