Reviews: Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

Neural Information Processing Systems 

The paper proposes a method for stopping unnecessary exploration in RL while keeping the resulting loss (regret) bounded. The stopping mechanism, called e-stop, is learned from state-only demonstrations provided by an expert. The paper is very well written and easy to follow. The theoretical analysis of the method is compelling. The experiments are rather minimal, but they support the theoretical analysis.