Review for NeurIPS paper: Neurosymbolic Reinforcement Learning with Formally Verified Exploration

This paper introduces an RL method that satisfies safety constraints during both training and evaluation, using shielding over continuous state and action spaces so that unsafe actions are never selected. The main technical contribution is a neurosymbolic scheme: a symbolic safety specification and a symbolic policy are lifted into a continuous (neural) policy space via imitation learning, policy updates are performed in the lifted space, and the updated policy is then projected back into the symbolic space, where formal verification can be carried out. The method has the added advantage that the set of verified-safe symbolic policies and safety specifications can grow over time as more experience is collected from the environment. I find this an interesting scheme; few safe RL methods can guarantee safety during training while also expanding the safe set.
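To make my reading of the lift-update-project loop concrete, here is a minimal sketch in my own notation (not the authors' implementation). It assumes a one-dimensional linear symbolic policy so that "lifting" is imitation by gradient descent, "projection" is trivial, and "verification" reduces to an interval bound check; all function names (`lift`, `project`, `verify`, `shielded_update`) are hypothetical illustrations of the scheme described in the paper.

```python
import random

def symbolic_policy(w, s):
    # Symbolic (linear) policy a = w * s; simple enough to verify formally.
    return w * s

def verify(w, s_bound, a_bound):
    # Verification step: |w * s| <= a_bound for all |s| <= s_bound.
    return abs(w) * s_bound <= a_bound

def lift(w, steps=200, lr=0.1):
    # "Lift": train a continuous parameter theta to imitate the
    # symbolic policy (imitation learning by SGD on squared error).
    theta = 0.0
    for _ in range(steps):
        s = random.uniform(-1.0, 1.0)
        grad = 2.0 * (theta * s - symbolic_policy(w, s)) * s
        theta -= lr * grad
    return theta

def project(theta):
    # "Project": the lifted policy is linear here, so projection
    # back to the symbolic class is exact.
    return theta

def shielded_update(w, s_bound=1.0, a_bound=1.0, reward_grad=0.05):
    # One round of the scheme: lift, take a policy-gradient-style
    # step in the lifted space, project back, then keep the update
    # only if the projected symbolic policy still verifies as safe.
    theta = lift(w)
    theta_new = theta + reward_grad
    w_new = project(theta_new)
    return w_new if verify(w_new, s_bound, a_bound) else w
```

Under this toy model, an update that would push the policy outside the verified-safe region is rejected, so safety holds throughout training, matching the guarantee the paper claims.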