Neurosymbolic Reinforcement Learning with Formally Verified Exploration
A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that REVEL enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration.
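The mirror-descent loop described above (lift a symbolic policy into the neurosymbolic space by imitation, take gradient steps there, then project back onto the verified-safe symbolic subset) can be sketched in miniature. This is a toy illustration, not the paper's implementation: the "shield" is a linear symbolic policy, imitation learning is a least-squares fit, and the verified-safe set is stood in for by an L2 ball; all function names and parameters are assumptions.

```python
import numpy as np

def lift(symbolic_params, n_samples=100):
    """Imitation-learn a 'neural' policy from the symbolic shield.
    Sketched as fitting a linear model by least squares on sampled states."""
    states = np.random.randn(n_samples, symbolic_params.shape[0])
    actions = states @ symbolic_params  # teacher actions from the shield
    neural_params, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return neural_params

def safe_gradient_update(neural_params, grad, lr=0.1):
    """One (approximate) policy-gradient step in the neurosymbolic space."""
    return neural_params - lr * grad

def project(neural_params, safe_radius=1.0):
    """Project back onto the verified-safe symbolic subset.
    Here an L2 ball stands in for the verified region."""
    norm = np.linalg.norm(neural_params)
    if norm > safe_radius:
        return neural_params * (safe_radius / norm)
    return neural_params

# One mirror-descent iteration over policies (hypothetical reward gradient):
np.random.seed(0)
theta = np.array([[0.5], [0.2]])          # symbolic policy: 2 state dims -> 1 action
phi = lift(theta)                          # lift into the neurosymbolic space
phi = safe_gradient_update(phi, grad=np.array([[-0.3], [0.1]]))
theta = project(phi)                       # project into the safe symbolic subset
```

The key point the sketch preserves is that verification never touches the neural policy directly: safety is enforced only through the projection onto the symbolic subset, which is where verification is tractable.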
Review for NeurIPS paper: Neurosymbolic Reinforcement Learning with Formally Verified Exploration
Weaknesses: (-) I have two major concerns, one regarding the (theoretical) analysis and the other regarding the empirical evaluation. On the first point, it seems that all the safety guarantees boil down to the fact that the initial shields are safe and verifiable. However, when the policy is transformed into the neural space, imitation learning is used, and based on my understanding there is no guarantee that the neural network will exactly reproduce the behavior of the shield. Granted, the initial symbolic-form shield is safe; yet this transformation step seems to raise the possibility of unsafety.
Review for NeurIPS paper: Neurosymbolic Reinforcement Learning with Formally Verified Exploration
This paper introduces an RL method that satisfies safety constraints during both training and evaluation, via shielding for continuous state and action spaces so that unsafe actions are never selected. The main technical contribution is a symbolic safety specification and policy, which is lifted into a continuous space via imitation learning. Policy updates occur in the lifted space, and the policy is then projected back to a symbolic space where verification is tractable. The method has the added advantage that the set of safe symbolic policies and safety specifications can expand over time as more experience is collected from interactions with the environment. I think this is an interesting scheme, and there are not many safe RL methods that can guarantee safety during training while expanding the safe set.