rasm
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > Austria > Vienna (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- (25 more...)
Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph's sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > Austria > Vienna (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- (25 more...)
Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph's sub-tasks and then composing them into a global policy.
Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees
Žikelić, Đorđe, Lechner, Mathias, Verma, Abhinav, Chatterjee, Krishnendu, Henzinger, Thomas A.
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph's sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > Austria > Vienna (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- (27 more...)
Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees
Žikelić, Đorđe, Lechner, Mathias, Henzinger, Thomas A., Chatterjee, Krishnendu
We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p\in[0,1]$ over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on $3$ stochastic non-linear reinforcement learning tasks.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- (18 more...)
- Government (0.46)
- Education (0.34)