Goto

Collaborating Authors

 Rodriguez, Andoni


Research Vision: Multi-Agent Path Planning for Cops And Robbers Via Reactive Synthesis

arXiv.org Artificial Intelligence

Reactive synthesis is classically modeled as a game, though often applied to domains such as arbiter circuits and communication protocols [1]. We aim to show how reactive synthesis can be applied to a literal game, cops and robbers, to generate strategies for agents in the game. We propose a game that requires the coordination of multiple agents in a space of datatypes and operations richer than is easily captured by the traditional Linear Temporal Logic (LTL) approach of synthesis over Boolean streams [2]. In particular, we draw inspiration from prior work on Coordination Synthesis [3], LTL modulo theories (LTLt) [4], and Temporal Stream Logic modulo theories (TSL-MT) [5, 6] to describe our problem and potential solution spaces. The traditional game [7] asks whether K cops can catch a single robber on a graph. In a temporal logic setting, this amounts to a safety condition for the robber (it is never caught by the cops) and the dual liveness condition for the cops (they eventually catch the robber). We modify the traditional graph-theory-focused version of the game into a more visual game on a grid system, allowing for various configurations, including an environment with various node types such as walls, safe zones, and open spaces.
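To make the duality concrete, the two winning conditions can be written as a pair of LTL formulas. This is a minimal sketch, assuming a propositional atom caught that holds exactly when some cop occupies the robber's cell:

    \varphi_{\mathrm{robber}} = \mathbf{G}\,\neg\mathit{caught}   % safety: the robber is never caught
    \varphi_{\mathrm{cops}}   = \mathbf{F}\,\mathit{caught}       % liveness: the cops eventually catch the robber

Since \mathbf{F}\,\mathit{caught} \equiv \neg\mathbf{G}\,\neg\mathit{caught}, the two objectives are exact negations of one another: a winning strategy for one side witnesses the unrealizability of the other side's objective.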


Realizable Continuous-Space Shields for Safe Reinforcement Learning

arXiv.org Artificial Intelligence

While Deep Reinforcement Learning (DRL) has achieved remarkable success across various domains, it remains vulnerable to occasional catastrophic failures without additional safeguards. An effective solution to prevent these failures is to use a shield that validates and adjusts the agent's actions to ensure compliance with a provided set of safety specifications. For real-world robotic domains, it is essential to define safety specifications over continuous state and action spaces to accurately account for system dynamics and compute new actions that minimally deviate from the agent's original decision. In this paper, we present the first shielding approach specifically designed to ensure the satisfaction of safety requirements in continuous state and action spaces, making it suitable for practical robotic applications. Our method builds upon realizability, an essential property that ensures the shield will always be able to generate a safe action for any state in the environment. We formally prove that realizability can be verified for stateful shields, enabling the incorporation of non-Markovian safety requirements, such as loop avoidance. Finally, we demonstrate the effectiveness of our approach in ensuring safety without compromising the policy's success rate by applying it to a navigation problem and a multi-agent particle environment.

Keywords: Shielding, Reinforcement Learning, Safety, Robotics
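As an illustration of the shielding interface described in the abstract, here is a minimal Python sketch. The callables is_safe and sample_actions are hypothetical stand-ins for the paper's continuous safety check and candidate-action generator, not its actual API:

    import numpy as np

    def shielded_action(state, proposed, is_safe, sample_actions):
        # Pass the agent's action through unchanged when it already
        # satisfies the continuous-space safety specification.
        if is_safe(state, proposed):
            return proposed
        # Otherwise choose, among sampled safe candidates, the action
        # with minimal L2 deviation from the agent's original decision.
        # Realizability guarantees a safe candidate exists in every state.
        candidates = [a for a in sample_actions(state) if is_safe(state, a)]
        return min(candidates,
                   key=lambda a: np.linalg.norm(np.asarray(a) - np.asarray(proposed)))

The minimal-deviation selection reflects the stated goal of adjusting actions as little as possible, while the realizability property is what makes the final min well defined.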


Verification-Guided Shielding for Deep Reinforcement Learning

arXiv.org Artificial Intelligence

In recent years, Deep Reinforcement Learning (DRL) has emerged as an effective approach to solving real-world tasks. However, despite their successes, DRL-based policies suffer from poor reliability, which limits their deployment in safety-critical domains. Various methods have been put forth to address this issue by providing formal safety guarantees. Two main approaches include shielding and verification. While shielding ensures the safe behavior of the policy by employing an external online component (i.e., a "shield") that overrides potentially dangerous actions, this approach has a significant computational cost, as the shield must be invoked at runtime to validate every decision. On the other hand, verification is an offline process that can identify policies that are unsafe prior to their deployment, yet without providing alternative actions when such a policy is deemed unsafe. In this work, we present verification-guided shielding -- a novel approach that bridges the DRL reliability gap by integrating these two methods. Our approach combines both formal and probabilistic verification tools to partition the input domain into safe and unsafe regions. In addition, we employ clustering and symbolic representation procedures that compress the unsafe regions into a compact representation. This, in turn, allows us to temporarily activate the shield solely in (potentially) unsafe regions, in an efficient manner. Our novel approach significantly reduces runtime overhead while still preserving formal safety guarantees. We extensively evaluate our approach on two benchmarks from the robotic navigation domain, as well as provide an in-depth analysis of its scalability and completeness.
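A minimal sketch of the resulting runtime logic, assuming hypothetical helpers unsafe_regions (the compressed output of the offline verification step) and shield:

    def guarded_step(state, policy, unsafe_regions, shield):
        # Query the learned policy as usual.
        action = policy(state)
        # Invoke the costly shield only when the state falls inside a
        # region that offline verification could not prove safe.
        if any(region.contains(state) for region in unsafe_regions):
            action = shield(state, action)
        return action

Outside the compressed unsafe regions the shield is never consulted, which is where the runtime savings over always-on shielding come from.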


Shield Synthesis for LTL Modulo Theories

arXiv.org Artificial Intelligence

In recent years, Machine Learning (ML) models have achieved remarkable success in various domains. However, these models also tend to demonstrate unsafe behaviors, precluding their deployment in safety-critical systems. To cope with this issue, ample research focuses on developing methods that guarantee the safe behavior of a given ML model. A prominent example is shielding, which incorporates an external component (a "shield") that blocks unwanted behavior. Despite significant progress, shielding suffers from a main setback: it is currently geared towards properties encoded solely in propositional logics (e.g., LTL) and is unsuitable for richer logics. This, in turn, limits the widespread applicability of shielding in many real-world systems. In this work, we address this gap and extend shielding to LTL modulo theories by building upon recent advances in reactive synthesis modulo theories. This allowed us to develop a novel approach for generating shields conforming to complex safety specifications in these more expressive logics. We evaluate our shields and demonstrate their ability to handle rich data with temporal dynamics. To the best of our knowledge, this is the first approach for synthesizing shields for such expressivity.
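For a flavor of the added expressiveness, a shield specification in this setting can constrain data values rather than only Boolean atoms. A hypothetical safety property over linear real arithmetic, with assumed symbols speed, dist, v_max, and d_min, might read:

    \varphi = \mathbf{G}\big( \mathit{speed} \le v_{\max} \;\wedge\; (\mathit{dist} < d_{\min} \rightarrow \mathbf{X}(\mathit{speed}) < \mathit{speed}) \big)

That is, the speed never exceeds a bound, and whenever the distance to an obstacle drops below d_min, the speed must decrease at the next step. Propositional LTL cannot state such arithmetic comparisons directly, which is precisely the gap the modulo-theories extension targets.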