Badings, Thom S., Jansen, Nils, Poonawala, Hasan A., Stoelinga, Marielle

We study planning problems for continuous control systems with uncertainty caused by measurement and process noise. The goal is to find an optimal plan that guarantees that the system reaches a desired goal state within finite time. Measurement noise causes limited observability of system states, and process noise causes uncertainty in the outcome of a given plan. These factors render the problem undecidable in general. Our key contribution is a novel abstraction scheme that employs Kalman filtering as a state estimator to obtain a finite-state model, which we formalize as a Markov decision process (MDP). For this MDP, we employ state-of-the-art model checking techniques to efficiently compute plans that maximize the probability of reaching goal states. Moreover, we account for numerical imprecision in computing the abstraction by extending the MDP with intervals of probabilities as a more robust model. We show the correctness of the abstraction and provide several optimizations that aim to balance the quality of the plan and the scalability of the approach. We demonstrate that our method can handle systems that result in MDPs with thousands of states and millions of transitions.

Cubuktepe, Murat, Jansen, Nils, Junges, Sebastian, Marandi, Ahmadreza, Suilen, Marnix, Topcu, Ufuk

Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty sets capture uncountable sets of probability distributions. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy given specifications against any admissible distribution. In general, computing such policies is both theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.

Adhikari, Ajaya, Hollander, Richard den, Tolios, Ioannis, van Bekkum, Michael, Bal, Anneloes, Hendriks, Stijn, Kruithof, Maarten, Gross, Dennis, Jansen, Nils, Pérez, Guillermo, Buurman, Kit, Raaijmakers, Stephan

Detection of military assets on the ground can be performed by applying deep learning-based object detectors on drone surveillance footage. The traditional way of hiding military assets from sight is camouflage, for example by using camouflage nets. However, large assets like planes or vessels are difficult to conceal by means of traditional camouflage nets. An alternative type of camouflage is the direct misleading of automatic object detectors. Recently, it has been observed that small adversarial changes applied to images of the object can produce erroneous output by deep learning-based detectors. In particular, adversarial attacks have been successfully demonstrated to prohibit person detections in images, requiring a patch with a specific pattern held up in front of the person, thereby essentially camouflaging the person for the detector. Research into this type of patch attacks is still limited and several questions related to the optimal patch configuration remain open. This work makes two contributions. First, we apply patch-based adversarial attacks for the use case of unmanned aerial surveillance, where the patch is laid on top of large military assets, camouflaging them from automatic detectors running over the imagery. The patch can prevent automatic detection of the whole object while only covering a small part of it. Second, we perform several experiments with different patch configurations, varying their size, position, number and saliency. Our results show that adversarial patch attacks form a realistic alternative to traditional camouflage activities, and should therefore be considered in the automated analysis of aerial surveillance imagery.

Winterer, Leonore, Wimmer, Ralf, Jansen, Nils, Becker, Bernd

The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and randomize over potential choices. Yet, this problem is still NP-hard and often computationally intractable in practice. A restricted problem is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints. It is able to handle an arbitrary number of such specifications. Yet, randomization and memory are often mandatory to achieve satisfactory policies. First, we extend our encoding to deliver a restricted class of randomized policies. Second, based on the results of the original MILP, we employ a preprocessing of the POMDP to encompass memory-based decisions. The advantages of our approach over state-of-the-art POMDP solvers lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications. The latter point allows taking trade-offs between performance and safety aspects of typical POMDP examples into account. We show the effectiveness of our method on a broad range of benchmarks.

Gross, Dennis, Jansen, Nils, Pérez, Guillermo A., Raaijmakers, Stephan

We give a formal verification procedure that decides whether a classifier ensemble is robust against arbitrary randomized attacks. Such attacks consist of a set of deterministic attacks and a distribution over this set. The robustness-checking problem consists of assessing, given a set of classifiers and a labelled data set, whether there exists a randomized attack that induces a certain expected loss against all classifiers. We show the NP-hardness of the problem and provide an upper bound on the number of attacks that is sufficient to form an optimal randomized attack. These results provide an effective way to reason about the robustness of a classifier ensemble. We provide SMT and MILP encodings to compute optimal randomized attacks or prove that there is no attack inducing a certain expected loss. In the latter case, the classifier ensemble is provably robust. Our prototype implementation verifies multiple neural-network ensembles trained for image-classification tasks. The experimental results using the MILP encoding are promising both in terms of scalability and the general applicability of our verification procedure.

Junges, Sebastian, Jansen, Nils, Seshia, Sanjit A.

Partially-Observable Markov Decision Processes (POMDPs) are a well-known formal model for planning scenarios where agents operate under limited information about their environment. In safety-critical domains, the agent must adhere to a policy satisfying certain behavioral constraints. We study the problem of synthesizing policies that almost-surely reach some goal state while a set of bad states is never visited. In particular, we present an iterative symbolic approach that computes a winning region, that is, a set of system configurations such that all policies that stay within this set are guaranteed to satisfy the constraints. The approach generalizes and improves previous work in terms of scalability and efficacy, as demonstrated in the empirical evaluation. Additionally, we show the applicability to safe exploration by restricting agent behavior to these winning regions.

Carr, Steven, Jansen, Nils, Topcu, Ufuk

Recurrent neural networks (RNNs) have emerged as an effective representation of control policies in sequential decision-making problems. However, a major drawback in the application of RNN-based policies is the difficulty in providing formal guarantees on the satisfaction of behavioral specifications, e.g. safety and/or reachability. By integrating techniques from formal methods and machine learning, we propose an approach to automatically extract a finite-state controller (FSC) from an RNN, which, when composed with a finite-state system model, is amenable to existing formal verification tools. Specifically, we introduce an iterative modification to the so-called quantized bottleneck insertion technique to create an FSC as a randomized policy with memory. For the cases in which the resulting FSC fails to satisfy the specification, verification generates diagnostic information. We utilize this information to either adjust the amount of memory in the extracted FSC or perform focused retraining of the RNN. While generally applicable, we detail the resulting iterative procedure in the context of policy synthesis for partially observable Markov decision processes (POMDPs), which is known to be notoriously hard. The numerical experiments show that the proposed approach outperforms traditional POMDP synthesis methods by 3 orders of magnitude within 2% of optimal benchmark values.

Phan, Dung, Paoletti, Nicola, Grosu, Radu, Jansen, Nils, Smolka, Scott A., Stoller, Scott D.

We present the Neural Simplex Architecture (NSA), a new approach to runtime assurance that provides safety guarantees for neural controllers (obtained e.g. using reinforcement learning) of complex autonomous and other cyber-physical systems without unduly sacrificing performance. NSA is inspired by the Simplex control architecture of Sha et al., but with some significant differences. In the traditional Simplex approach, the advanced controller (AC) is treated as a black box; there are no techniques for correcting the AC after it generates a potentially unsafe control input that causes a failover to the BC. Our NSA addresses this limitation. NSA not only provides safety assurances for CPSs in the presence of a possibly faulty neural controller, but can also improve the safety of such a controller in an online setting via retraining, without degrading its performance. NSA also offers reverse switching strategies, which allow the AC to resume control of the system under reasonable conditions, allowing the mission to continue unabated. Our experimental results on several significant case studies, including a target-seeking ground rover navigating an obstacle field and a neural controller for an artificial pancreas system, demonstrate NSA's benefits.

Winterer, Leonore, Junges, Sebastian, Wimmer, Ralf, Jansen, Nils, Topcu, Ufuk, Katoen, Joost-Pieter, Becker, Bernd

We study synthesis problems with constraints in partially observable Markov decision processes (POMDPs), where the objective is to compute a strategy for an agent that is guaranteed to satisfy certain safety and performance specifications. Verification and strategy synthesis for POMDPs are, however, computationally intractable in general. We alleviate this difficulty by focusing on planning applications and exploiting typical structural properties of such scenarios; for instance, we assume that the agent has the ability to observe its own position inside an environment. We propose an abstraction refinement framework which turns such a POMDP model into a (fully observable) probabilistic two-player game (PG). For the obtained PGs, efficient verification and synthesis tools allow to determine strategies with optimal safety and performance measures, which approximate optimal schedulers on the POMDP. If the approximation is too coarse to satisfy the given specifications, an refinement scheme improves the computed strategies. As a running example, we use planning problems where an agent moves inside an environment with randomly moving obstacles and restricted observability. We demonstrate that the proposed method advances the state of the art by solving problems several orders-of-magnitude larger than those that can be handled by existing POMDP solvers. Furthermore, this method gives guarantees on safety constraints, which is not supported by the majority of the existing solvers.

Cubuktepe, Murat, Jansen, Nils, Alsiekh, Mohammed, Topcu, Ufuk

We synthesize shared control protocols subject to probabilistic temporal logic specifications. More specifically, we develop a framework in which a human and an autonomy protocol can issue commands to carry out a certain task. We blend these commands into a joint input to a robot. We model the interaction between the human and the robot as a Markov decision process (MDP) that represents the shared control scenario. Using inverse reinforcement learning, we obtain an abstraction of the human's behavior and decisions. We use randomized strategies to account for randomness in human's decisions, caused by factors such as complexity of the task specifications or imperfect interfaces. We design the autonomy protocol to ensure that the resulting robot behavior satisfies given safety and performance specifications in probabilistic temporal logic. Additionally, the resulting strategies generate behavior as similar to the behavior induced by the human's commands as possible. We solve the underlying problem efficiently using quasiconvex programming. Case studies involving autonomous wheelchair navigation and unmanned aerial vehicle mission planning showcase the applicability of our approach.