Empowering Safe Reinforcement Learning for Power System Control with CommonPower
Eichelbeck, Michael, Markgraf, Hannah, Althoff, Matthias
The growing complexity of power system management has led to an increased interest in reinforcement learning (RL). However, vanilla RL controllers cannot themselves ensure satisfaction of system constraints. Therefore, combining them with formally correct safeguarding mechanisms is an important aspect when studying RL for power system management. Integrating safeguarding into complex use cases requires tool support. To address this need, we introduce the Python tool CommonPower. CommonPower's unique contribution lies in its symbolic modeling approach, which enables flexible, model-based safeguarding of RL controllers. Moreover, CommonPower offers a unified interface for single-agent RL, multi-agent RL, and optimal control, with seamless integration of different forecasting methods. This allows users to validate the effectiveness of safe RL controllers across a large variety of case studies and investigate the influence of specific aspects on overall performance. We demonstrate CommonPower's versatility through a numerical case study that compares RL agents featuring different safeguards with a model predictive controller in the context of building energy management.
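To illustrate the kind of model-based safeguarding described above, the following minimal Python sketch clips a proposed battery (dis)charge action so that the predicted state of charge stays within bounds. This is a hypothetical example for the building energy management setting; it does not use CommonPower's actual API, and all names and parameters are assumptions.

```python
# Hypothetical sketch of model-based safeguarding of an RL action; this is
# NOT CommonPower's API, just an illustration of the safeguarding idea.
import numpy as np

def safeguard(action_kw, soc, capacity_kwh=10.0, dt_h=0.25, eta=0.95):
    """Clip a proposed battery (dis)charge power [kW] so that the predicted
    state of charge stays within [0, 1] over one time step (simplified model
    with a single efficiency factor for charging and discharging)."""
    p_min = (0.0 - soc) * capacity_kwh / (eta * dt_h)  # full-discharge limit
    p_max = (1.0 - soc) * capacity_kwh / (eta * dt_h)  # full-charge limit
    return float(np.clip(action_kw, p_min, p_max))

print(safeguard(action_kw=8.0, soc=0.9))  # proposed 8 kW is reduced to ~4.2 kW
```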
Reachset-Conformant System Identification
Lützow, Laura, Althoff, Matthias
Formal verification techniques play a pivotal role in ensuring the safety of complex cyber-physical systems. To transfer model-based verification results to the real world, we require that the measurements of the target system lie in the set of reachable outputs of the corresponding model, a property we refer to as reachset conformance. This paper addresses the automatic identification of such reachset-conformant models. While state-of-the-art reachset-conformant identification methods focus on linear state-space models, we generalize these methods to nonlinear state-space models as well as linear and nonlinear input-output models. Furthermore, our identification framework adapts to different levels of prior knowledge about the system dynamics. In particular, we identify the set of model uncertainties for white-box models, the parameters and the set of model uncertainties for gray-box models, and entire reachset-conformant black-box models from data. For black-box identification, we propose a new genetic programming variant, which we call conformant genetic programming. The robustness and efficacy of our framework are demonstrated in extensive numerical experiments using simulated and real-world data.
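As a rough illustration of checking reachset conformance, the sketch below encloses the outputs of a discrete-time linear model with additive interval uncertainty and tests whether all measurements lie inside the enclosure. The identification framework in the paper is far more general; the model structure and bounds here are assumptions for illustration only.

```python
# Simplified illustration of a reachset conformance check for a linear model
# x_{k+1} = A x_k + w_k with |w_k| <= w_bound (element-wise) and y_k = C x_k;
# the model and uncertainty set here are assumptions for this sketch.
import numpy as np

def output_enclosure(A, C, x0, w_bound, steps):
    """Interval enclosure of the model outputs over the given horizon."""
    c, r = np.asarray(x0, dtype=float), np.zeros(len(x0))
    enclosures = []
    for _ in range(steps):
        c, r = A @ c, np.abs(A) @ r + w_bound   # interval state propagation
        enclosures.append((C @ c - np.abs(C) @ r, C @ c + np.abs(C) @ r))
    return enclosures

def is_reachset_conformant(measurements, enclosures, tol=1e-9):
    """True iff every measured output lies inside the model's enclosure."""
    return all(np.all(lo - tol <= y) and np.all(y <= hi + tol)
               for y, (lo, hi) in zip(measurements, enclosures))
```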
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking
Stolz, Roland, Krasowski, Hanna, Thumm, Jakob, Eichelbeck, Michael, Gassert, Philipp, Althoff, Matthias
Continuous action spaces in reinforcement learning (RL) are commonly defined as interval sets. While intervals usually reflect the action boundaries of tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, even little task knowledge can be sufficient to identify significantly smaller, state-specific sets of relevant actions. Focusing learning on these relevant actions can substantially improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods that exactly map the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods for the policy gradient. Using Proximal Policy Optimization (PPO), we evaluate our methods on three control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that all three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
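One simple way to realize continuous action masking is to linearly rescale the agent's normalized action onto a state-dependent interval of relevant actions. The Gymnasium wrapper below sketches this idea; relevant_action_bounds is a hypothetical placeholder, and the mapping is only loosely inspired by the three methods in the paper.

```python
# Minimal sketch of interval-based continuous action masking: the policy's
# normalized action in [-1, 1]^n is rescaled onto a state-dependent interval
# [lb(s), ub(s)] of relevant actions. relevant_action_bounds is a hypothetical
# placeholder; the paper's three masking methods are more general.
import gymnasium as gym
import numpy as np

class IntervalMaskingWrapper(gym.ActionWrapper):
    def __init__(self, env, relevant_action_bounds):
        super().__init__(env)
        self._bounds = relevant_action_bounds  # maps observation -> (lb, ub)
        self._obs = None

    def reset(self, **kwargs):
        self._obs, info = self.env.reset(**kwargs)
        return self._obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(self.action(action))
        self._obs = obs
        return obs, reward, terminated, truncated, info

    def action(self, act):
        lb, ub = self._bounds(self._obs)
        return lb + (np.clip(act, -1.0, 1.0) + 1.0) * (ub - lb) / 2.0
```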
Formal Verification of Graph Convolutional Networks with Uncertain Node Features and Uncertain Graph Structure
Ladner, Tobias, Eichelbeck, Michael, Althoff, Matthias
Graph neural networks are becoming increasingly popular in the field of machine learning due to their unique ability to process data structured in graphs. They have also been applied in safety-critical environments where perturbations inherently occur. However, since neural networks are prone to adversarial attacks, these perturbations make it necessary to formally verify neural networks before deploying them in safety-critical environments. While there exists research on the formal verification of neural networks, no prior work verifies the robustness of generic graph convolutional network architectures with uncertainty in both the node features and the graph structure over multiple message-passing steps. This work addresses this research gap by explicitly preserving the non-convex dependencies of all elements in the underlying computations through reachability analysis with (matrix) polynomial zonotopes. We demonstrate our approach on three popular benchmark datasets.
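For intuition, the following deliberately coarse sketch propagates interval bounds on uncertain node features through one GCN message-passing step. Unlike the (matrix) polynomial zonotopes used in the paper, plain intervals discard the non-convex dependencies and cannot capture graph-structure uncertainty.

```python
# Deliberately coarse illustration: interval bounds on node features X are
# propagated through one GCN step ReLU(A_hat @ X @ W), assuming a fixed,
# non-negative normalized adjacency A_hat. Plain intervals discard the
# dependencies that (matrix) polynomial zonotopes preserve.
import numpy as np

def gcn_layer_interval(A_hat, W, X_lo, X_hi):
    """Element-wise output bounds of ReLU(A_hat @ X @ W) for X in [X_lo, X_hi]."""
    H_lo, H_hi = A_hat @ X_lo, A_hat @ X_hi          # A_hat >= 0 preserves order
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    Y_lo = H_lo @ W_pos + H_hi @ W_neg               # sound lower bound
    Y_hi = H_hi @ W_pos + H_lo @ W_neg               # sound upper bound
    return np.maximum(Y_lo, 0.0), np.maximum(Y_hi, 0.0)  # ReLU is monotone
```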
DrPlanner: Diagnosis and Repair of Motion Planners Using Large Language Models
Lin, Yuanfei, Li, Chenran, Ding, Mingyu, Tomizuka, Masayoshi, Zhan, Wei, Althoff, Matthias
Motion planners are essential for the safe operation of automated vehicles across various scenarios. However, no motion planning algorithm in the literature is flawless, and improving planner performance is often time-consuming and labor-intensive. To tackle these issues, we present DrPlanner, the first framework designed to automatically diagnose and repair motion planners using large language models. Initially, we generate a structured description of the planner and its planned trajectories in both natural and programming language. Leveraging the profound capabilities of large language models in addressing reasoning challenges, our framework returns repaired planners accompanied by detailed diagnostic descriptions. Furthermore, the framework improves iteratively with continuous feedback from the evaluation of the repaired outcomes. Our approach is validated using search-based motion planners; the experimental results highlight the need for demonstrations in the prompt and the ability of our framework to identify and rectify elusive issues effectively.
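The iterative diagnose-and-repair loop can be pictured as in the following hypothetical sketch; describe, query_llm, and evaluate_planner are placeholder stubs rather than DrPlanner's actual interfaces.

```python
# Hypothetical sketch of an iterative diagnose-and-repair loop; describe,
# query_llm, and evaluate_planner are placeholder stubs, not DrPlanner's API.
def repair_loop(planner_code, describe, query_llm, evaluate_planner, budget=5):
    best_code = planner_code
    best_score = evaluate_planner(best_code)      # e.g., benchmark cost -> score
    feedback = ""
    for _ in range(budget):
        prompt = describe(best_code) + feedback   # structured planner description
        diagnosis, repaired_code = query_llm(prompt)
        score = evaluate_planner(repaired_code)
        feedback = f"\nPrevious repair scored {score:.3f}. Diagnosis: {diagnosis}"
        if score > best_score:                    # keep only improving repairs
            best_code, best_score = repaired_code, score
    return best_code, best_score
```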
Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea
Krasowski, Hanna, Althoff, Matthias
Autonomous vehicles have to obey traffic rules. These rules are often formalized using temporal logic, resulting in constraints that are hard to handle for optimization-based motion planners. Reinforcement learning (RL) is a promising method for finding motion plans that adhere to temporal logic specifications. However, vanilla RL algorithms are based on random exploration, which is inherently unsafe. To address this issue, we propose a provably safe RL approach that always complies with traffic rules. As a specific application area, we consider vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). We introduce an efficient verification approach that determines the compliance of actions with respect to the COLREGS formalized in temporal logic. Our action verification is integrated into the RL process so that the agent only selects verified actions. In contrast to agents that only integrate the traffic rule information into the reward function, our provably safe agent always complies with the formalized rules in critical maritime traffic situations and, thus, never causes a collision.
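Integrating action verification into the RL loop can be sketched as follows; verify_action and safe_fallback are hypothetical placeholders, not the paper's COLREGS verifier.

```python
# Minimal sketch of verified action selection in RL; verify_action and
# safe_fallback are hypothetical placeholders, not the paper's COLREGS verifier.
def select_verified_action(policy, state, verify_action, safe_fallback, tries=10):
    for _ in range(tries):
        action = policy.sample(state)
        if verify_action(state, action):   # e.g., temporal logic compliance check
            return action
    return safe_fallback(state)            # known rule-compliant default maneuver
```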
End-To-End Set-Based Training for Neural Network Verification
Koller, Lukas, Ladner, Tobias, Althoff, Matthias
Neural networks are vulnerable to adversarial attacks, i.e., small input perturbations can result in substantially different outputs of a neural network. Safety-critical environments require neural networks that are robust against input perturbations. However, training and formally verifying robust neural networks is challenging. We address this challenge by employing, for the first time, an end-to-end set-based training procedure that trains robust neural networks for formal verification. Our training procedure drastically simplifies the subsequent formal robustness verification of the trained neural network. While previous research has predominantly focused on augmenting neural network training with adversarial attacks, our approach leverages set-based computing to train neural networks with entire sets of perturbed inputs. Moreover, we demonstrate that our set-based training procedure effectively trains robust neural networks, which are easier to verify. In many cases, set-based trained neural networks outperform neural networks trained with state-of-the-art adversarial attacks.
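In the spirit of set-based training, the simplified sketch below propagates interval bounds through a small ReLU network and defines a worst-case training loss; the set-based procedure in the paper is more general than this interval-based illustration.

```python
# Simplified interval-based stand-in for set-based training: propagate input
# intervals [x - eps, x + eps] through a ReLU MLP and train on a worst-case
# loss. The paper's set-based procedure is more general than this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

def interval_forward(layers, x, eps):
    lo, hi = x - eps, x + eps
    for layer in layers:
        if isinstance(layer, nn.Linear):
            Wp, Wn = layer.weight.clamp(min=0), layer.weight.clamp(max=0)
            lo, hi = (lo @ Wp.T + hi @ Wn.T + layer.bias,
                      hi @ Wp.T + lo @ Wn.T + layer.bias)
        else:                       # ReLU is monotone and element-wise
            lo, hi = layer(lo), layer(hi)
    return lo, hi

def worst_case_loss(layers, x, y, eps):
    """Cross-entropy on the worst logits: lower bound for the true class,
    upper bound for all other classes."""
    lo, hi = interval_forward(layers, x, eps)
    onehot = F.one_hot(y, lo.shape[-1]).bool()
    return F.cross_entropy(torch.where(onehot, lo, hi), y)
```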
EDGAR: An Autonomous Driving Research Platform -- From Feature Development to Real-World Application
Karle, Phillip, Betz, Tobias, Bosk, Marcin, Fent, Felix, Gehrke, Nils, Geisslinger, Maximilian, Gressenbuch, Luis, Hafemann, Philipp, Huber, Sebastian, Hübner, Maximilian, Huch, Sebastian, Kaljavesi, Gemb, Kerbl, Tobias, Kulmer, Dominik, Mascetta, Tobias, Maierhofer, Sebastian, Pfab, Florian, Rezabek, Filip, Rivera, Esteban, Sagmeister, Simon, Seidlitz, Leander, Sauerbeck, Florian, Tahiraj, Ilir, Trauth, Rainer, Uhlemann, Nico, Würsching, Gerald, Zarrouki, Baha, Althoff, Matthias, Betz, Johannes, Bengler, Klaus, Carle, Georg, Diermeyer, Frank, Ott, Jörg, Lienkamp, Markus
While current research and development in autonomous driving primarily focus on developing new features and algorithms, the transfer from isolated software components into an entire software stack has received only sparse coverage. Moreover, due to the complexity of autonomous software stacks and public road traffic, the optimal validation of entire stacks is an open research problem. Our paper focuses on these two aspects. We present our autonomous research vehicle EDGAR and its digital twin, a detailed virtual duplicate of the vehicle. While the vehicle's setup is closely related to the state of the art, its virtual duplicate is a valuable contribution, as it is crucial for a consistent validation process from simulation to real-world tests. In addition, different development teams can work with the same model, making the integration and testing of software stacks much easier and significantly accelerating the development process. The real and virtual vehicles are embedded in a comprehensive development environment, which is also introduced. All parameters of the digital twin are provided open-source at https://github.com/TUMFTM/edgar.
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking
Krasowski, Hanna, Thumm, Jakob, Müller, Marlon, Schäfer, Lukas, Wang, Xiao, Althoff, Matthias
Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty every time the safety verification is engaged improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, the RL algorithm, and the type of action space.
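The three categories can be contrasted with a minimal sketch for a one-dimensional action; is_safe, safe_action, and safe_interval are hypothetical problem-specific verifiers, not components of any benchmarked method.

```python
# Minimal sketch contrasting the three categories for a one-dimensional action;
# is_safe, safe_action, and safe_interval are hypothetical verifiers.
import numpy as np

def action_replacement(a, state, is_safe, safe_action):
    return a if is_safe(state, a) else safe_action(state)  # swap in safe action

def action_projection(a, state, safe_interval):
    lb, ub = safe_interval(state)       # verified safe interval of actions
    return float(np.clip(a, lb, ub))    # closest safe action

def action_masking(a_norm, state, safe_interval):
    lb, ub = safe_interval(state)       # policy can only output safe actions
    return lb + (a_norm + 1.0) * (ub - lb) / 2.0  # rescale [-1, 1] onto [lb, ub]
```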
Model Predictive Robustness of Signal Temporal Logic Predicates
Lin, Yuanfei, Li, Haoxuan, Althoff, Matthias
The robustness of signal temporal logic not only assesses whether a signal adheres to a specification but also provides a measure of how much a formula is fulfilled or violated. The calculation of robustness is based on evaluating the robustness of the underlying predicates. However, the robustness of predicates is usually defined in a model-free way, i.e., without including the system dynamics. Moreover, it is often nontrivial to define the robustness of complicated predicates precisely. To address these issues, we propose a notion of model predictive robustness, which provides a more systematic way of evaluating robustness compared to previous approaches by considering model-based predictions. In particular, we use Gaussian process regression to learn the robustness based on precomputed predictions so that robustness values can be efficiently computed online. We evaluate our approach on a recorded dataset for the use case of autonomous driving with predicates from formalized traffic rules, which highlights the advantage of our approach over traditional approaches in terms of precision. By incorporating our robustness definitions into a trajectory planner, autonomous vehicles obey traffic rules more robustly than the human drivers in the dataset.
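The learning step can be sketched with scikit-learn's Gaussian process regression as below; the features, data, and ground-truth robustness values here are synthetic placeholders rather than the paper's setup.

```python
# Hypothetical sketch of the learning step: fit a Gaussian process to
# precomputed robustness values for fast online evaluation. Features and
# "robustness" values here are synthetic placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 4))               # e.g., relative state features
r = np.tanh(X @ np.array([0.8, -0.5, 0.3, 0.1]))    # stand-in robustness values

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, r)
r_hat, r_std = gp.predict(rng.uniform(-1, 1, size=(5, 4)), return_std=True)
```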