tomlin
Resolving Conflicting Constraints in Multi-Agent Reinforcement Learning with Layered Safety
Choi, Jason J., Aloor, Jasmine Jerry, Li, Jingqi, Mendoza, Maria G., Balakrishnan, Hamsa, Tomlin, Claire J.
Preventing collisions in multi-robot navigation is crucial for deployment. This requirement hinders the use of learning-based approaches, such as multi-agent reinforcement learning (MARL), on their own due to their lack of safety guarantees. Traditional control methods, such as reachability and control barrier functions, can provide rigorous safety guarantees when interactions are limited only to a small number of robots. However, conflicts between the constraints faced by different agents pose a challenge to safe multi-agent coordination. To overcome this challenge, we propose a method that integrates multiple layers of safety by combining MARL with safety filters. First, MARL is used to learn strategies that minimize multiple agent interactions, where multiple indicates more than two. Particularly, we focus on interactions likely to result in conflicting constraints within the engagement distance. Next, for agents that enter the engagement distance, we prioritize pairs requiring the most urgent corrective actions. Finally, a dedicated safety filter provides tactical corrective actions to resolve these conflicts. Crucially, the design decisions for all layers of this framework are grounded in reachability analysis and a control barrier-value function-based filtering mechanism. We validate our Layered Safe MARL framework in 1) hardware experiments using Crazyflie drones and 2) high-density advanced aerial mobility (AAM) operation scenarios, where agents navigate to designated waypoints while avoiding collisions. The results show that our method significantly reduces conflict while maintaining safety without sacrificing much efficiency (i.e., shorter travel time and distance) compared to baselines that do not incorporate layered safety. The project website is available at https://dinamo-mit.github.io/Layered-Safe-MARL/
- North America > United States > California > San Francisco County > San Francisco (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Transportation > Air (1.00)
- Transportation > Infrastructure & Services (0.95)
- Transportation > Passenger (0.94)
Threshold Strategy for Leaking Corner-Free Hamilton-Jacobi Reachability with Decomposed Computations
He, Chong, Mariappan, Mugilan, Vora, Keval, Chen, Mo
Hamilton-Jacobi (HJ) Reachability is widely used to compute value functions for states satisfying specific control objectives. However, it becomes intractable for high-dimensional problems due to the curse of dimensionality. Dimensionality reduction approaches are essential for mitigating this challenge, but they can introduce the "leaking corner issue", leading to inaccuracies in the results. In this paper, we define the "leaking corner issue" in terms of value functions, and propose and prove a necessary condition for its occurrence. We then use these theoretical contributions to introduce a new local updating method that efficiently corrects inaccurate value functions while maintaining the computational efficiency of the dimensionality reduction approaches. We demonstrate the effectiveness of our method through numerical simulations. Although we validate our method with the self-contained subsystem decomposition (SCSD), our approach is applicable to other dimensionality reduction techniques that introduce "leaking corners".
Certified Approximate Reachability (CARe): Formal Error Bounds on Deep Learning of Reachable Sets
Solanki, Prashant, Vertovec, Nikolaus, Schnitzer, Yannik, Van Beers, Jasper, de Visser, Coen, Abate, Alessandro
Recent approaches to leveraging deep learning for computing reachable sets of continuous-time dynamical systems have gained popularity over traditional level-set methods, as they overcome the curse of dimensionality. However, as with level-set methods, considerable care needs to be taken in limiting approximation errors, particularly since no guarantees are provided during training on the accuracy of the learned reachable set. To address this limitation, we introduce an ϵ-approximate Hamilton-Jacobi Partial Differential Equation (HJ-PDE), which establishes a relationship between training loss and accuracy of the true reachable set. To formally certify this approximation, we leverage Satisfiability Modulo Theories (SMT) solvers to bound the residual error of the HJ-based loss function across the domain of interest. Leveraging Counter Example Guided Inductive Synthesis (CEGIS), we close the loop around learning and verification, by fine-tuning the neural network on counterexamples found by the SMT solver, thus improving the accuracy of the learned reachable set. To the best of our knowledge, Certified Approximate Reachability (CARe) is the first approach to provide soundness guarantees on learned reachable sets of continuous dynamical systems.
- North America > United States > Indiana (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning
Li, Jingqi, Lee, Donggun, Sojoudi, Somayeh, Tomlin, Claire J.
In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that the system starting at a state therein could be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a contracting Bellman backup, where the super-zero level set, i.e., the set of states where the value function is evaluated to be non-negative, recovers the reach-avoid set. Building upon this, we prove that the proposed method can be adapted to compute the viability kernel, or the set of states which could be controlled to satisfy given constraints, and the backward reachable set, or the set of states that could be driven towards a given target set. Finally, we propose to alleviate the curse of dimensionality issue in high-dimensional problems by extending Conservative Q-Learning, a deep reinforcement learning technique, to learn a value function such that the super-zero level set of the learned value function serves as a (conservative) approximation to the reach-avoid set. Our theoretical and empirical results suggest that the proposed method could learn reliably the reach-avoid set and the optimal control policy even with neural network approximation.
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
Bajcsy, Andrea, Fisac, Jaime F.
Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human--AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes. To this end, we introduce a unifying formalism to capture dynamic, safety-critical human--AI interactions and propose a concrete technical roadmap towards next-generation human-centered AI safety.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Austria > Styria > Graz (0.04)
- Transportation > Air (1.00)
- Automobiles & Trucks (0.68)
- Leisure & Entertainment > Games (0.67)
Providing Safety Assurances for Systems with Unknown Dynamics
Wang, Hao, Borquez, Javier, Bansal, Somil
As autonomous systems become more complex and integral in our society, the need to accurately model and safely control these systems has increased significantly. In the past decade, there has been tremendous success in using deep learning techniques to model and control systems that are difficult to model using first principles. However, providing safety assurances for such systems remains difficult, partially due to the uncertainty in the learned model. In this work, we aim to provide safety assurances for systems whose dynamics are not readily derived from first principles and, hence, are better learned using deep learning techniques. Given the system of interest and safety constraints, we learn an ensemble model of the system dynamics from data. Leveraging ensemble uncertainty as a measure of uncertainty in the learned dynamics model, we compute a maximal robust control invariant set, starting from which the system is guaranteed to satisfy the safety constraints under the condition that realized model uncertainties are contained in the predefined set of admissible model uncertainty. We demonstrate the effectiveness of our method using a simulated case study with an inverted pendulum and a hardware experiment with a TurtleBot. The experiments show that our method robustifies the control actions of the system against model uncertainty and generates safe behaviors without being overly restrictive. The code and accompanying videos can be found on the project website.
- North America > United States > California (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Indiana (0.04)
Parameterized Fast and Safe Tracking (FaSTrack) using Deepreach
Jeong, Hyun Joe, Gong, Zheng, Bansal, Somil, Herbert, Sylvia
Fast and Safe Tracking (FaSTrack, Herbert et al. (2017)) is a modular framework that provides safety guarantees while planning and executing trajectories in real time via value functions of Hamilton-Jacobi (HJ) reachability. These value functions are computed through dynamic programming, which is notorious for being computationally inefficient. Moreover, the resulting trajectory does not adapt online to the environment, such as sudden disturbances or obstacles. DeepReach (Bansal and Tomlin (2021)) is a scalable deep learning method for HJ reachability that allows parameterization of states, which opens up possibilities for online adaptation to various controls and disturbances. In this paper, we propose Parametric FaSTrack, which uses DeepReach to approximate a value function that parameterizes the control bounds of the planning model. The new framework can smoothly trade off between the navigation speed and the tracking error (therefore maneuverability) while guaranteeing obstacle avoidance in a priori unknown environments. We demonstrate our method through two examples and a benchmark comparison with existing methods, showing the safety, efficiency, and faster solution times of the framework.
Lovelorn men turn to artificial intelligence, dating guru to help get a date: 'Viagra for your social profile'
Artificial Intelligence poses both risks and rewards, but developers should be wary of technologies that could threaten "scary" outcomes, AI technologist says. Men who have trouble finding dates are reportedly turning to artificial intelligence and a self-described love guru to craft appealing dating profiles. "My AI prompts and training can turn any guy from zero to hero," Stefan-Pierre Tomlin, a 32-year-old London model and self-described love guru, told South West News Service, according to the New York Post. Tomlin operates a website called Celebrity Love Coach where subscribers can pay between roughly $55 and $150 a month to receive his advice and "support to help you achieve your dating goals," according to the website. Subscribers also receive access to "bespoke" AI to draft appealing dating profiles.
- North America > United States > New York (0.26)
- Oceania > Australia > New South Wales (0.05)
- Media > News (0.70)
- Health & Medicine > Therapeutic Area > Urology (0.41)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.41)
Voronoi Progressive Widening: Efficient Online Solvers for Continuous Space MDPs and POMDPs with Provably Optimal Components
Lim, Michael H., Tomlin, Claire J., Sunberg, Zachary N.
Markov decision processes (MDPs) and partially observable MDPs (POMDPs) can effectively represent complex real-world decision and control problems. However, continuous space MDPs and POMDPs, i.e. those having continuous state, action and observation spaces, are extremely difficult to solve, and there are few online algorithms with convergence guarantees. This paper introduces Voronoi Progressive Widening (VPW), a general technique to modify tree search algorithms to effectively handle continuous or hybrid action spaces, and proposes and evaluates three continuous space solvers: VOSS, VOWSS, and VOMCPOW. VOSS and VOWSS are theoretical tools based on sparse sampling and Voronoi optimistic optimization designed to justify VPW-based online solvers. While previous algorithms have enjoyed convergence guarantees for problems with continuous state and observation spaces, VOWSS is the first with global convergence guarantees for problems that additionally have continuous action spaces. VOMCPOW is a versatile and efficient VPW-based algorithm that consistently outperforms POMCPOW and BOMCP in several simulation experiments.
- Europe (0.67)
- North America > United States > California (0.28)
- North America > United States > Colorado (0.28)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.78)
Steelers experimenting with robot tackling dummies
The Pittsburgh Steelers are testing out a smart dummy. A Mobile Virtual Player, a remote controlled robotic dummy, was introduced during Steelers workouts. The mobile dummies, developed at Dartmouth College, could aid in tackling development without the risk of players hitting each other. "The applications we are quickly finding are endless," coach Mike Tomlin told the team's official website. It runs at an appropriate football speed.
- Leisure & Entertainment > Sports > Football (0.57)
- Media > News (0.40)