 safety filter


ColJailBreak: Collaborative Generation and Editing for Jailbreaking Text-to-Image Deep Generation

Neural Information Processing Systems

Commercial text-to-image deep generation models (e.g., DALL·E) can produce high-quality images based on input language descriptions. These models incorporate a black-box safety filter to prevent the generation of unsafe or unethical content, such as violent, criminal, or hateful imagery. Recent jailbreaking methods generate adversarial prompts capable of bypassing safety filters and producing unsafe content, exposing vulnerabilities in influential commercial models. However, once these adversarial prompts are identified, the safety filter can be updated to prevent the generation of unsafe images. In this work, we propose an effective, simple, and difficult-to-detect jailbreaking solution: generating safe content initially with normal text prompts and then editing the generations to embed unsafe content.


DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling

Li, Boheng, Wang, Junjie, Li, Yiming, Hu, Zhiyang, Qi, Leyi, Dong, Jianshuo, Wang, Run, Qiu, Han, Qin, Zhan, Zhang, Tianwei

arXiv.org Artificial Intelligence

Despite the integration of safety alignment and external filters, text-to-image (T2I) generative systems are still susceptible to producing harmful content, such as sexual or violent imagery. This raises serious concerns about unintended exposure and potential misuse. Red teaming, which aims to proactively identify diverse prompts that can elicit unsafe outputs from the T2I system, is increasingly recognized as an essential method for assessing and improving safety before real-world deployment. However, existing automated red teaming approaches often treat prompt discovery as an isolated, prompt-level optimization task, which limits their scalability, diversity, and overall effectiveness. To bridge this gap, in this paper, we propose DREAM, a scalable red teaming framework to automatically uncover diverse problematic prompts from a given T2I system. Unlike prior work that optimizes prompts individually, DREAM directly models the probabilistic distribution of the target system's problematic prompts, which enables explicit optimization over both effectiveness and diversity, and allows efficient large-scale sampling after training. To achieve this without direct access to representative training samples, we draw inspiration from energy-based models and reformulate the objective into a simple and tractable form. We further introduce GC-SPSA, an efficient optimization algorithm that provides stable gradient estimates through the long and potentially non-differentiable T2I pipeline. During inference, we also propose a diversity-aware sampling strategy to enhance prompt variety. The effectiveness of DREAM is validated through extensive experiments, demonstrating state-of-the-art performance across a wide range of T2I models and safety filters in terms of both prompt success rate and diversity. Our code is available at https://github.com/AntigoneRandy/DREAM
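
The abstract does not spell out how GC-SPSA works, so the following is only a minimal sketch of plain SPSA (simultaneous perturbation stochastic approximation), the standard zeroth-order gradient estimator that the name suggests it builds on. It is not the authors' algorithm. The function name `spsa_gradient`, the step size `c`, and the quadratic toy objective are assumptions made for illustration; in the paper's setting the objective would be a score computed through the full T2I pipeline.

```python
import numpy as np

def spsa_gradient(f, theta, c=1e-2, rng=None):
    """One plain-SPSA gradient estimate of f at theta.

    Uses two evaluations of f regardless of the dimension of theta,
    so f may be a black box with no usable analytic gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    # Rademacher perturbation: each entry is +1 or -1 with equal probability.
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    diff = f(theta + c * delta) - f(theta - c * delta)
    # Dividing by delta_i equals multiplying by delta_i because delta_i is +-1.
    return diff / (2.0 * c) * delta

def averaged_spsa_gradient(f, theta, n_samples=64, c=1e-2, seed=0):
    """Average several independent estimates to reduce variance."""
    rng = np.random.default_rng(seed)
    estimates = [spsa_gradient(f, theta, c, rng) for _ in range(n_samples)]
    return np.mean(estimates, axis=0)

if __name__ == "__main__":
    # Hypothetical toy objective; its true gradient is 2 * theta.
    quadratic = lambda x: float(np.sum(x ** 2))
    theta0 = np.array([1.0, -2.0, 3.0])
    print(averaged_spsa_gradient(quadratic, theta0, n_samples=20000))
```

A single estimate is noisy, which is why the sketch averages many of them; any stabilization specific to GC-SPSA is outside what the abstract describes.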


Toward generic control for soft robotic systems

Sun, Yu, Deng, Yaosheng, Mei, Wenjie, Xiong, Xiaogang, Bai, Yang, Ogura, Masaki, Zhou, Zeyu, Feroskhan, Mir, Wang, Michael Yu, Zuo, Qiyang, Li, Yao, Lou, Yunjiang

arXiv.org Artificial Intelligence

Soft robotics has advanced rapidly, yet its control methods remain fragmented: different morphologies and actuation schemes still require task-specific controllers, hindering theoretical integration and large-scale deployment. A generic control framework is therefore essential, and a key obstacle lies in the persistent use of rigid-body control logic, which relies on precise models and strict low-level execution. Such a paradigm is effective for rigid robots but fails for soft robots, where the ability to tolerate and exploit approximate action representations, i.e., control compliance, is the basis of robustness and adaptability rather than a disturbance to be eliminated. Control should thus shift from suppressing compliance to explicitly exploiting it. Human motor control exemplifies this principle: instead of computing exact dynamics or issuing detailed muscle-level commands, it expresses intention through high-level movement tendencies, while reflexes and biomechanical mechanisms autonomously resolve local details. This architecture enables robustness, flexibility, and cross-task generalization. Motivated by this insight, we propose a generic soft-robot control framework grounded in control compliance and validate it across robots with diverse morphologies and actuation mechanisms. The results demonstrate stable, safe, and cross-platform transferable behavior, indicating that embracing control compliance, rather than resisting it, may provide a widely applicable foundation for unified soft-robot control.


LEARN: Learning End-to-End Aerial Resource-Constrained Multi-Robot Navigation

Chiu, Darren, Huang, Zhehui, Ge, Ruohai, Sukhatme, Gaurav S.

arXiv.org Artificial Intelligence

Figure 1: LEARN is a lightweight, two-stage safety-guided reinforcement learning framework for multi-UAV navigation in cluttered indoor and outdoor spaces. All processes, including perception, localization, communication, planning, and control, run purely on an embedded single-core controller running at 168 MHz with 192 KB of RAM. A single policy is trained in simulation and duplicated across all quadrotors. During deployment, a minimum snap naive planner produces goal points for the encoder. Quadrotors obtain the two closest neighbor positions and velocities through radio, and obstacles are sensed using a low-dimensional time-of-flight sensor. The policy generates individual normalized rotor thrusts that are sent directly to the motors. LEARN is zero-shot transferable to the real world with no fine-tuning. Experiments show that it scales up to 6 quadrotors in the real world and 24 in simulation.
Abstract: Nano-UAV teams offer great agility yet face severe navigation challenges due to constrained onboard sensing, communication, and computation. Existing approaches rely on high-resolution vision or compute-intensive planners, rendering them infeasible for these platforms. Our system combines low-resolution Time-of-Flight (ToF) sensors and a simple motion planner with a compact, attention-based RL policy. In simulation, LEARN outperforms two state-of-the-art planners by 10% while using substantially fewer resources. We demonstrate LEARN's viability on six Crazyflie quadrotors, achieving fully onboard flight in diverse indoor and outdoor environments at speeds up to 2.0 m/s and traversing 0.2 m gaps.
All authors are with the University of Southern California.
Unmanned aerial vehicles (UAVs) are increasingly used in domains such as surveillance [1], search and rescue [2], and planetary exploration [3]. The physics of flight impose stringent size, weight, and power (SWaP) constraints on these platforms, making efficient system design paramount. While autonomy in UAVs has advanced significantly, many state-of-the-art navigation approaches fail to scale to resource-constrained platforms.
EDG-Team switches to a centralized and synchronous planner in dense environments [6].


How to Train Your Latent Control Barrier Function: Smooth Safety Filtering Under Hard-to-Model Constraints

Nakamura, Kensuke, Bishop, Arun L., Man, Steven, Johnson, Aaron M., Manchester, Zachary, Bajcsy, Andrea

arXiv.org Artificial Intelligence

Latent safety filters extend Hamilton-Jacobi (HJ) reachability to operate on latent state representations and dynamics learned directly from high-dimensional observations, enabling safe visuomotor control under hard-to-model constraints. However, existing methods implement "least-restrictive" filtering that discretely switches between nominal and safety policies, potentially undermining the task performance that makes modern visuomotor policies valuable. While reachability value functions can, in principle, be adapted to be control barrier functions (CBFs) for smooth optimization-based filtering, we theoretically and empirically show that current latent-space learning methods produce fundamentally incompatible value functions. We identify two sources of incompatibility: First, in HJ reachability, failures are encoded via a "margin function" in latent space, whose sign indicates whether or not a latent is in the constraint set. However, representing the margin function as a classifier yields saturated value functions that exhibit discontinuous jumps. We prove that the value function's Lipschitz constant scales linearly with the margin function's Lipschitz constant, revealing that smooth CBFs require smooth margins. Second, reinforcement learning (RL) approximations trained solely on safety policy data yield inaccurate value estimates for nominal policy actions, precisely where CBF filtering needs them. We propose LatentCBF, which addresses both challenges through gradient penalties that lead to smooth margin functions without additional labeling, and a value-training procedure that mixes data from both nominal and safety policy distributions. Experiments on simulated benchmarks and hardware with a vision-based manipulation policy demonstrate that LatentCBF enables smooth safety filtering while doubling the task-completion rate over prior switching methods.
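
To make the contrast between "least-restrictive" switching and smooth optimization-based filtering concrete, here is a minimal sketch of a textbook CBF quadratic program with a single constraint, for which the solution is a closed-form projection. It is not LatentCBF: the barrier `h` is assumed to be known analytically rather than learned in a latent space, and the 1D system (state `x`, dynamics `x' = u`, limit `x_max`), the gain `alpha`, and the threshold `eps` are all hypothetical choices for illustration.

```python
import numpy as np

def cbf_qp_filter(u_nom, lf_h, lg_h, h, alpha=1.0):
    """Closest control to u_nom satisfying lf_h + lg_h @ u + alpha * h >= 0.

    With one affine constraint, the QP min ||u - u_nom||^2 reduces to a
    projection onto a half-space, so no solver is needed.
    """
    u_nom = np.atleast_1d(np.asarray(u_nom, dtype=float))
    a = np.atleast_1d(np.asarray(lg_h, dtype=float))
    slack = lf_h + a @ u_nom + alpha * h
    if slack >= 0.0:
        return u_nom  # nominal control already satisfies the CBF condition
    norm_sq = a @ a
    if norm_sq == 0.0:
        raise ValueError("control has no influence on the barrier here")
    return u_nom - (slack / norm_sq) * a

def switching_filter(u_nom, u_safe, h, eps=0.05):
    """Least-restrictive filter: hard switch to u_safe near the boundary."""
    chosen = u_safe if h <= eps else u_nom
    return np.atleast_1d(np.asarray(chosen, dtype=float))

def toy_filters(x, u_nom=1.0, x_max=1.0, alpha=2.0):
    """Both filters on x' = u with safe set h(x) = x_max - x >= 0."""
    h = x_max - x
    # For this system: Lf h = 0 and Lg h = -1.
    smooth = cbf_qp_filter(u_nom, 0.0, [-1.0], h, alpha)
    switched = switching_filter(u_nom, -1.0, h)
    return float(smooth[0]), float(switched[0])

if __name__ == "__main__":
    for x in (0.5, 0.9, 0.96):
        print(x, toy_filters(x))
```

In this toy case the QP output shrinks continuously as the state approaches the limit, while the switching filter jumps from the nominal command to the braking command once `h` drops below `eps`.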


Statistically Assuring Safety of Control Systems using Ensembles of Safety Filters and Conformal Prediction

Tabbara, Ihab, Yang, Yuxuan, Sibai, Hussein

arXiv.org Artificial Intelligence

Safety assurance is a fundamental requirement for deploying learning-enabled autonomous systems. Hamilton-Jacobi (HJ) reachability analysis is a fundamental method for formally verifying safety and generating safe controllers. However, computing the HJ value function that characterizes the backward reachable set (BRS) of a set of user-defined failure states is computationally expensive, especially for high-dimensional systems, motivating the use of reinforcement learning approaches to approximate the value function. Unfortunately, a learned value function and its corresponding safe policy are not guaranteed to be correct. The learned value function evaluated at a given state may not be equal to the actual safety return achieved by following the learned safe policy. To address this challenge, we introduce a conformal prediction-based (CP) framework that bounds such uncertainty. We leverage CP to provide probabilistic safety guarantees when using learned HJ value functions and policies to prevent control systems from reaching failure states. Specifically, we use CP to calibrate the switching between the unsafe nominal controller and the learned HJ-based safe policy and to derive safety guarantees under this switched policy. We also investigate using an ensemble of independently trained HJ value functions as a safety filter and compare this ensemble approach to using individual value functions alone.
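
The calibration idea can be illustrated with generic split conformal prediction. The sketch below is an assumption-laden illustration rather than the paper's exact procedure: the nonconformity score is taken to be the learned value minus the safety return actually achieved from the same state (so positive scores mean overestimated safety), calibration states are assumed exchangeable with test states, and the names `conformal_quantile` and `select_controller` are hypothetical.

```python
import math
import numpy as np

def conformal_quantile(scores, delta):
    """Split-conformal quantile at miscoverage level delta.

    Returns the k-th smallest score with k = ceil((n + 1) * (1 - delta)),
    or infinity when the calibration set is too small for that level.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - delta))
    if k > n:
        return math.inf
    return float(scores[k - 1])

def select_controller(v_learned, q):
    """Use the nominal controller only if the corrected value stays positive."""
    return "nominal" if v_learned - q > 0.0 else "safe"

if __name__ == "__main__":
    # Hypothetical calibration data: learned values and realized safety returns.
    v_hat = np.array([0.9, 0.7, 0.8, 0.5, 0.6, 0.4, 0.95])
    realized = np.array([0.8, 0.7, 0.5, 0.45, 0.6, 0.1, 0.9])
    q = conformal_quantile(v_hat - realized, delta=0.25)
    print(q, select_controller(0.5, q), select_controller(0.2, q))
```

Under the exchangeability assumption, the learned value overestimates the realized return by more than `q` with probability at most `delta` on a fresh state, which is what justifies subtracting `q` before deciding whether to switch.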


Platform-Agnostic Reinforcement Learning Framework for Safe Exploration of Cluttered Environments with Graph Attention

Calzolari, Gabriele, Sumathy, Vidya, Kanellakis, Christoforos, Nikolakopoulos, George

arXiv.org Artificial Intelligence

Autonomous exploration of obstacle-rich spaces requires strategies that ensure efficiency while guaranteeing safety against collisions with obstacles. This paper investigates a novel platform-agnostic reinforcement learning framework that integrates a graph neural network-based policy for next-waypoint selection, with a safety filter ensuring safe mobility. Specifically, the neural network is trained using reinforcement learning through the Proximal Policy Optimization (PPO) algorithm to maximize exploration efficiency while minimizing safety filter interventions. Thus, when the policy proposes an infeasible action, the safety filter overrides it with the closest feasible alternative, ensuring consistent system behavior. In addition, this paper introduces a reward function shaped by a potential field that accounts for both the agent's proximity to unexplored regions and the expected information gain from reaching them. The proposed framework combines the adaptability of reinforcement learning-based exploration policies with the reliability provided by explicit safety mechanisms. This feature plays a key role in enabling the deployment of learning-based policies on robotic platforms operating in real-world environments. Extensive evaluations in both simulations and experiments performed in a lab environment demonstrate that the approach achieves efficient and safe exploration in cluttered spaces.
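
The override rule described above (replace an infeasible action with the closest feasible alternative) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: actions are taken to be indices into a set of 2D candidate waypoints, "closest" is taken to mean Euclidean distance between waypoints, and the feasibility mask is assumed to come from some external collision check. The name `filter_action` is hypothetical.

```python
import numpy as np

def filter_action(proposed, candidates, feasible):
    """Return (action_index, intervened).

    Keeps the proposed waypoint if it is feasible; otherwise returns the
    feasible candidate nearest to it in Euclidean distance.
    """
    candidates = np.asarray(candidates, dtype=float)
    feasible = np.asarray(feasible, dtype=bool)
    if feasible[proposed]:
        return int(proposed), False
    if not feasible.any():
        raise ValueError("no feasible waypoint available")
    dists = np.linalg.norm(candidates - candidates[proposed], axis=1)
    dists[~feasible] = np.inf  # never pick an infeasible candidate
    return int(np.argmin(dists)), True

if __name__ == "__main__":
    waypoints = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
    mask = [False, True, True]
    print(filter_action(0, waypoints, mask))  # overridden
    print(filter_action(2, waypoints, mask))  # kept
```

The returned `intervened` flag is the kind of signal a training loop could penalize in order to minimize filter interventions, as the abstract describes; how the paper actually encodes that penalty is not stated here.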



ATOM-CBF: Adaptive Safe Perception-Based Control under Out-of-Distribution Measurements

Yun, Kai S., Azizan, Navid

arXiv.org Artificial Intelligence

Ensuring the safety of real-world systems is challenging, especially when they rely on learned perception modules to infer the system state from high-dimensional sensor data. These perception modules are vulnerable to epistemic uncertainty, often failing when encountering out-of-distribution (OoD) measurements not seen during training. To address this gap, we introduce ATOM-CBF (Adaptive-To-OoD-Measurement Control Barrier Function), a novel safe control framework that explicitly computes and adapts to the epistemic uncertainty from OoD measurements, without the need for ground-truth labels or information on distribution shifts. Our approach features two key components: (1) an OoD-aware adaptive perception error margin and (2) a safety filter that integrates this adaptive error margin, enabling the filter to adjust its conservatism in real-time. We provide empirical validation in simulations, demonstrating that ATOM-CBF maintains safety for an F1Tenth vehicle with LiDAR scans and a quadruped robot with RGB images.


From Demonstrations to Safe Deployment: Path-Consistent Safety Filtering for Diffusion Policies

Römer, Ralf, Balletshofer, Julian, Thumm, Jakob, Pavone, Marco, Schoellig, Angela P., Althoff, Matthias

arXiv.org Artificial Intelligence

Diffusion policies (DPs) achieve state-of-the-art performance on complex manipulation tasks by learning from large-scale demonstration datasets, often spanning multiple embodiments and environments. However, they cannot guarantee safe behavior, so external safety mechanisms are needed. These, however, alter actions in ways unseen during training, causing unpredictable behavior and performance degradation. To address these problems, we propose path-consistent safety filtering (PACS) for DPs. Our approach performs path-consistent braking on a trajectory computed from the sequence of generated actions. In this way, we keep execution consistent with the policy's training distribution, maintaining the learned, task-completing behavior. To enable a real-time deployment and handle uncertainties, we verify safety using set-based reachability analysis. Our experimental evaluation in simulation and on three challenging real-world human-robot interaction tasks shows that PACS (a) provides formal safety guarantees in dynamic environments, (b) preserves task success rates, and (c) outperforms reactive safety approaches, such as control barrier functions, by up to 68% in terms of task success. Videos are available at our project website: https://tum-lsy.github.io/pacs/.
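
One way to read "path-consistent braking" is as time scaling: the robot slows its progress along the geometric path implied by the generated actions instead of deviating from it. The sketch below illustrates only that idea and makes several assumptions: the path is a polyline through hypothetical waypoints, progress is a scalar `s` in [0, 1] spread uniformly over the waypoints, and `speed_scale` would in practice be set by a safety verifier. The set-based reachability analysis that PACS uses for verification is omitted entirely.

```python
import numpy as np

def point_on_path(waypoints, s):
    """Position at progress s in [0, 1] along a polyline through waypoints."""
    waypoints = np.asarray(waypoints, dtype=float)
    knots = np.linspace(0.0, 1.0, len(waypoints))
    s = float(np.clip(s, 0.0, 1.0))
    return np.array(
        [np.interp(s, knots, waypoints[:, d]) for d in range(waypoints.shape[1])]
    )

def step_along_path(s, ds_nominal, speed_scale):
    """Advance progress, scaled by speed_scale in [0, 1] (0 means full stop)."""
    scale = min(max(speed_scale, 0.0), 1.0)
    return min(s + scale * ds_nominal, 1.0)

if __name__ == "__main__":
    path = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
    s = 0.0
    # Hypothetical schedule: full speed, then braking, then a full stop.
    for scale in (1.0, 1.0, 0.5, 0.0, 0.0):
        s = step_along_path(s, ds_nominal=0.1, speed_scale=scale)
        print(round(s, 3), point_on_path(path, s))
```

Because braking changes only how fast `s` advances, every commanded position still lies on the original path, which is the property the abstract credits with keeping execution consistent with the policy's training distribution.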