critical scenario
DiCriTest: Testing Scenario Generation for Decision-Making Agents Considering Diversity and Criticality
Chu, Qitong, Yue, Yufeng, Yao, Danya, Pei, Huaxin
The growing deployment of decision-making agents in dynamic environments increases the demand for safety verification. While critical testing scenario generation has emerged as an appealing verification methodology, effectively balancing diversity and criticality remains a key challenge for existing methods, particularly due to local optima entrapment in high-dimensional scenario spaces. To address this limitation, we propose a dual-space guided testing framework that coordinates scenario parameter space and agent behavior space, aiming to generate testing scenarios considering diversity and criticality. Specifically, in the scenario parameter space, a hierarchical representation framework combines dimensionality reduction and multi-dimensional subspace evaluation to efficiently localize diverse and critical subspaces. This guides dynamic coordination between two generation modes: local perturbation and global exploration, optimizing critical scenario quantity and diversity. Complementarily, in the agent behavior space, agent-environment interaction data are leveraged to quantify behavioral criticality/diversity and adaptively support generation mode switching, forming a closed feedback loop that continuously enhances scenario characterization and exploration within the parameter space. Experiments show our framework improves critical scenario generation by an average of 56.23% and demonstrates greater diversity under novel parameter-behavior co-driven metrics when tested on five decision-making agents, outperforming state-of-the-art baselines.
Multi-Objective Reinforcement Learning for Critical Scenario Generation of Autonomous Vehicles
Wu, Jiahui, Lu, Chengjie, Arrieta, Aitor, Ali, Shaukat
Autonomous vehicles (AVs) make driving decisions without human intervention. Therefore, ensuring AVs' dependability is critical. Despite significant research and development on AVs, their dependability assurance remains a significant challenge due to the complexity and unpredictability of their operating environments. Scenario-based testing evaluates AVs under various driving scenarios, but the unlimited number of potential scenarios highlights the importance of identifying critical scenarios that can violate safety or functional requirements. Such requirements are inherently interdependent and need to be tested simultaneously. To this end, we propose MOEQT, a novel multi-objective reinforcement learning (MORL)-based approach to generate critical scenarios that simultaneously test interdependent safety and functional requirements. MOEQT adapts Envelope Q-learning as the MORL algorithm, which dynamically adapts multi-objective weights to balance the relative importance between multiple objectives. MOEQT generates critical scenarios to violate multiple requirements through dynamically interacting with the AV environment, ensuring comprehensive AV testing. We evaluate MOEQT using an advanced end-to-end AV controller and a high-fidelity simulator and compare MOEQT with two baselines: a random strategy and a single-objective RL with a weighted reward function. Our evaluation results show that MOEQT achieved an overall better performance in identifying critical scenarios for violating multiple requirements than the baselines.
Selecting Critical Scenarios of DER Adoption in Distribution Grids Using Bayesian Optimization
Mulkin, Olivier, Heleno, Miguel, Ludkovski, Mike
We develop a new methodology to select the scenarios of DER adoption that are most critical for distribution grids. Anticipating risks of future voltage and line flow violations due to additional PV adopters is central for utility investment planning but continues to rely on deterministic or ad hoc scenario selection. We propose a highly efficient search framework based on multi-objective Bayesian Optimization. We treat underlying grid stress metrics as computationally expensive black-box functions, approximated via Gaussian Process surrogates, and design an acquisition function based on the probability of scenarios being Pareto-critical across a collection of line- and bus-based violation objectives. Our approach provides a statistical guarantee and offers an order of magnitude speed-up relative to a conservative exhaustive search. Case studies on realistic feeders with 200-400 buses demonstrate the effectiveness and accuracy of our approach.
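Illustrative sketch (not from the paper): the notion of a "Pareto-critical" scenario can be made concrete with a plain non-dominated filter over violation scores. This deterministic version assumes the scores are already known; the abstract instead describes estimating the probability of Pareto-criticality under Gaussian Process surrogates, which is not reproduced here. The function names and the two-objective sample data are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as severe as b on every objective and strictly worse on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_critical(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of scenarios that no other scenario dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (voltage violation, line overload) severities, one pair per scenario.
scores = [(0.2, 0.9), (0.5, 0.5), (0.1, 0.1), (0.9, 0.2), (0.4, 0.4)]
critical = pareto_critical(scores)
print(critical)
```

Higher scores are treated as more severe, so the filter keeps the scenarios on the "worst-case" front rather than the best-case one.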
Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach
Xu, Weichao, Pei, Huaxin, Yang, Jingxuan, Shi, Yuchen, Zhang, Yi, Zhao, Qianchuan
Recent advances in decision-making policies have led to significant progress in fields such as autonomous driving and robotics. However, testing these policies remains crucial with the existence of critical scenarios that may threaten their reliability. Despite ongoing research, challenges such as low testing efficiency and limited diversity persist due to the complexity of the decision-making policies and their environments. To address these challenges, this paper proposes an adaptable Large Language Model (LLM)-driven online testing framework to explore critical and diverse testing scenarios for decision-making policies. Specifically, we design a "generate-test-feedback" pipeline with templated prompt engineering to harness the world knowledge and reasoning abilities of LLMs. Additionally, a multi-scale scenario generation strategy is proposed to address the limitations of LLMs in making fine-grained adjustments, further enhancing testing efficiency. Finally, the proposed LLM-driven method is evaluated on five widely recognized benchmarks, and the experimental results demonstrate that our method significantly outperforms baseline methods in uncovering both critical and diverse scenarios. These findings suggest that LLM-driven methods hold significant promise for advancing the testing of decision-making policies.
Generating Critical Scenarios for Testing Automated Driving Systems
Nguyen, Trung-Hieu, Vuong, Truong-Giang, Duong, Hong-Nam, Nguyen, Son, Vo, Hieu Dinh, Aoki, Toshiaki, Nguyen, Thu-Trang
Autonomous vehicles (AVs) have demonstrated significant potential in revolutionizing transportation, yet ensuring their safety and reliability remains a critical challenge, especially when exposed to dynamic and unpredictable environments. Real-world testing of an Autonomous Driving System (ADS) is both expensive and risky, making simulation-based testing a preferred approach. In this paper, we propose AVASTRA, a Reinforcement Learning (RL)-based approach to generate realistic critical scenarios for testing ADSs in simulation environments. To capture the complexity of driving scenarios, AVASTRA comprehensively represents the environment by both the internal states of an ADS under-test (e.g., the status of the ADS's core components, speed, or acceleration) and the external states of the surrounding factors in the simulation environment (e.g., weather, traffic flow, or road condition). AVASTRA trains the RL agent to effectively configure the simulation environment that places the AV in dangerous situations and potentially leads it to collisions. We introduce a diverse set of actions that allows the RL agent to systematically configure both environmental conditions and traffic participants. Additionally, based on established safety requirements, we enforce heuristic constraints to ensure the realism and relevance of the generated test scenarios. AVASTRA is evaluated on two popular simulation maps with four different road configurations. Our results show AVASTRA's ability to outperform the state-of-the-art approach by generating 30% to 115% more collision scenarios. Compared to the baseline based on Random Search, AVASTRA achieves up to 275% better performance. These results highlight the effectiveness of AVASTRA in enhancing the safety testing of AVs through realistic comprehensive critical scenario generation.
LAMBDA: Covering the Multimodal Critical Scenarios for Automated Driving Systems by Search Space Quantization
Wu, Xinzheng, Chen, Junyi, Xing, Xingyu, Sun, Jian, Tian, Ye, Liu, Lihao, Shen, Yong
Scenario-based virtual testing is one of the most significant methods to test and evaluate the safety of automated driving systems (ADSs). However, it is impractical to enumerate all concrete scenarios in a logical scenario space and test them exhaustively. Recently, Black-Box Optimization (BBO) was introduced to accelerate the scenario-based test of ADSs by utilizing the historical test information to generate new test cases. However, a single optimum found by the BBO algorithm is insufficient for the purpose of a comprehensive safety evaluation of ADSs in a logical scenario. In fact, all the subspaces representing danger in the logical scenario space, rather than only the most critical concrete scenario, play a more significant role for the safety evaluation. Covering as many of the critical concrete scenarios in a logical scenario space as possible through a limited number of tests is defined as the Black-Box Coverage (BBC) problem in this paper. We formalized this problem in a sample-based search paradigm and constructed a coverage criterion with Confusion Matrix Analysis. Furthermore, we propose LAMBDA (Latent-Action Monte-Carlo Beam Search with Density Adaption) to solve BBC problems. LAMBDA can quickly focus on critical subspaces by recursively partitioning the logical scenario space into accepted and rejected parts. Compared with its predecessor LaMCTS, LAMBDA introduces sampling density to overcome the sampling bias from optimization and Beam Search to obtain more parallelizability. Experimental results show that LAMBDA achieves state-of-the-art performance among all baselines and can be up to 33 and 6000 times faster than Random Search in reaching 95% coverage of the critical areas in 2- and 5-dimensional synthetic functions, respectively. Experiments also demonstrate that LAMBDA has a promising future in the safety evaluation of ADSs in virtual tests.
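Illustrative sketch (not from the paper): one way to read a confusion-matrix-based coverage criterion is to compare the set of cells a search method marks as critical against the true critical cells of a discretized scenario space. The abstract does not give the exact criterion, so the use of recall as "coverage", the cell representation, and all names below are assumptions.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, ...]  # hypothetical index of one cell in a discretized scenario space


def confusion_coverage(true_critical: Set[Cell], predicted_critical: Set[Cell]) -> Dict[str, float]:
    """Count true/false positives and false negatives, then derive recall and precision."""
    tp = len(true_critical & predicted_critical)
    fp = len(predicted_critical - true_critical)
    fn = len(true_critical - predicted_critical)
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of critical cells found
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # share of reported cells that are critical
    return {"tp": tp, "fp": fp, "fn": fn, "recall": recall, "precision": precision}


# Hypothetical 2-D example: four truly critical cells, three of them found.
true_critical = {(0, 0), (0, 1), (1, 1), (2, 2)}
predicted = {(0, 0), (0, 1), (1, 1), (3, 3)}
result = confusion_coverage(true_critical, predicted)
print(result)
```

Under this reading, a "95% coverage" target would correspond to a recall of at least 0.95 over the critical cells.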
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Liu, Xinhao, Li, Jintong, Jiang, Yicheng, Sujay, Niranjan, Yang, Zhicheng, Zhang, Juexiao, Abanes, John, Zhang, Jing, Feng, Chen
Navigating dynamic urban environments presents significant challenges for embodied agents, requiring advanced spatial reasoning and adherence to common-sense norms. Despite progress, existing visual navigation methods struggle in map-free or off-street settings, limiting the deployment of autonomous agents like last-mile delivery robots. To overcome these obstacles, we propose a scalable, data-driven approach for human-like urban navigation by training agents on thousands of hours of in-the-wild city walking and driving videos sourced from the web. We introduce a simple and scalable data processing pipeline that extracts action supervision from these videos, enabling large-scale imitation learning without costly annotations. Our model learns sophisticated navigation policies to handle diverse challenges and critical scenarios. Experimental results show that training on large-scale, diverse datasets significantly enhances navigation performance, surpassing current methods. This work shows the potential of using abundant online video data to develop robust navigation policies for embodied agents in dynamic urban settings. Project homepage is at https://ai4ce.github.io/CityWalker/.
Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation
Tian, Hanlin, Reddy, Kethan, Feng, Yuxiang, Quddus, Mohammed, Demiris, Yiannis, Angeloudis, Panagiotis
This paper introduces CRITICAL, a novel closed-loop framework for autonomous vehicle (AV) training and testing. CRITICAL stands out for its ability to generate diverse scenarios, focusing on critical driving situations that target specific learning and performance gaps identified in the Reinforcement Learning (RL) agent. The framework achieves this by integrating real-world traffic dynamics, driving behavior analysis, surrogate safety measures, and an optional Large Language Model (LLM) component. It is proven that the establishment of a closed feedback loop between the data generation pipeline and the training process can enhance the learning rate during training, elevate overall system performance, and augment safety resilience. Our evaluations, conducted using the Proximal Policy Optimization (PPO) and the HighwayEnv simulation environment, demonstrate noticeable performance improvements with the integration of critical case generation and LLM analysis, indicating CRITICAL's potential to improve the robustness of AV systems and streamline the generation of critical scenarios. This ultimately serves to hasten the development of AV agents, expand the general scope of RL training, and ameliorate validation efforts for AV safety.
Life-long Learning and Testing for Automated Vehicles via Adaptive Scenario Sampling as A Continuous Optimization Process
Ge, Jingwei, Wang, Pengbo, Chang, Cheng, Zhang, Yi, Yao, Danya, Li, Li
Sampling critical testing scenarios is an essential step in intelligence testing for Automated Vehicles (AVs). However, because prior knowledge of how critical scenarios are distributed in the sampling space is lacking, it is hard to find the critical scenarios efficiently or to evaluate the intelligence of AVs accurately. To solve this problem, we formulate the testing as a continuous optimization process that iteratively generates potential critical scenarios while evaluating them. A bi-level loop is proposed for such life-long learning and testing. In the outer loop, we iteratively learn space knowledge by evaluating the AV in the already sampled scenarios and then sample new scenarios based on the retained knowledge. The outer loop stops when the generated samples cover the whole space. To maximize the coverage of the space in each outer-loop iteration, we set an inner loop that receives the samples newly generated in the outer loop and outputs their updated positions. We assume that points in a small sphere-like subspace can be covered (or represented) by the point at the center of this sphere. Therefore, we can apply a multi-round heuristic strategy to move and pack these spheres in space to find the best covering solution. The simulation results show that faster and more accurate evaluation of AVs can be achieved with more critical scenarios.
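Illustrative sketch (not from the paper): the covering assumption above, that each sample represents a small sphere around it, suggests a simple way to measure how much of a normalized space a set of samples covers. The version below checks a regular grid over the unit hypercube; the grid resolution, the unit-cube normalization, and the function name are assumptions, and the paper's move-and-pack heuristic is not reproduced.

```python
import itertools
import math
from typing import Sequence


def coverage_fraction(centers: Sequence[Sequence[float]], radius: float, grid_steps: int = 11) -> float:
    """Fraction of grid points in [0, 1]^d lying within `radius` of some sample center."""
    dim = len(centers[0])
    axis = [i / (grid_steps - 1) for i in range(grid_steps)]
    covered = 0
    total = 0
    for point in itertools.product(axis, repeat=dim):
        total += 1
        if any(math.dist(point, c) <= radius for c in centers):
            covered += 1
    return covered / total


# One sample at the center of the unit square.
full = coverage_fraction([(0.5, 0.5)], radius=1.0)      # sphere large enough to cover everything
sparse = coverage_fraction([(0.5, 0.5)], radius=0.05)   # sphere covers only its own grid point
print(full, sparse)
```

An outer loop in the spirit of the abstract could keep sampling until this fraction reaches 1.0, while an inner loop repositions samples to raise it.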
Temporal Logic Formalisation of ISO 34502 Critical Scenarios: Modular Construction with the RSS Safety Distance
Reimann, Jesse, Mansion, Nico, Haydon, James, Bray, Benjamin, Chattopadhyay, Agnishom, Sato, Sota, Waga, Masaki, André, Étienne, Hasuo, Ichiro, Ueda, Naoki, Yokoyama, Yosuke
As the development of autonomous vehicles progresses, efficient safety assurance methods become increasingly necessary. Safety assurance methods such as monitoring and scenario-based testing call for formalisation of driving scenarios. In this paper, we develop a temporal-logic formalisation of an important class of critical scenarios in the ISO standard 34502. We use signal temporal logic (STL) as a logical formalism. Our formalisation has two main features: 1) modular composition of logical formulas for systematic and comprehensive formalisation (following the compositional methodology of ISO 34502); 2) use of the RSS distance for defining danger. We find our formalisation comes with few parameters to tune thanks to the RSS distance. We experimentally evaluated our formalisation; using its results, we discuss the validity of our formalisation and its stability with respect to the choice of some parameter values.
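Illustrative sketch (not from the paper): the RSS distance mentioned above is commonly given, for the longitudinal same-direction case, as the minimum gap a rear vehicle must keep so that it can still stop if the front vehicle brakes as hard as possible. The code below implements that standard formula from the RSS literature; the abstract does not say which RSS variant or parameter values the authors use, so the numeric values and the `is_dangerous` predicate are assumptions.

```python
def rss_longitudinal_distance(
    v_rear: float,           # rear vehicle speed [m/s]
    v_front: float,          # front vehicle speed [m/s]
    response_time: float,    # rear vehicle response time rho [s]
    a_max_accel: float,      # max acceleration of rear vehicle during response time [m/s^2]
    b_min_brake: float,      # min braking the rear vehicle applies afterwards [m/s^2]
    b_max_brake: float,      # max braking the front vehicle may apply [m/s^2]
) -> float:
    """Standard RSS minimum safe longitudinal distance, clamped at zero."""
    v_after_response = v_rear + response_time * a_max_accel
    d = (
        v_rear * response_time
        + 0.5 * a_max_accel * response_time ** 2
        + v_after_response ** 2 / (2.0 * b_min_brake)
        - v_front ** 2 / (2.0 * b_max_brake)
    )
    return max(0.0, d)


def is_dangerous(gap: float, d_min: float) -> bool:
    """Hypothetical danger predicate: the actual gap is below the RSS distance."""
    return gap < d_min


# Hypothetical parameter values, for illustration only.
d_min = rss_longitudinal_distance(20.0, 20.0, 1.0, 2.0, 4.0, 8.0)
print(d_min, is_dangerous(30.0, d_min))
```

In a signal-temporal-logic setting, a predicate of this kind would be evaluated at each time step of a trace and combined with temporal operators; that composition is left out here.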