AITopics | testing scenario

Collaborating Authors

testing scenario

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

'I think you're testing me': Anthropic's new AI model asks testers to come clean

The GuardianOct-1-2025, 11:47:55 GMT

Anthropic said the exchanges were an'urgent sign' that its testing scenarios needed to be more realistic. Anthropic said the exchanges were an'urgent sign' that its testing scenarios needed to be more realistic. 'I think you're testing me': Anthropic's new AI model asks testers to come clean Safety evaluation of Claude Sonnet 4.5 raises questions about whether predecessors'played along', firm says Wed 1 Oct 2025 07.47 EDTLast modified on Wed 1 Oct 2025 21.30 EDT If you are trying to catch out a chatbot take care, because one cutting-edge tool is showing signs it knows what you are up to. Anthropic, a San Francisco-based artificial intelligence company, has released a safety analysis of its latest model, Claude Sonnet 4.5, and revealed it had become suspicious it was being tested in some way. Evaluators said during a "somewhat clumsy" test for political sycophancy, the large language model (LLM) - the underlying technology that powers a chatbot - raised suspicions it was being tested and asked the testers to come clean.

anthropic, new ai model ask tester, testing scenario, (7 more...)

The Guardian

Country:

Europe > United Kingdom (0.31)
North America > United States > California > San Francisco County > San Francisco (0.25)
Europe > Ukraine (0.07)
(2 more...)

Industry:

Leisure & Entertainment > Sports (0.73)
Government > Regional Government > Europe Government > United Kingdom Government (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)

Add feedback

f69e505b08403ad2298b9f262659929a-AuthorFeedback.pdf

Neural Information Processing SystemsAug-20-2025, 09:33:37 GMT

formulation, reviewer, system environment, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Add feedback

DiCriTest: Testing Scenario Generation for Decision-Making Agents Considering Diversity and Criticality

Chu, Qitong, Yue, Yufeng, Yao, Danya, Pei, Huaxin

arXiv.org Artificial IntelligenceAug-18-2025

The growing deployment of decision-making agents in dynamic environments increases the demand for safety verification. While critical testing scenario generation has emerged as an appealing verification methodology, effectively balancing diversity and criticality remains a key challenge for existing methods, particularly due to local optima entrapment in high-dimensional scenario spaces. To address this limitation, we propose a dual-space guided testing framework that coordinates scenario parameter space and agent behavior space, aiming to generate testing scenarios considering diversity and criticality. Specifically, in the scenario parameter space, a hierarchical representation framework combines dimensionality reduction and multi-dimensional subspace evaluation to efficiently localize diverse and critical subspaces. This guides dynamic coordination between two generation modes: local perturbation and global exploration, optimizing critical scenario quantity and diversity. Complementarily, in the agent behavior space, agent-environment interaction data are leveraged to quantify behavioral criticality/diversity and adaptively support generation mode switching, forming a closed feedback loop that continuously enhances scenario characterization and exploration within the parameter space. Experiments show our framework improves critical scenario generation by an average of 56.23\% and demonstrates greater diversity under novel parameter-behavior co-driven metrics when tested on five decision-making agents, outperforming state-of-the-art baselines.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2508.11514

Genre: Research Report (0.82)

Industry:

Transportation (1.00)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Add feedback

a48ad12d588c597f4725a8b84af647b5-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsAug-17-2025, 09:17:04 GMT

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Information Technology (0.95)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Add feedback

Make Full Use of Testing Information: An Integrated Accelerated Testing and Evaluation Method for Autonomous Driving Systems

Wu, Xinzheng, Chen, Junyi, Wu, Jianfeng, Zhang, Longgao, Xia, Tian, Shen, Yong

arXiv.org Artificial IntelligenceJan-21-2025

Testing and evaluation is an important step before the large-scale application of the autonomous driving systems (ADSs). Based on the three level of scenario abstraction theory, a testing can be performed within a logical scenario, followed by an evaluation stage which is inputted with the testing results of each concrete scenario generated from the logical parameter space. During the above process, abundant testing information is produced which is beneficial for comprehensive and accurate evaluations. To make full use of testing information, this paper proposes an Integrated accelerated Testing and Evaluation Method (ITEM). Based on a Monte Carlo Tree Search (MCTS) paradigm and a dual surrogates testing framework proposed in our previous work, this paper applies the intermediate information (i.e., the tree structure, including the affiliation of each historical sampled point with the subspaces and the parent-child relationship between subspaces) generated during the testing stage into the evaluation stage to achieve accurate hazardous domain identification. Moreover, to better serve this purpose, the UCB calculation method is improved to allow the search algorithm to focus more on the hazardous domain boundaries. Further, a stopping condition is constructed based on the convergence of the search algorithm. Ablation and comparative experiments are then conducted to verify the effectiveness of the improvements and the superiority of the proposed method. The experimental results show that ITEM could well identify the hazardous domains in both low- and high-dimensional cases, regardless of the shape of the hazardous domains, indicating its generality and potential for the safety evaluation of ADSs.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.11924

Country:

Europe (0.04)
Asia > Macao (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.70)
Information Technology > Robotics & Automation (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

Flow to Rare Events: An Application of Normalizing Flow in Temporal Importance Sampling for Automated Vehicle Validation

Ye, Yichun, Zhang, He, Tian, Ye, Sun, Jian

arXiv.org Artificial IntelligenceJul-9-2024

Automated Vehicle (AV) validation based on simulated testing requires unbiased evaluation and high efficiency. One effective solution is to increase the exposure to risky rare events while reweighting the probability measure. However, characterizing the distribution of risky events is particularly challenging due to the paucity of samples and the temporality of continuous scenario variables. To solve it, we devise a method to represent, generate, and reweight the distribution of risky rare events. We decompose the temporal evolution of continuous variables into distribution components based on conditional probability. By introducing the Risk Indicator Function, the distribution of risky rare events is theoretically precipitated out of naturalistic driving distribution. This targeted distribution is practically generated via Normalizing Flow, which achieves exact and tractable probability evaluation of intricate distribution. The rare event distribution is then demonstrated as the advantageous Importance Sampling distribution. We also promote the technique of temporal Importance Sampling. The combined method, named as TrimFlow, is executed to estimate the collision rate of Car-following scenarios as a tentative practice. The results showed that sampling background vehicle maneuvers from rare event distribution could evolve testing scenarios to hazardous states. TrimFlow reduced 86.1% of tests compared to generating testing scenarios according to their exposure in the naturalistic driving environment. In addition, the TrimFlow method is not limited to one specific type of functional scenario.

normalizing flow, scenario, vehicle, (15 more...)

arXiv.org Artificial Intelligence

2407.0732

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(4 more...)

Genre: Research Report > New Finding (0.86)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Rare Class Prediction Model for Smart Industry in Semiconductor Manufacturing

Farrag, Abdelrahman, Ghali, Mohammed-Khalil, Jin, Yu

arXiv.org Artificial IntelligenceJun-6-2024

The evolution of industry has enabled the integration of physical and digital systems, facilitating the collection of extensive data on manufacturing processes. This integration provides a reliable solution for improving process quality and managing equipment health. However, data collected from real manufacturing processes often exhibit challenging properties, such as severe class imbalance, high rates of missing values, and noisy features, which hinder effective machine learning implementation. In this study, a rare class prediction approach is developed for in situ data collected from a smart semiconductor manufacturing process. The primary objective is to build a model that addresses issues of noise and class imbalance, enhancing class separation. The developed approach demonstrated promising results compared to existing literature, which would allow the prediction of new observations that could give insights into future maintenance plans and production quality. The model was evaluated using various performance metrics, with ROC curves showing an AUC of 0.95, a precision of 0.66, and a recall of 0.96

accuracy, dataset, imputation, (15 more...)

arXiv.org Artificial Intelligence

2406.04533

Country:

North America > United States > New York > Broome County > Binghamton (0.04)
Oceania > New Zealand (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Semiconductors & Electronics (1.00)
Information Technology > Hardware (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Few-Shot Scenario Testing for Autonomous Vehicles Based on Neighborhood Coverage and Similarity

Li, Shu, Yang, Jingxuan, He, Honglin, Zhang, Yi, Hu, Jianming, Feng, Shuo

arXiv.org Artificial IntelligenceFeb-1-2024

Testing and evaluating the safety performance of autonomous vehicles (AVs) is essential before the large-scale deployment. Practically, the acceptable cost of testing specific AV model can be restricted within an extremely small limit because of testing cost or time. With existing testing methods, the limitations imposed by strictly restricted testing numbers often result in significant uncertainties or challenges in quantifying testing results. In this paper, we formulate this problem for the first time the "few-shot testing" (FST) problem and propose a systematic FST framework to address this challenge. To alleviate the considerable uncertainty inherent in a small testing scenario set and optimize scenario utilization, we frame the FST problem as an optimization problem and search for a small scenario set based on neighborhood coverage and similarity. By leveraging the prior information on surrogate models (SMs), we dynamically adjust the testing scenario set and the contribution of each scenario to the testing result under the guidance of better generalization ability on AVs. With certain hypotheses on SMs, a theoretical upper bound of testing error is established to verify the sufficiency of testing accuracy within given limited number of tests. The experiments of the cut-in scenario using FST method demonstrate a notable reduction in testing error and variance compared to conventional testing methods, especially for situations with a strict limitation on the number of scenarios.

fst method, scenario, vehicle, (14 more...)

arXiv.org Artificial Intelligence

2402.01795

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

SICO: Simulation for Infection Control Operations

Pine, Karleigh, Veliche, Razvan, Bennett, Jared, Klipfel, Joel

arXiv.org Artificial IntelligenceAug-18-2023

In response to the COVID-19 pandemic and the potential threat of future epidemics caused by novel viruses, we developed a flexible framework for modeling disease intervention effects. This tool is intended to aid decision makers at multiple levels as they compare possible responses to emerging epidemiological threats for optimal control and reduction of harm. The framework is specifically designed to be both scalable and modular, allowing it to model a variety of population levels, viruses, testing methods and strategies--including pooled testing--and intervention strategies. In this paper, we provide an overview of this framework and examine the impact of different intervention strategies and their impact on infection dynamics.

agent, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2308.09852

Country:

North America > United States > Oklahoma > Payne County > Cushing (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Cuba (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)

Add feedback

Evolving Testing Scenario Generation Method and Intelligence Evaluation Framework for Automated Vehicles

Ma, Yining, Jiang, Wei, Zhang, Lingtong, Chen, Junyi, Wang, Hong, Lv, Chen, Wang, Xuesong, Xiong, Lu

arXiv.org Artificial IntelligenceJun-12-2023

Interaction between the background vehicles (BVs) and automated vehicles (AVs) in scenario-based testing plays a critical role in evaluating the intelligence of the AVs. Current testing scenarios typically employ predefined or scripted BVs, which inadequately reflect the complexity of human-like social behaviors in real-world driving scenarios, and also lack a systematic metric for evaluating the comprehensive intelligence of AVs. Therefore, this paper proposes an evolving scenario generation method that utilizes deep reinforcement learning (DRL) to create human-like BVs for testing and intelligence evaluation of AVs. Firstly, a class of driver models with human-like competitive, cooperative, and mutual driving motivations is designed. Then, utilizing an improved "level-k" training procedure, the three distinct driver models acquire game-based interactive driving policies. And these models are assigned to BVs for generating evolving scenarios in which all BVs can interact continuously and evolve diverse contents. Next, a framework including safety, driving efficiency, and interaction utility are presented to evaluate and quantify the intelligence performance of 3 systems under test (SUTs), indicating the effectiveness of the evolving scenario for intelligence testing. Finally, the complexity and fidelity of the proposed evolving testing scenario are validated. The results demonstrate that the proposed evolving scenario exhibits the highest level of complexity compared to other baseline scenarios and has more than 85% similarity to naturalistic driving data. This highlights the potential of the proposed method to facilitate the development and evaluation of high-level AVs in a realistic and challenging environment.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2306.07142

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Education (0.88)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback