verification technique
Towards Responsible AI: Advances in Safety, Fairness, and Accountability of Autonomous Systems
Ensuring responsible use of artificial intelligence (AI) has become imperative as autonomous systems increasingly influence critical societal domains. However, the concept of trustworthy AI remains broad and multi-faceted. This thesis advances knowledge in the safety, fairness, transparency, and accountability of AI systems. In safety, we extend classical deterministic shielding techniques to become resilient against delayed observations, enabling practical deployment in real-world conditions. We also implement both deterministic and probabilistic safety shields in simulated autonomous vehicles to prevent collisions with road users, validating the use of these techniques in realistic driving simulators. We introduce fairness shields, a novel post-processing approach to enforce group fairness in sequential decision-making settings over finite and periodic time horizons. By optimizing intervention costs while strictly ensuring fairness constraints, this method efficiently balances fairness with minimal interference. For transparency and accountability, we propose a formal framework for assessing intentional behaviour in probabilistic decision-making agents, introducing quantitative metrics of agency and an intention quotient. We use these metrics to propose a retrospective analysis of intention, useful for determining responsibility when autonomous systems cause unintended harm. Finally, we unify these contributions through the "reactive decision-making" framework, providing a general formalization that consolidates previous approaches. Collectively, the advancements presented contribute practically to the realization of safer, fairer, and more accountable AI systems, laying the foundations for future research in trustworthy AI.
- Europe > Austria > Styria > Graz (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (11 more...)
- Transportation > Ground > Road (1.00)
- Leisure & Entertainment > Games (1.00)
- Law (1.00)
- (7 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
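The shielding idea described in the abstract above can be illustrated with a minimal sketch. Everything here (the state names, the toy transition table, the fallback action, and the function names) is invented for illustration and is not the thesis's actual formalism: a shield intercepts the agent's proposed action, checks it against a safety model, and substitutes a safe fallback when the action would lead to an unsafe state.

```python
# Minimal sketch of a deterministic safety shield (illustrative names only).

UNSAFE = {"collision"}

# Toy transition model: (state, action) -> next state.
TRANSITIONS = {
    ("near", "accelerate"): "collision",
    ("near", "brake"): "safe_stop",
    ("far", "accelerate"): "near",
    ("far", "brake"): "far",
}

def shield(state, proposed_action, fallback="brake"):
    """Return the proposed action if it keeps the system safe,
    otherwise substitute the fallback action."""
    if TRANSITIONS[(state, proposed_action)] in UNSAFE:
        return fallback
    return proposed_action
```

The shield only intervenes when necessary, which is the "minimal interference" property the abstract emphasises: safe actions pass through unchanged.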
Budget-aware Test-time Scaling via Discriminative Verification
Montgomery, Kyle, Tan, Sijun, Chen, Yuqi, Zhuang, Siyuan, Zhang, Tianjun, Popa, Raluca Ada, Wang, Chenguang
Test-time scaling is a powerful strategy for boosting the performance of large language models on complex reasoning tasks. While state-of-the-art approaches often employ generative verifiers to select the best solution from a pool of candidates, this method incurs prohibitive computational costs, limiting its practicality. In this work, we shift the focus to a more budget-aware paradigm: discriminative verification. We conduct a thorough empirical analysis and demonstrate that while discriminative verifiers may underperform in isolation, combining them with self-consistency in a hybrid approach creates a powerful and efficient test-time scaling mechanism. Notably, under a fixed compute budget, this hybrid approach surpasses state-of-the-art generative verification by a significant margin, achieving up to 15.3% higher accuracy on AIME2025. Our findings establish that for practical, real-world applications, budget-aware scaling with discriminative verifiers is not only a "free" upgrade over self-consistency but also a more effective and efficient alternative to costly generative techniques. Code is available at https://github.com/wang-research-lab/verification.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Iraq > Basra Governorate > Basra (0.04)
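One plausible shape for the hybrid scheme this abstract describes, sketched under the assumption that the discriminative verifier returns a scalar score per candidate (the paper's exact aggregation rule may differ): self-consistency votes are weighted by verifier scores, so a high-confidence minority answer can outvote a low-confidence majority.

```python
from collections import defaultdict

def hybrid_select(candidates, verifier_scores):
    """Verifier-weighted self-consistency: each candidate answer's vote
    is weighted by its discriminative verifier score; the answer with
    the largest total weight wins."""
    weight = defaultdict(float)
    for answer, score in zip(candidates, verifier_scores):
        weight[answer] += score
    return max(weight, key=weight.get)
```

For example, with candidates `["7", "7", "42"]` and scores `[0.3, 0.2, 0.9]`, plain majority voting would return `"7"`, but the verifier-weighted vote returns `"42"` (0.9 vs 0.5); with uniform scores the scheme degenerates to ordinary self-consistency.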
Solving Math Word Problems Using Estimation Verification and Equation Generation
Piehl, Mitchell, Wilson, Dillon, Kalita, Ananya, Kalita, Jugal
Large Language Models (LLMs) excel at various tasks, including problem-solving and question-answering. However, LLMs often find Math Word Problems (MWPs) challenging because solving them requires a range of reasoning and mathematical abilities with which LLMs seem to struggle. Recent efforts have helped LLMs solve more complex MWPs with improved prompts. This study proposes a novel method that initially prompts an LLM to create equations from a decomposition of the question, followed by using an external symbolic equation solver to produce an answer. To ensure the accuracy of the obtained answer, inspired by an established recommendation of math teachers, the LLM is instructed to solve the MWP a second time, but this time with the objective of estimating the correct answer instead of solving it exactly. The estimate is then compared to the generated answer for verification. If verification fails, an iterative rectification process is employed to ensure the correct answer is eventually found. This approach achieves new state-of-the-art results on datasets used by prior published research on numeric and algebraic MWPs, improving the previous best results by nearly two percent on average. In addition, the approach obtains satisfactory results on trigonometric MWPs, a task not previously attempted to the authors' best knowledge. This study also introduces two new datasets, SVAMPClean and Trig300, to further advance the testing of LLMs' reasoning abilities.
- North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > Iowa > Johnson County > Iowa City (0.04)
- Research Report > Promising Solution (0.67)
- Research Report > New Finding (0.46)
- Education > Curriculum > Subject-Specific Education (0.54)
- Education > Educational Setting > K-12 Education (0.46)
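The verify-and-rectify loop from the abstract above can be sketched as follows. The function names, the relative tolerance, and the callables standing in for the LLM and symbolic-solver calls are all illustrative assumptions, not the paper's implementation: an exact answer is accepted only if it agrees with a coarse estimate, and otherwise the solve step is retried.

```python
def agrees(exact, estimate, rel_tol):
    """Estimation check: does the exact answer agree with a coarse
    estimate to within a relative tolerance?"""
    if estimate == 0:
        return abs(exact) <= rel_tol
    return abs(exact - estimate) / abs(estimate) <= rel_tol

def solve_with_estimation(solve, estimate, rel_tol=0.25, max_attempts=3):
    """Iterative rectification in the spirit of the paper: re-run the
    solve step until its exact answer passes the estimation check.
    `solve(attempt)` stands in for the LLM + symbolic-solver pipeline."""
    for attempt in range(max_attempts):
        answer = solve(attempt)
        if agrees(answer, estimate, rel_tol):
            return answer
    return answer  # best effort after max_attempts
```

If the first solve returns 1980 but the estimate is around 200, the check fails and the pipeline is retried; a second answer of 198 passes and is returned.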
Program Correctness through Self-Certification
Programming is both an enjoyable and a difficult task. A seemingly small slip can introduce a serious error or create a security vulnerability. The need for, and importance of, program correctness was recognized early in the modern development of computing. Algorithms, on which programs are built, arose in the ancient world (and are commonly attributed to the Greeks). The word verification dates only to Medieval Latin; however, when Euclid introduced his algorithm for the greatest common divisor centuries earlier, he provided a proof sketch based on what we would today call inductive reasoning.
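The self-certification idea can be made concrete with Euclid's algorithm itself. The extended form of the algorithm produces Bezout coefficients alongside the gcd, and these act as a certificate that can be checked independently, without re-running or trusting the computation (a sketch; the function names are ours, not from the article):

```python
def certified_gcd(a, b):
    """Extended Euclid: returns (g, x, y) with g = a*x + b*y.
    The Bezout coefficients x, y serve as a correctness certificate."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def check_certificate(a, b, g, x, y):
    """Self-certification: verify the result with simple arithmetic,
    without re-running the algorithm. g = a*x + b*y proves g is a
    combination of a and b; divisibility proves g is a common divisor."""
    return g == a * x + b * y and a % g == 0 and b % g == 0
```

For example, `certified_gcd(252, 198)` returns `(18, 4, -5)`, and the checker confirms 252·4 + 198·(−5) = 18 with 18 dividing both inputs.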
Open Challenges in the Formal Verification of Autonomous Driving
Burgio, Paolo, Ferrando, Angelo, Villani, Marco
In the realm of autonomous driving, the development and integration of highly complex and heterogeneous systems are standard practice. Modern vehicles are not monolithic systems; instead, they are composed of diverse hardware components, each running its own software systems. An autonomous vehicle comprises numerous independent components, often developed by different and potentially competing companies. This diversity poses significant challenges for the certification process, as it necessitates certifying components that may not disclose their internal behaviour (black-boxes). In this paper, we present a real-world case study of an autonomous driving system, identify key open challenges associated with its development and integration, and explore how formal verification techniques can address these challenges to ensure system reliability and safety.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (7 more...)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Robotics & Automation (0.95)
Efficient Verification of a RADAR SoC Using Formal and Simulation-Based Methods
Kumar, Aman, Litterick, Mark, Candido, Samuele
As the demand for Internet of Things (IoT) and Human-to-Machine Interaction (HMI) increases, modern System-on-Chips (SoCs) offering such solutions are becoming increasingly complex. This intricate design poses significant challenges for verification, particularly when time-to-market is a crucial factor for consumer electronics products. Verification has become the bottleneck in product development cycles, taking more than 60% of the overall project time [5], and complex designs such as RADAR-based SoCs add further verification challenges on top of existing ones. This paper presents a case study based on our work to verify a complex Radio Detection And Ranging (RADAR)-based SoC that performs on-chip sensing of human motion with millimetre accuracy [1]. We leverage both formal and simulation-based methods to complement each other and achieve verification sign-off with high confidence [2]. While employing a requirements-driven flow approach [3], we demonstrate the use of different verification methods to cater to multiple requirements and highlight our know-how from the project. Additionally, we used Machine Learning (ML)-based methods, specifically the Xcelium ML tool from Cadence, to improve verification throughput [4].
- Europe > Germany > Saxony > Dresden (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Discrete-Event Controller Synthesis for Autonomous Systems with Deep-Learning Perception Components
Calinescu, Radu, Imrie, Calum, Mangal, Ravi, Rodrigues, Genaína Nunes, Păsăreanu, Corina, Santana, Misael Alpizar, Vázquez, Gricel
We present DeepDECS, a new method for the synthesis of correct-by-construction discrete-event controllers for autonomous systems that use deep neural network (DNN) classifiers for the perception step of their decision-making processes. Despite major advances in deep learning in recent years, providing safety guarantees for these systems remains very challenging. Our controller synthesis method addresses this challenge by integrating DNN verification with the synthesis of verified Markov models. The synthesised models correspond to discrete-event controllers guaranteed to satisfy the safety, dependability and performance requirements of the autonomous system, and to be Pareto optimal with respect to a set of optimisation objectives. We use the method in simulation to synthesise controllers for mobile-robot collision mitigation and for maintaining driver attentiveness in shared-control autonomous driving.
- South America > Brazil > Federal District > Brasília (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > North Yorkshire > York (0.04)
- Information Technology > Robotics & Automation (0.66)
- Transportation > Ground > Road (0.48)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
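The core idea behind the abstract above, folding DNN perception error into a discrete probabilistic model so that a controller's guarantees account for misclassification, can be illustrated with a toy calculation. This is not the DeepDECS tool; all names, states, and probabilities below are invented for illustration:

```python
def outcome_distribution(prior, confusion, policy, dynamics):
    """Toy illustration of composing perception error with control.
    `prior[s]` is the probability of true state s, `confusion[s][o]`
    the probability the DNN reports observation o in state s,
    `policy[o]` the controller's action on observation o, and
    `dynamics[(s, a)]` the resulting outcome. Returns P(outcome)."""
    dist = {}
    for s, p_s in prior.items():
        for o, p_o in confusion[s].items():
            outcome = dynamics[(s, policy[o])]
            dist[outcome] = dist.get(outcome, 0.0) + p_s * p_o
    return dist

# Hypothetical collision-mitigation scenario: the classifier misses a
# pedestrian 10% of the time, and the controller brakes only when it
# sees one, so the residual collision probability is 0.2 * 0.1 = 0.02.
prior = {"pedestrian": 0.2, "clear": 0.8}
confusion = {
    "pedestrian": {"pedestrian": 0.9, "clear": 0.1},
    "clear": {"clear": 0.95, "pedestrian": 0.05},
}
policy = {"pedestrian": "brake", "clear": "go"}
dynamics = {
    ("pedestrian", "brake"): "safe",
    ("pedestrian", "go"): "collision",
    ("clear", "brake"): "safe",
    ("clear", "go"): "safe",
}
```

A synthesis tool like the one described would search over policies in such an augmented model and keep only those whose outcome probabilities satisfy the stated safety requirements.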
Exploring the Relevance of Data Privacy-Enhancing Technologies for AI Governance Use Cases
Bluemke, Emma, Collins, Tantum, Garfinkel, Ben, Trask, Andrew
The development of privacy-enhancing technologies has made immense progress in reducing trade-offs between privacy and performance in data exchange and analysis. Similar tools for structured transparency could be useful for AI governance by offering capabilities such as external scrutiny, auditing, and source verification. Viewing these different AI governance objectives as a system of information flows helps avoid partial solutions and significant gaps in governance, since the software stacks needed for the use cases discussed in this text may overlap substantially. When viewing the system as a whole, the importance of interoperability between these different AI governance solutions becomes clear. It is therefore critically important to look at these problems in AI governance as a system before these standards, auditing procedures, software, and norms settle into place.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
- Oceania > New Zealand (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)