Taming Silent Failures: A Framework for Verifiable AI Reliability

Guan-Yan Yang, Farn Wang

arXiv.org Artificial Intelligence 

Abstract--The integration of Artificial Intelligence (AI) into safety-critical systems introduces a new reliability paradigm: silent failures, where AI produces confident but incorrect outputs that can be dangerous. This paper introduces the Formal Assurance and Monitoring Environment (FAME), a novel framework that confronts this challenge. FAME synergizes the mathematical rigor of offline formal synthesis with the vigilance of online runtime monitoring to create a verifiable safety net around opaque AI components. We demonstrate its efficacy in an autonomous vehicle perception system, where FAME successfully detected 93.5% of critical safety violations that were otherwise silent. By contextualizing our framework within the ISO 26262 and ISO/PAS 8800 standards, we provide reliability engineers with a practical, certifiable pathway for deploying trustworthy AI. FAME represents a crucial shift from accepting probabilistic performance to enforcing provable safety in next-generation systems.

From driver assistance to computer-aided diagnosis (CAD), data-driven components promise superhuman perception and decision support. Yet they also introduce a reliability problem that differs from classical, code-centric software engineering: silent failure, that is, confident outputs that are wrong, with no explicit crash, exception, or error code exposed to the rest of the stack [1], [2]. Traditional safety-critical software is developed under rigorous processes (requirements traceability, design assurance, redundancy, and diagnostics) and can exhibit multiple failure modes (e.g., fail-silent, latent, Byzantine), which are analyzed and mitigated through established standards and verification activities. In contrast, the correctness of learning-enabled components depends on data distributions as much as on code and can degrade under distribution shift, sensor faults, or occlusions without tripping conventional diagnostics [1].
Standard testing is insufficient, as the input space of production DNNs is high-dimensional and cannot be exhaustively exercised [3].
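To make the silent-failure problem concrete, the sketch below shows a minimal runtime monitor that checks a perception output against an explicit safety invariant instead of trusting the model's own confidence. This is an illustration only, not FAME's synthesized monitors: the `Detection` fields, the braking-envelope invariant, and the thresholds are all hypothetical assumptions chosen for the example.

```python
# Illustrative sketch of a runtime safety monitor for a perception output.
# NOTE: the Detection schema, the invariant, and the thresholds below are
# hypothetical; FAME derives its monitors from formal specifications.
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # e.g. "pedestrian"
    confidence: float  # model's self-reported confidence in [0, 1]
    distance_m: float  # estimated distance to the object in metres


def monitor(detections, braking_envelope_m=30.0, min_confidence=0.6):
    """Flag violations of a simple invariant: any object reported inside
    the braking envelope must carry at least `min_confidence`. Violations
    are surfaced to a supervisory channel rather than failing silently."""
    return [
        d
        for d in detections
        if d.distance_m <= braking_envelope_m and d.confidence < min_confidence
    ]


# A low-confidence pedestrian inside the envelope is flagged; a distant,
# confidently detected car is not.
frame = [Detection("pedestrian", 0.35, 12.0), Detection("car", 0.90, 80.0)]
flagged = monitor(frame)
```

The key design point the example illustrates is architectural: the monitor sits outside the opaque model and enforces a checkable property, so a violation produces an explicit signal for the rest of the stack instead of a confident-but-wrong output passing through unchallenged.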