Taming Silent Failures: A Framework for Verifiable AI Reliability
arXiv.org Artificial Intelligence
Abstract--The integration of Artificial Intelligence (AI) into safety-critical systems introduces a new reliability paradigm: silent failures, where AI produces confident but incorrect outputs that can be dangerous. This paper introduces the Formal Assurance and Monitoring Environment (FAME), a novel framework that confronts this challenge. FAME synergizes the mathematical rigor of offline formal synthesis with the vigilance of online runtime monitoring to create a verifiable safety net around opaque AI components. We demonstrate its efficacy in an autonomous vehicle perception system, where FAME successfully detected 93.5% of critical safety violations that would otherwise have remained silent. By contextualizing our framework within the ISO 26262 and ISO/PAS 8800 standards, we provide reliability engineers with a practical, certifiable pathway for deploying trustworthy AI. FAME represents a crucial shift from accepting probabilistic performance to enforcing provable safety in next-generation systems.

From driver assistance to computer-aided diagnosis (CAD), data-driven components promise superhuman perception and decision support. Yet they also introduce a reliability problem that differs from classical, code-centric software engineering: silent failure, i.e., confident outputs that are wrong, with no explicit crash, exception, or error code exposed to the rest of the stack [1], [2]. Traditional safety-critical software is developed under rigorous processes (requirements traceability, design assurance, redundancy, and diagnostics) and can exhibit multiple failure modes (e.g., fail-silent, latent, Byzantine), which are analyzed and mitigated through established standards and verification activities. In contrast, the correctness of learning-enabled components depends on data distributions as much as on code, and can degrade under distribution shift, sensor faults, or occlusions without tripping conventional diagnostics [1].
Standard testing is insufficient, as the input space of production DNNs is hyper-dimensional and cannot be exhaustively exercised [3].
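The paper does not reproduce FAME's implementation here, but the core idea of online runtime monitoring against a formally stated safety invariant can be sketched minimally. In the illustrative Python below, all names (`PerceptionOutput`, `RuntimeSafetyMonitor`, the LiDAR cross-check, the 10 m threshold) are hypothetical assumptions, not the paper's actual design: an independent range sensor provides ground for a simple invariant, and a confident "all clear" from the perception component that contradicts it is flagged as a silent failure.

```python
from dataclasses import dataclass

@dataclass
class PerceptionOutput:
    obstacle_detected: bool
    distance_m: float   # distance to nearest obstacle reported by the model
    confidence: float   # model's self-reported confidence in [0, 1]

class RuntimeSafetyMonitor:
    """Checks each perception frame against a formal safety invariant.

    Invariant (illustrative): whenever an independent range sensor reports
    an object closer than `min_safe_distance_m`, the perception component
    must also report an obstacle. A confident "no obstacle" output that
    contradicts the sensor is a silent failure.
    """
    def __init__(self, min_safe_distance_m: float = 10.0):
        self.min_safe_distance_m = min_safe_distance_m
        self.violations = []  # (frame_id, model confidence) pairs

    def check(self, frame_id: int, ai_out: PerceptionOutput,
              lidar_range_m: float) -> bool:
        """Return True iff the frame satisfies the invariant."""
        hazard_present = lidar_range_m < self.min_safe_distance_m
        if hazard_present and not ai_out.obstacle_detected:
            # Silent failure: no crash or error code, just a wrong,
            # confident "all clear" contradicted by the range sensor.
            self.violations.append((frame_id, ai_out.confidence))
            return False
        return True

# Usage: a hazard at 6.2 m that the model confidently misses is caught.
monitor = RuntimeSafetyMonitor(min_safe_distance_m=10.0)
ok = monitor.check(1, PerceptionOutput(False, 50.0, 0.97), lidar_range_m=6.2)
```

The point of the sketch is architectural rather than algorithmic: the monitor sits outside the opaque model, evaluates only its observable outputs against an independently checkable property, and therefore needs no access to the model's internals.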
Oct-28-2025