Vision Foundation Model Embedding-Based Semantic Anomaly Detection

Ronecker, Max Peter, Foutter, Matthew, Elhafsi, Amine, Gammelli, Daniele, Barakaiev, Ihor, Pavone, Marco, Watzenig, Daniel

arXiv.org Artificial Intelligence 

Semantic anomalies are contextually invalid or unusual combinations of familiar visual elements that can cause undefined behavior and failures in system-level reasoning for autonomous systems. This work explores semantic anomaly detection by leveraging the semantic priors of state-of-the-art vision foundation models, operating directly on the image. We propose a framework that compares local vision embeddings from runtime images to a database of nominal scenarios in which the autonomous system is deemed safe and performant. We consider two variants of the proposed framework: one using raw grid-based embeddings, and another leveraging instance segmentation for object-centric representations. To further improve robustness, we introduce a simple filtering mechanism to suppress false positives. Our evaluations on CARLA-simulated anomalies show that the instance-based method with filtering achieves performance comparable to GPT-4o, while providing precise anomaly localization.

I. INTRODUCTION

Autonomous vehicles, such as those of Waymo [1] or Tesla [2], are increasingly deployed in real-world environments and rely heavily on machine learning (ML) algorithms, especially in perception modules such as object detection. While these algorithms often perform reliably within their training distributions, ML models remain vulnerable to out-of-distribution (OOD) inputs, which can lead to unsafe or unpredictable behavior. An OOD input is data that differs significantly from the training distribution of an ML model and is therefore defined relative to that model; examples include unusual objects, rare weather conditions, and novel environments.
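
As a rough illustration of the comparison step summarized in the abstract (local embeddings from a runtime image matched against a database of nominal embeddings), the following minimal sketch scores each local embedding by its distance to the nearest nominal embedding. Everything specific here is an assumption made for illustration and is not taken from the paper: the cosine-similarity metric, the nearest-neighbor rule, the threshold value, and the names `anomaly_scores` and `flag_anomalies`. The embeddings are plain arrays standing in for the output of a vision foundation model, and the filtering mechanism is not sketched because the text above does not specify it.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero vectors)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def anomaly_scores(local_embeddings, nominal_bank):
    """Score each local embedding as 1 - max cosine similarity to the bank.

    local_embeddings: (n_local, d) array, one row per grid cell or instance.
    nominal_bank:     (n_nominal, d) array of embeddings from nominal scenes.
    Assumption: cosine similarity with a nearest-neighbor rule; the paper's
    actual metric may differ.
    """
    q = l2_normalize(np.asarray(local_embeddings, dtype=float))
    bank = l2_normalize(np.asarray(nominal_bank, dtype=float))
    sims = q @ bank.T                 # (n_local, n_nominal) cosine similarities
    return 1.0 - sims.max(axis=1)     # low score means close to a nominal entry


def flag_anomalies(scores, threshold=0.5):
    """Flag local embeddings whose score exceeds a hypothetical threshold."""
    return np.asarray(scores) > threshold


# Toy usage with 2-D stand-in embeddings.
nominal_bank = np.array([[1.0, 0.0], [0.0, 1.0]])
runtime = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])
scores = anomaly_scores(runtime, nominal_bank)
flags = flag_anomalies(scores)
```

Because each score is tied to one grid cell or one segmented instance, the flagged indices indicate where in the image the mismatch occurs, which is the sense in which such a scheme can localize an anomaly. In practice the threshold would need to be calibrated on held-out nominal data.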
