Appendix of "Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection"
The hyperparameters β and α are set to 0.5 and 0.01 for each layer. CIFAR-10: The image size is set to 224 × 224, and the feature size is 14 × 14. The encoder and decoder layers are both set to 4. The hyperparameters β and α are set to 0.5 and 0.01 for each layer. The Evidence Lower Bound (ELBO) of our variational autoencoder should include both a reconstruction likelihood and a KL term; since the KL term is constant here, the KL divergence can thus be ignored for training.
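The ELBO argument can be made explicit. For a standard VAE with approximate posterior q_φ(z|x), decoder p_θ(x|z), and prior p(z), the objective is:

```latex
\mathcal{L}_{\mathrm{ELBO}}(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big]
  - \mathrm{KL}\!\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

When the KL term is constant with respect to the trainable parameters, maximizing the ELBO reduces to maximizing the reconstruction likelihood alone, which is why it can be dropped from the training objective.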
Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors
Zhai, Guangyao, Zhou, Yue, Deng, Xinyan, Heckler, Lars, Navab, Nassir, Busam, Benjamin
Few-shot anomaly detection streamlines and simplifies industrial safety inspection. However, limited samples make accurate differentiation between normal and abnormal features challenging, even more so under category-agnostic conditions. Large-scale pre-training of foundation visual encoders has advanced many fields, as the enormous quantity of data helps to learn the general distribution of normal images. We observe that the amount of anomaly in an image directly correlates with the difference in the learnt embeddings, and we exploit this to design a few-shot anomaly detector termed FoundAD. This is done by learning a nonlinear projection operator onto the natural image manifold. This simple operator effectively characterizes and identifies out-of-distribution regions in an image. Extensive experiments show that our approach supports multi-class detection and achieves competitive performance while using substantially fewer parameters than prior methods. Backed by evaluations with multiple foundation encoders, including the recent DINOv3, we believe this idea broadens the perspective on foundation features and advances the field of few-shot anomaly detection.
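The projection-operator idea can be sketched with a toy stand-in. FoundAD learns a nonlinear projection; the version below substitutes a linear projection onto the principal subspace of few-shot normal patch embeddings, and all shapes, sizes, and names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical patch embeddings for 8 "normal" support images from a frozen
# foundation encoder (e.g., 196 patches x 384 dims per image for a ViT-S).
normal_feats = rng.normal(size=(8 * 196, 384))

# Stand-in for the learned nonlinear projection operator: project onto the
# top-k principal directions of the normal features.
mu = normal_feats.mean(axis=0)
_, _, vt = np.linalg.svd(normal_feats - mu, full_matrices=False)
basis = vt[:64]  # k = 64 principal directions

def anomaly_score(patch_feats):
    """Residual norm between each feature and its projection onto the
    normal-feature subspace; a larger residual means more anomalous."""
    centered = patch_feats - mu
    projected = centered @ basis.T @ basis
    return np.linalg.norm(centered - projected, axis=-1)

# Per-patch scores for one query image form a coarse anomaly map.
scores = anomaly_score(rng.normal(size=(196, 384)))
assert scores.shape == (196,)
```

In this toy setup the residual plays the role of the embedding difference the abstract describes: patches that lie off the normal-feature manifold project poorly and receive high scores.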
A Survey on Diffusion Models for Anomaly Detection
Liu, Jing, Ma, Zhenchao, Wang, Zepu, Zou, Chenxuanyin, Ren, Jiayang, Wang, Zehua, Song, Liang, Hu, Bo, Liu, Yang, Leung, Victor C. M.
Diffusion models (DMs) have emerged as a powerful class of generative AI models, showing remarkable potential in anomaly detection (AD) tasks across various domains, such as cybersecurity, fraud detection, healthcare, and manufacturing. The intersection of these two fields, termed diffusion models for anomaly detection (DMAD), offers promising solutions for identifying deviations in increasingly complex and high-dimensional data. In this survey, we review recent advances in DMAD research. We begin by presenting the fundamental concepts of AD and DMs, followed by a comprehensive analysis of classic DM architectures including DDPMs, DDIMs, and Score SDEs. We further categorize existing DMAD methods into reconstruction-based, density-based, and hybrid approaches, providing detailed examinations of their methodological innovations. We also explore the diverse tasks across different data modalities, encompassing image, time series, video, and multimodal data analysis. Furthermore, we discuss critical challenges and emerging research directions, including computational efficiency, model interpretability, robustness enhancement, edge-cloud collaboration, and integration with large language models. The collection of DMAD research papers and resources is available at https://github.com/fdjingliu/DMAD.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Overview (1.00)
- Research Report > Promising Solution (0.88)
- Health & Medicine (0.67)
- Information Technology > Security & Privacy (0.48)
- Government > Military (0.34)
- Law Enforcement & Public Safety > Fraud (0.34)
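The reconstruction-based DMAD category surveyed above follows a common recipe: perturb the input, denoise it back toward the learned normal distribution, and score by reconstruction error. The sketch below uses a toy denoiser purely to show the pipeline's shape; a real method would use a trained DDPM/DDIM, and every name and constant here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

NORMAL_MEAN = np.zeros(64)  # toy "normal data" centered at the origin

def toy_denoiser(x_noisy, t):
    """Stand-in for a trained diffusion denoiser eps_theta(x_t, t):
    each call shrinks the sample toward the normal-data mean."""
    return 0.9 * x_noisy + 0.1 * NORMAL_MEAN

def reconstruction_score(x, t=10, sigma=0.5):
    # Forward process (simplified): perturb the input with Gaussian noise.
    x_t = x + sigma * rng.normal(size=x.shape)
    # Reverse process: iteratively denoise toward the normal distribution.
    x_hat = x_t
    for step in range(t):
        x_hat = toy_denoiser(x_hat, step)
    # Normal inputs reconstruct well; anomalies land far from their input.
    return np.linalg.norm(x - x_hat)

normal_sample = rng.normal(scale=0.1, size=64)
anomalous_sample = rng.normal(loc=5.0, size=64)
assert reconstruction_score(anomalous_sample) > reconstruction_score(normal_sample)
```

Density-based methods in the survey's taxonomy instead score likelihood (or a score-function statistic) directly, and hybrids combine both signals.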
[Reproducibility Report] Explainable Deep One-Class Classification
Bertoldo, Joao P. C., Decencière, Etienne
Scope of Reproducibility: Liznerski et al. [23] proposed Fully Convolutional Data Description (FCDD), an explainable version of the Hypersphere Classifier (HSC), to directly address image anomaly detection (AD) and pixel-wise AD without any post-hoc explainer methods. The authors claim that FCDD achieves results comparable with the state of the art in sample-wise AD on Fashion-MNIST and CIFAR-10 and exceeds the state of the art on the pixel-wise task on MVTec-AD. They also give evidence of a clear improvement from using few (1 up to 8) real anomalous images in MVTec-AD for supervision at the pixel level. Finally, a qualitative study with horse images on PASCAL-VOC shows that FCDD can intrinsically reveal spurious model decisions by providing built-in anomaly score heatmaps. Methodology: We have reproduced the quantitative results in the main text of [23] except for the performance on ImageNet: sample-wise AD on Fashion-MNIST and CIFAR-10, and pixel-wise AD on MVTec-AD.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > France (0.04)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
Jeong, Jongheon, Zou, Yang, Kim, Taewan, Zhang, Dongqing, Ravichandran, Avinash, Dabeer, Onkar
Visual anomaly classification and segmentation are vital for automating industrial quality inspection. Prior research in the field has focused on training custom models for each quality inspection task, which requires task-specific images and annotation. In this paper we move away from this regime, addressing zero-shot and few-normal-shot anomaly classification and segmentation. Recently, CLIP, a vision-language model, has shown revolutionary generality, with competitive zero-/few-shot performance compared to full supervision. But CLIP falls short on anomaly classification and segmentation tasks. Hence, we propose window-based CLIP (WinCLIP) with (1) a compositional ensemble of state words and prompt templates and (2) efficient extraction and aggregation of window/patch/image-level features aligned with text. We also propose its few-normal-shot extension WinCLIP+, which uses complementary information from normal images. On MVTec-AD (and VisA), without further tuning, WinCLIP achieves 91.8%/85.1% (78.1%/79.6%) AUROC in zero-shot anomaly classification and segmentation, while WinCLIP+ achieves 93.1%/95.2% (83.8%/96.4%) in the 1-normal-shot setting, surpassing the state of the art by large margins.
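The compositional prompt ensemble is the part of WinCLIP that needs no vision model at all to illustrate: every (template, state word) pair yields one prompt, and the text embeddings of each set would be averaged into one "normal" and one "anomalous" class embedding. The state words and templates below are plausible examples, not the paper's exact lists:

```python
# Hypothetical state words and templates (WinCLIP's exact lists differ).
normal_states = ["flawless", "perfect", "without defect"]
anomalous_states = ["damaged", "broken", "with a defect"]
templates = [
    "a photo of a {} {}.",
    "a cropped photo of the {} {}.",
    "a close-up photo of a {} {}.",
]

def compose_prompts(states, object_name):
    """Compositional ensemble: one prompt per (template, state) pair."""
    return [t.format(s, object_name) for t in templates for s in states]

normal_prompts = compose_prompts(normal_states, "transistor")
anomaly_prompts = compose_prompts(anomalous_states, "transistor")
assert len(normal_prompts) == len(templates) * len(normal_states)  # 9 prompts
```

For zero-shot scoring, window-, patch-, and image-level visual features would then be compared against the two averaged text embeddings, with the anomaly score derived from their relative similarities.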
Explainable Deep One-Class Classification
Liznerski, Philipp, Ruff, Lukas, Vandermeulen, Robert A., Franks, Billy Joe, Kloft, Marius, Müller, Klaus-Robert
Deep one-class classification variants for anomaly detection learn a mapping that concentrates nominal samples in feature space, causing anomalies to be mapped away. Because this transformation is highly non-linear, finding interpretations poses a significant challenge. In this paper we present an explainable deep one-class classification method, Fully Convolutional Data Description (FCDD), where the mapped samples are themselves also an explanation heatmap. FCDD yields competitive detection performance and provides reasonable explanations on common anomaly detection benchmarks with CIFAR-10 and ImageNet. On MVTec-AD, a recent manufacturing dataset offering ground-truth anomaly maps, FCDD sets a new state of the art in the unsupervised setting. Our method can incorporate ground-truth anomaly maps during training, and using even a few of these (~5) improves performance significantly. Finally, using FCDD's explanations we demonstrate the vulnerability of deep one-class classification models to spurious image features such as image watermarks.
- Europe > Netherlands > South Holland > Delft (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Europe > Germany > Berlin (0.04)
- Information Technology > Sensing and Signal Processing (1.00)
- Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
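FCDD's "mapped samples are themselves an explanation heatmap" claim can be sketched in a few lines: the fully convolutional output is passed through an elementwise pseudo-Huber function and aggregated over channels, giving a non-negative spatial map (FCDD then upsamples it to input resolution, which is omitted here; shapes are illustrative):

```python
import numpy as np

def fcdd_heatmap(feature_map):
    """FCDD-style explanation: the elementwise pseudo-Huber transform of the
    fully convolutional output, summed over channels, serves directly as a
    spatial anomaly heatmap (upsampling to input size omitted)."""
    a = np.sqrt(feature_map ** 2 + 1.0) - 1.0  # pseudo-Huber, >= 0 everywhere
    return a.sum(axis=0)                        # aggregate channels -> H x W

rng = np.random.default_rng(0)
feats = rng.normal(size=(128, 28, 28))          # hypothetical C x H x W conv output
heat = fcdd_heatmap(feats)
assert heat.shape == (28, 28) and np.all(heat >= 0)
```

Because the heatmap is computed from the model's own output rather than by a post-hoc explainer, the same map that drives the detection score also serves as the explanation, which is what lets the paper expose spurious cues such as watermarks.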