CAM Methods


Metric-Guided Synthesis of Class Activation Mapping

Luque-Cerpa, Alejandro, Polgreen, Elizabeth, Rajan, Ajitha, Torfah, Hazem

arXiv.org Artificial Intelligence

Class activation mapping (CAM) is a widely adopted class of saliency methods used to explain the behavior of convolutional neural networks (CNNs). These methods generate heatmaps that highlight the parts of the input most relevant to the CNN output. Various CAM methods have been proposed, each distinguished by the expressions used to derive heatmaps. In general, users look for heatmaps with specific properties that reflect different aspects of CNN functionality. These may include similarity to ground truth, robustness, equivariance, and more. Although existing CAM methods implicitly encode some of these properties in their expressions, they do not allow heatmap generation to vary with the user's intent or domain knowledge. In this paper, we address this limitation by introducing SyCAM, a metric-based approach for synthesizing CAM expressions. Given a predefined evaluation metric for saliency maps, SyCAM automatically generates CAM expressions optimized for that metric. We specifically explore a syntax-guided synthesis instantiation of SyCAM, where CAM expressions are derived based on predefined syntactic constraints and the given metric. Using several established evaluation metrics, we demonstrate the efficacy and flexibility of our approach in generating targeted heatmaps. We compare SyCAM with other well-known CAM methods on three prominent models: ResNet50, VGG16, and VGG19.
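
As a rough illustration of the metric-guided search loop (not the authors' implementation), the sketch below ranks a tiny "grammar" of candidate channel-weighting expressions by a user-supplied metric; the helper names and the two candidate expressions are illustrative assumptions.

```python
# Minimal sketch of a metric-guided search over CAM expressions, in the
# spirit of SyCAM. `acts` and `grads` are (K, h, w) feature maps and
# gradients for one image; `metric` scores a heatmap (higher is better).
import numpy as np

def grad_cam_weights(acts, grads):
    # Grad-CAM: global-average-pooled gradients as channel weights.
    return grads.mean(axis=(1, 2))

def sq_grad_weights(acts, grads):
    # A squared-gradient variant, standing in for a richer grammar.
    return (grads ** 2).mean(axis=(1, 2))

# The "grammar": candidate weighting expressions the search ranks.
CANDIDATES = {"grad_cam": grad_cam_weights, "sq_grad": sq_grad_weights}

def heatmap(acts, weights):
    cam = np.tensordot(weights, acts, axes=1)  # weighted sum over channels
    return np.maximum(cam, 0)                  # ReLU, as in Grad-CAM

def synthesize(acts, grads, metric):
    # Score every candidate under the user's metric; return the best.
    scored = {name: metric(heatmap(acts, fn(acts, grads)))
              for name, fn in CANDIDATES.items()}
    return max(scored, key=scored.get)
```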


Assessing the Noise Robustness of Class Activation Maps: A Framework for Reliable Model Interpretability

Sarkar, Syamantak, Bora, Revoti P., Kaushal, Bhupender, George, Sudhish N, Raja, Kiran

arXiv.org Artificial Intelligence

Class Activation Maps (CAMs) are one of the most important methods for visualizing the regions used by deep learning models, yet their robustness to different types of noise remains underexplored. In this work, we evaluate and report the resilience of various CAM methods under different noise perturbations across multiple architectures and datasets. By analyzing the influence of different noise types on CAM explanations, we assess the susceptibility to noise and the extent to which dataset characteristics may impact explanation stability. The findings highlight considerable variability in noise sensitivity across CAMs. We propose a robustness metric for CAMs that captures two key properties: consistency and responsiveness. Consistency reflects the ability of CAMs to remain stable under input perturbations that do not alter the predicted class, while responsiveness measures the sensitivity of CAMs to changes in the prediction caused by such perturbations. The metric is evaluated empirically across models, perturbations, and datasets, along with complementary statistical tests that exemplify the applicability of our proposed approach.
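
A minimal sketch of how the two properties could be computed, assuming cosine similarity as the heatmap-similarity function; the paper's exact formulation may differ.

```python
# Consistency and responsiveness as described in the abstract, with
# cosine similarity as an illustrative stand-in similarity measure.
import numpy as np

def similarity(cam_a, cam_b):
    a, b = cam_a.ravel(), cam_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def consistency(cam_clean, cams_same_pred):
    # Stability under perturbations that keep the predicted class:
    # higher mean similarity to the clean CAM is better.
    return np.mean([similarity(cam_clean, c) for c in cams_same_pred])

def responsiveness(cam_clean, cams_changed_pred):
    # Sensitivity to perturbations that flip the prediction: a larger
    # change is better, so report 1 - similarity.
    return np.mean([1 - similarity(cam_clean, c) for c in cams_changed_pred])
```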


MI CAM: Mutual Information Weighted Activation Mapping for Causal Visual Explanations of Convolutional Neural Networks

Iyer, Ram S, Iyer, Narayan S, P, Rugmini Ammal

arXiv.org Artificial Intelligence

As machine vision intervenes in crucial day-to-day domains, including healthcare and automated power plants, attention has been drawn to the internal mechanisms of convolutional neural networks and the reasons why a network produces specific inferences. This paper proposes a novel post-hoc visual explanation method called MI CAM based on activation mapping. Differing from previous class activation mapping based approaches, MI CAM produces saliency visualizations by weighting each feature map through its mutual information with the input image; the final result is generated by a linear combination of weights and activation maps. It also produces causal interpretations, as validated with the help of counterfactual analysis. We aim to exhibit the visual performance and unbiased justifications for the model inferencing procedure achieved by MI CAM. Our approach performs on par with state-of-the-art methods and outperforms several of them on qualitative and quantitative measures. The implementation of the proposed method can be found at https://anonymous.4open.science/r/MI-CAM-4D27
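
A hedged sketch of the weighting scheme described above, estimating mutual information from a joint histogram between each upsampled feature map and the grayscale input; the bin count and interpolation order are illustrative choices, not the authors' code.

```python
# Mutual-information-weighted activation mapping, sketched from the
# abstract: weight each feature map by its MI with the input image,
# then take a linear combination of weights and activation maps.
import numpy as np
from scipy.ndimage import zoom

def mutual_info(x, y, bins=32):
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    joint /= joint.sum()                       # joint probabilities
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    outer = np.outer(px, py)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / outer[mask])))

def mi_cam(activations, image_gray):
    # activations: (K, h, w) feature maps; image_gray: (H, W) input.
    H, W = image_gray.shape
    weights = []
    for fmap in activations:
        up = zoom(fmap, (H / fmap.shape[0], W / fmap.shape[1]), order=1)
        weights.append(mutual_info(up, image_gray))
    cam = np.tensordot(np.array(weights), activations, axes=1)
    return np.maximum(cam, 0)
```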


Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation

Zhang, Ziheng, Gu, Jianyang, Chowdhury, Arpita, Mai, Zheda, Carlyn, David, Berger-Wolf, Tanya, Su, Yu, Chao, Wei-Lun

arXiv.org Artificial Intelligence

Class activation map (CAM) has been widely used to highlight image regions that contribute to class predictions. Despite its simplicity and computational efficiency, CAM often struggles to identify discriminative regions that distinguish visually similar fine-grained classes. Prior efforts address this limitation by introducing more sophisticated explanation processes, but at the cost of extra complexity. In this paper, we propose Finer-CAM, a method that retains CAM's efficiency while achieving precise localization of discriminative regions. Our key insight is that the deficiency of CAM lies not in "how" it explains, but in "what" it explains. Specifically, previous methods attempt to identify all cues contributing to the target class's logit value, which inadvertently also activates regions predictive of visually similar classes. By explicitly comparing the target class with similar classes and spotting their differences, Finer-CAM suppresses features shared with other classes and emphasizes the unique, discriminative details of the target class. Finer-CAM is easy to implement, compatible with various CAM methods, and can be extended to multi-modal models for accurate localization of specific concepts. Additionally, Finer-CAM allows adjustable comparison strength, enabling users to selectively highlight coarse object contours or fine discriminative details. Quantitatively, we show that masking out the top 5% of activated pixels by Finer-CAM results in a larger relative confidence drop compared to baselines. The source code and demo are available at https://github.com/Imageomics/Finer-CAM.
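
The comparison idea can be sketched on top of plain Grad-CAM: explain the difference between the target logit and a similar class's logit rather than the target logit alone. The hook plumbing below is simplified, and `alpha` stands in for the adjustable comparison strength; this is an assumption-laden sketch, not the released implementation.

```python
# Finer-CAM-style comparison on top of Grad-CAM, assuming a PyTorch
# model and a chosen convolutional layer `acts_layer`.
import torch

def finer_cam(model, x, target, similar, acts_layer, alpha=1.0):
    acts = {}
    def hook(_, __, out):
        acts["a"] = out                        # capture feature maps
    h = acts_layer.register_forward_hook(hook)
    logits = model(x)                          # x: (1, C, H, W)
    h.remove()
    # Comparison objective: target logit minus a similar class's logit.
    score = logits[0, target] - alpha * logits[0, similar]
    grads = torch.autograd.grad(score, acts["a"])[0]    # (1, K, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)      # GAP over space
    cam = torch.relu((weights * acts["a"]).sum(dim=1))  # (1, h, w)
    return cam.detach()
```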


Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training

Zhang, Jiacheng, Liu, Feng, Zhou, Dawei, Zhang, Jingfeng, Liu, Tongliang

arXiv.org Artificial Intelligence

Adversarial training (AT) trains models using adversarial examples (AEs), which are natural images modified with specific perturbations to mislead the model. These perturbations are constrained by a predefined perturbation budget $\epsilon$ and are applied equally to each pixel within an image. However, in this paper, we discover that not all pixels contribute equally to the accuracy on AEs (i.e., robustness) and the accuracy on natural images (i.e., accuracy). Motivated by this finding, we propose Pixel-reweighted AdveRsarial Training (PART), a new framework that partially reduces $\epsilon$ for less influential pixels, guiding the model to focus more on the key regions that affect its outputs. Specifically, we first use class activation mapping (CAM) methods to identify important pixel regions; we then keep the perturbation budget for these regions while lowering it for the remaining regions when generating AEs. Finally, we train the model on these pixel-reweighted AEs. PART achieves a notable improvement in accuracy without compromising robustness on CIFAR-10, SVHN, and TinyImagenet-200, justifying the necessity of allocating distinct weights to different pixel regions for robust classification.
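
A rough sketch of the per-pixel budget idea under PGD, assuming a binary CAM-derived importance mask; the mask threshold and shrink factor are illustrative, not the paper's settings.

```python
# Pixel-reweighted AE generation: full epsilon where the CAM mask is
# high, a reduced budget elsewhere, enforced by per-pixel clamping.
import torch

def part_pgd(model, x, y, cam_mask, eps=8/255, shrink=0.5,
             steps=10, alpha=2/255):
    # Per-pixel budget derived from the CAM importance mask.
    eps_map = torch.where(cam_mask > 0.5,
                          torch.full_like(cam_mask, eps),
                          torch.full_like(cam_mask, eps * shrink))
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.clamp(x_adv, x - eps_map, x + eps_map)
            x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()
```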


AME-CAM: Attentive Multiple-Exit CAM for Weakly Supervised Segmentation on MRI Brain Tumor

Chen, Yu-Jen, Hu, Xinrong, Shi, Yiyu, Ho, Tsung-Yi

arXiv.org Artificial Intelligence

Magnetic resonance imaging (MRI) is commonly used for brain tumor segmentation, which is critical for patient evaluation and treatment planning. To reduce the labor and expertise required for labeling, weakly-supervised semantic segmentation (WSSS) methods with class activation mapping (CAM) have been proposed. However, existing CAM methods suffer from low resolution due to strided convolution and pooling layers, resulting in inaccurate predictions. In this study, we propose a novel CAM method, Attentive Multiple-Exit CAM (AME-CAM), that extracts activation maps from multiple resolutions to hierarchically aggregate and improve prediction accuracy. We evaluate our method on the BraTS 2021 dataset and show that it outperforms state-of-the-art methods.
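
The data flow of multi-exit aggregation might look like the sketch below: CAMs from several depths are upsampled to a common resolution and combined. The real AME-CAM learns the attention weights; the uniform weights here are a placeholder assumption.

```python
# Multi-exit CAM aggregation sketch: upsample per-exit CAMs and take a
# weighted sum. Uniform weights stand in for the learned attention.
import torch
import torch.nn.functional as F

def aggregate_exit_cams(exit_cams, out_size, weights=None):
    # exit_cams: list of (1, h_i, w_i) CAMs from different exits.
    n = len(exit_cams)
    if weights is None:
        weights = torch.full((n,), 1.0 / n)    # uniform stand-in attention
    ups = [F.interpolate(c.unsqueeze(1), size=out_size, mode="bilinear",
                         align_corners=False).squeeze(1)
           for c in exit_cams]
    cam = torch.relu(sum(w * u for w, u in zip(weights, ups)))
    return cam / (cam.max() + 1e-8)            # normalize to [0, 1]
```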


MetaCAM: Ensemble-Based Class Activation Map

Kaczmarek, Emily, Miguel, Olivier X., Bowie, Alexa C., Ducharme, Robin, Dingwall-Harvey, Alysha L. J., Hawken, Steven, Armour, Christine M., Walker, Mark C., Dick, Kevin

arXiv.org Artificial Intelligence

Clear, trustworthy explanations of deep learning model predictions are essential for high-criticality fields, such as medicine and biometric identification. Class Activation Maps (CAMs) are an increasingly popular category of visual explanation methods for Convolutional Neural Networks (CNNs). However, the performance of individual CAMs depends largely on experimental parameters such as the selected image, target class, and model. Here, we propose MetaCAM, an ensemble-based method for combining multiple existing CAM methods based on the consensus of the top-k% most highly activated pixels across component CAMs. We perform experiments to quantitatively determine the optimal combination of 11 CAMs for a given MetaCAM experiment. A new method, denoted Cumulative Residual Effect (CRE), is proposed to summarize large-scale ensemble-based experiments. We also present adaptive thresholding and demonstrate how it can be applied to individual CAMs to improve their performance, measured using the pixel-perturbation method Remove and Debias (ROAD). Lastly, we show that MetaCAM outperforms existing CAMs and refines the most salient regions of images used for model predictions. In a specific example, MetaCAM improved ROAD performance to 0.393, compared to the 11 individual CAMs, whose scores ranged from -0.101 to 0.172, demonstrating the importance of combining CAMs through an ensembling method and adaptive thresholding.
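
The top-k% consensus step can be sketched directly from the description above; the k and agreement fraction are illustrative parameters, not the paper's tuned values.

```python
# MetaCAM-style consensus: binarize each component CAM at its own
# top-k% activation threshold, then keep pixels where enough CAMs agree.
import numpy as np

def top_k_mask(cam, k_percent=20.0):
    # Pixels at or above the (100 - k)th percentile are "on".
    thresh = np.percentile(cam, 100.0 - k_percent)
    return (cam >= thresh).astype(np.float32)

def meta_cam(cams, k_percent=20.0, agreement=0.5):
    # cams: list of (H, W) heatmaps from different CAM methods.
    masks = np.stack([top_k_mask(c, k_percent) for c in cams])
    consensus = masks.mean(axis=0)             # per-pixel agreement rate
    return (consensus >= agreement).astype(np.float32)
```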


Sensing accident-prone features in urban scenes for proactive driving and accident prevention

Mishra, Sumit, Rajendran, Praveen Kumar, Vecchietti, Luiz Felipe, Har, Dongsoo

arXiv.org Artificial Intelligence

In urban cities, visual information on and along roadways is likely to distract drivers and lead to missed traffic signs and other accident-prone (AP) features. To avoid accidents caused by missing these visual cues, this paper proposes a visual notification of AP-features to drivers based on real-time images obtained via dashcam. For this purpose, Google Street View images around accident hotspots (areas of dense accident occurrence) identified from a real-accident dataset are used to train a novel attention module to classify a given urban scene as an accident hotspot or a non-hotspot (area of sparse accident occurrence). The proposed module leverages channel, point, and spatial-wise attention learning on top of different CNN backbones. This leads to better classification results and more certain AP-features with better contextual knowledge than the CNN backbones alone. Our proposed module achieves up to 92% classification accuracy. The capability of the proposed model to detect AP-features was analyzed through a comparative study of three different class activation map (CAM) methods, which were used to inspect the specific AP-features driving the classification decision. The outputs of the CAM methods were processed by an image processing pipeline to extract only the AP-features that are explainable to drivers, which were then presented via a visual notification system. A range of experiments was performed to demonstrate the efficacy of the system and the relevance of the detected AP-features. Ablating the AP-features, which occupy 9.61% of the total image area on average, increased the chance of a given area being classified as a non-hotspot by up to 21.8%.
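
A plausible sketch of the kind of image-processing step described, extracting bounding boxes around strongly activated CAM regions for the driver notification; the threshold and minimum-area values are assumptions.

```python
# Threshold a CAM heatmap and extract bounding boxes around strongly
# activated regions, as a stand-in for the notification pipeline.
import cv2
import numpy as np

def ap_feature_boxes(cam, image_size, thresh=0.6, min_area=100):
    # cam: (h, w) heatmap in [0, 1]; image_size: (W, H) of the frame.
    cam = cv2.resize(cam.astype(np.float32), image_size)
    binary = (cam >= thresh).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) >= min_area]
    return boxes  # (x, y, w, h) regions to highlight for the driver
```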


Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs

Fu, Ruigang, Hu, Qingyong, Dong, Xiaohu, Guo, Yulan, Gao, Yinghui, Li, Biao

arXiv.org Artificial Intelligence

To enable a better understanding and use of Convolutional Neural Networks (CNNs), their visualization and interpretation have attracted increasing attention in recent years. In particular, several Class Activation Mapping (CAM) methods have been proposed to discover the connection between a CNN's decision and image regions. Despite producing reasonable visualizations, these methods lack clear and sufficient theoretical support, which is their main limitation. In this paper, we introduce two axioms -- Conservation and Sensitivity -- to the visualization paradigm of the CAM methods. We then propose a dedicated Axiom-based Grad-CAM (XGrad-CAM) that satisfies these axioms as far as possible. Experiments demonstrate that XGrad-CAM is an enhanced version of Grad-CAM in terms of conservation and sensitivity. It achieves better visualization performance than Grad-CAM, while remaining class-discriminative and easier to implement than Grad-CAM++ and Ablation-CAM. The code is available at https://github.com/Fu0511/XGrad-CAM.
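
For reference, the XGrad-CAM channel weighting can be sketched as activation-normalized gradients summed over space, then combined as in Grad-CAM; the tensor shapes and epsilon guard below are illustrative.

```python
# XGrad-CAM-style weighting: each channel weight is the sum over space
# of the class-score gradient times the activation, normalized by the
# channel's total activation, then combined as in Grad-CAM.
import numpy as np

def xgrad_cam(acts, grads):
    # acts, grads: (K, h, w) activations and gradients of the class score.
    denom = acts.sum(axis=(1, 2), keepdims=True) + 1e-8
    weights = ((acts / denom) * grads).sum(axis=(1, 2))   # (K,)
    cam = np.tensordot(weights, acts, axes=1)             # (h, w)
    return np.maximum(cam, 0)
```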