attribution quality
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.40)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)
- Asia > Middle East > Jordan (0.04)
Benchmarking the Attribution Quality of Vision Models
Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural network. While much research has gone into proposing new attribution methods, their proper evaluation remains a difficult challenge. In this work, we propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol, i.e., the out-of-domain issue and lacking inter-model comparisons. This allows us to evaluate 23 attribution methods and how different design choices of popular vision backbones affect their attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than what is known from previous work. Further, we show consistent changes in the attribution quality when varying the network design, indicating that some standard design choices promote attribution quality.
On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines
Geiger, Alexander, Wagner, Lars, Rueckert, Daniel, Wilhelm, Dirk, Jell, Alissa
The explainability of deep learning models remains a significant challenge, particularly in the medical domain where interpretable outputs are critical for clinical trust and transparency. Path attribution methods such as Integrated Gradients rely on a baseline representing the absence of relevant features ("missingness"). Commonly used baselines, such as all-zero inputs, are often semantically meaningless, especially in medical contexts. While alternative baseline choices have been explored, existing methods lack a principled approach to dynamically select baselines tailored to each input. In this work, we examine the notion of missingness in the medical context, analyze its implications for baseline selection, and introduce a counterfactual-guided approach to address the limitations of conventional baselines. We argue that a generated counterfactual (i.e. clinically "normal" variation of the pathological input) represents a more accurate representation of a meaningful absence of features. We use a Variational Autoencoder in our implementation, though our concept is model-agnostic and can be applied with any suitable counterfactual method. We evaluate our concept on three distinct medical data sets and empirically demonstrate that counterfactual baselines yield more faithful and medically relevant attributions, outperforming standard baseline choices as well as other related methods.
- Europe > Switzerland (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (8 more...)
Attribution Quality in AI-Generated Content:Benchmarking Style Embeddings and LLM Judges
Attributing authorship in the era of large language models (LLMs) is increasingly challenging as machine-generated prose rivals human writing. We benchmark two complementary attribution mechanisms , fixed Style Embeddings and an instruction-tuned LLM judge (GPT-4o) on the Human AI Parallel Corpus, an open dataset of 600 balanced instances spanning six domains (academic, news, fiction, blogs, spoken transcripts, and TV/movie scripts). Each instance contains a human prompt with both a gold continuation and an LLM-generated continuation from either GPT-4o or LLaMA-70B-Instruct. The Style Embedding baseline achieves stronger aggregate accuracy on GPT continuations (82 pct vs. 68 pct). The LLM Judge is slightly better than the Style embeddings on LLaMA continuations (85 pct vs. 81 pct) but the results are not statistically significant. Crucially, the LLM judge significantly outperforms in fiction and academic prose, indicating semantic sensitivity, whereas embeddings dominate in spoken and scripted dialogue, reflecting structural strengths. These complementary patterns highlight attribution as a multidimensional problem requiring hybrid strategies. To support reproducibility we provide code on GitHub and derived data on Hugging Face under the MIT license. This open framework provides a reproducible benchmark for attribution quality assessment in AI-generated content, along with a review of related literature influencing this work.
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study > Negative Result (0.34)
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
Systematic Evaluation of Attribution Methods: Eliminating Threshold Bias and Revealing Method-Dependent Performance Patterns
Attribution methods explain neural network predictions by identifying influential input features, but their evaluation suffers from threshold selection bias that can reverse method rankings and undermine conclusions. Current protocols binarize attribution maps at single thresholds, where threshold choice alone can alter rankings by over 200 percentage points. We address this flaw with a threshold-free framework that computes Area Under the Curve for Intersection over Union (AUC-IoU), capturing attribution quality across the full threshold spectrum. Evaluating seven attribution methods on dermatological imaging, we show single-threshold metrics yield contradictory results, while threshold-free evaluation provides reliable differentiation. XRAI achieves 31% improvement over LIME and 204% over vanilla Integrated Gradients, with size-stratified analysis revealing performance variations up to 269% across lesion scales. These findings establish methodological standards that eliminate evaluation artifacts and enable evidence-based method selection. The threshold-free framework provides both theoretical insight into attribution behavior and practical guidance for robust comparison in medical imaging and beyond.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Contrast-CAT: Contrasting Activations for Enhanced Interpretability in Transformer-based Text Classifiers
Han, Sungmin, Lee, Jeonghyun, Lee, Sangkyun
Transformers have profoundly influenced AI research, but explaining their decisions remains challenging -- even for relatively simpler tasks such as classification -- which hinders trust and safe deployment in real-world applications. Although activation-based attribution methods effectively explain transformer-based text classification models, our findings reveal that these methods can be undermined by class-irrelevant features within activations, leading to less reliable interpretations. To address this limitation, we propose Contrast-CAT, a novel activation contrast-based attribution method that refines token-level attributions by filtering out class-irrelevant features. By contrasting the activations of an input sequence with reference activations, Contrast-CAT generates clearer and more faithful attribution maps. Experimental results across various datasets and models confirm that Contrast-CAT consistently outperforms state-of-the-art methods. Notably, under the MoRF setting, it achieves average improvements of x1.30 in AOPC and x2.25 in LOdds over the most competing methods, demonstrating its effectiveness in enhancing interpretability for transformer-based text classification.
- Europe (0.14)
- North America > United States > New York (0.04)
- North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation
Baer, Gregor, Grau, Isel, Zhang, Chao, Van Gorp, Pieter
Evaluating feature attribution methods represents a critical challenge in explainable AI (XAI), as researchers typically rely on perturbation-based metrics when ground truth is unavailable. However, recent work reveals that these evaluation metrics can show different performance across predicted classes within the same dataset. These "class-dependent evaluation effects" raise questions about whether perturbation analysis reliably measures attribution quality, with direct implications for XAI method development and evaluation trustworthiness. We investigate under which conditions these class-dependent effects arise by conducting controlled experiments with synthetic time series data where ground truth feature locations are known. We systematically vary feature types and class contrasts across binary classification tasks, then compare perturbation-based degradation scores with ground truth-based precision-recall metrics using multiple attribution methods. Our experiments demonstrate that class-dependent effects emerge with both evaluation approaches, even in simple scenarios with temporally localized features, triggered by basic variations in feature amplitude or temporal extent between classes. Most critically, we find that perturbation-based and ground truth metrics frequently yield contradictory assessments of attribution quality across classes, with weak correlations between evaluation approaches. These findings suggest that researchers should interpret perturbation-based metrics with care, as they may not always align with whether attributions correctly identify discriminating features. By showing this disconnect, our work points toward reconsidering what attribution evaluation actually measures and developing more rigorous evaluation methods that capture multiple dimensions of attribution quality.
- Europe > Netherlands > North Brabant > Eindhoven (0.05)
- Europe > Switzerland (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.86)
GenerationPrograms: Fine-grained Attribution with Executable Programs
Wan, David, Hirsch, Eran, Stengel-Eskin, Elias, Dagan, Ido, Bansal, Mohit
Recent large language models (LLMs) achieve impressive performance in source-conditioned text generation but often fail to correctly provide fine-grained attributions for their outputs, undermining verifiability and trust. Moreover, existing attribution methods do not explain how and why models leverage the provided source documents to generate their final responses, limiting interpretability. To overcome these challenges, we introduce a modular generation framework, GenerationPrograms, inspired by recent advancements in executable "code agent" architectures. Unlike conventional generation methods that simultaneously generate outputs and attributions or rely on post-hoc attribution, GenerationPrograms decomposes the process into two distinct stages: first, creating an executable program plan composed of modular text operations (such as paraphrasing, compression, and fusion) explicitly tailored to the query, and second, executing these operations following the program's specified instructions to produce the final response. Empirical evaluations demonstrate that GenerationPrograms significantly improves attribution quality at both the document level and sentence level across two long-form question-answering tasks and a multi-document summarization task. We further demonstrate that GenerationPrograms can effectively function as a post-hoc attribution method, outperforming traditional techniques in recovering accurate attributions. In addition, the interpretable programs generated by GenerationPrograms enable localized refinement through modular-level improvements that further enhance overall attribution quality.
- North America > United States > Ohio (0.05)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- (17 more...)
Benchmarking the Attribution Quality of Vision Models
Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural network. While much research has gone into proposing new attribution methods, their proper evaluation remains a difficult challenge. In this work, we propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol, i.e., the out-of-domain issue and lacking inter-model comparisons. This allows us to evaluate 23 attribution methods and how different design choices of popular vision backbones affect their attribution quality.