Goto

Collaborating Authors

 attribution


GAIA: Delving into Gradient-based Attribution Abnormality for Out-of-distribution Detection---Supplementary Material- -- A Extensive Experiments A.1 Computational Efficiency of GAIA Methods

Neural Information Processing Systems

In Tab. 1, we conduct the test on a Tesla V100 to In Tab. 2, we train five ResNet34 models for the CIFAR benchmarks (CIFAR10 and CIFAR100), The blocks, labeled as block1 to block5, correspond to the output features obtained from shallow to deep. This can be expained as the model's In Section 4.1, we introduce channel-wise average abnormality under the assumption that Gradient-based Class Activation Mapping (GradCAM) can be regarded as having only first-order independent Here we provide a proof (from [18]) for this assumption. Then based on Eq. 2, we The issue of attribution can be viewed as the assignment of credit in cooperative game theory. Null Player Axiom: If removal of a feature across all potential coalitions with other features has no impact on the output, it should be assigned zero importance. In Section 4.2, we introduce the two-stage fusion strategy for GAIA-A and in Section 5.3, we briefly Eq. 8, the effect of output component is similar to the The extensive results are shown in Tab. 3. It indicates the effectiveness of our fusion strategy.







Attribution Preservation in Network Compression for Reliable Network Interpretation

Neural Information Processing Systems

Neural networks embedded in safety-sensitive applications such as self-driving cars and wearable health monitors rely on two important techniques: input attribution for hindsight analysis and network compression to reduce its size for edge-computing. In this paper, we show that these seemingly unrelated techniques conflict with each other as network compression deforms the produced attributions, which could lead to dire consequences for mission-critical applications. This phenomenon arises due to the fact that conventional network compression methods only preserve the predictions of the network while ignoring the quality of the attributions. To combat the attribution inconsistency problem, we present a framework that can preserve the attributions while compressing a network. By employing the Weighted Collapsed Attribution Matching regularizer, we match the attribution maps of the network being compressed to its pre-compression former self. We demonstrate the effectiveness of our algorithm both quantitatively and qualitatively on diverse compression methods.


ECSEL: Explainable Classification via Signomial Equation Learning

Lumadjeng, Adia, Birbil, Ilker, Acar, Erman

arXiv.org Machine Learning

We introduce ECSEL, an explainable classification method that learns formal expressions in the form of signomial equations, motivated by the observation that many symbolic regression benchmarks admit compact signomial structure. ECSEL directly constructs a structural, closed-form expression that serves as both a classifier and an explanation. On standard symbolic regression benchmarks, our method recovers a larger fraction of target equations than competing state-of-the-art approaches while requiring substantially less computation. Leveraging this efficiency, ECSEL achieves classification accuracy competitive with established machine learning models without sacrificing interpretability. Further, we show that ECSEL satisfies some desirable properties regarding global feature behavior, decision-boundary analysis, and local feature attributions. Experiments on benchmark datasets and two real-world case studies i.e., e-commerce and fraud detection, demonstrate that the learned equations expose dataset biases, support counterfactual reasoning, and yield actionable insights.



Towards Neuron Attributions in Multi-Modal Large Language Models

Neural Information Processing Systems

As Large Language Models (LLMs) demonstrate impressive capabilities, demystifying their internal mechanisms becomes increasingly vital. Neuron attribution, which attributes LLM outputs to specific neurons to reveal the semantic properties they learn, has emerged as a key interpretability approach. However, while neuron attribution has made significant progress in deciphering text-only LLMs, its application to Multimodal LLMs (MLLMs) remains less explored. To address this gap, we propose a novel Neuron Attribution method tailored for MLLMs, termed NAM. Specifically, NAM not only reveals the modality-specific semantic knowledge learned by neurons within MLLMs, but also highlights several intriguing properties of neurons, such as cross-modal invariance and semantic sensitivity. These properties collectively elucidate the inner workings mechanism of MLLMs, providing a deeper understanding of how MLLMs process and generate multi-modal content. Through theoretical analysis and empirical validation, we demonstrate the efficacy of NAM and the valuable insights it offers. Furthermore, leveraging NAM, we introduce a multi-modal knowledge editing paradigm, underscoring the practical significance of our approach for downstream applications of MLLMs.