saliency
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > United States (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- (4 more...)
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
KV cache stores key and value states from previous tokens to avoid re-computation, yet it demands substantial storage space, especially for long sequences. Adaptive KV cache compression seeks to discern the saliency of tokens, preserving vital information while aggressively compressing those of less importance.
Why Did This Model Forecast This Future? Information-Theoretic Saliency for Counterfactual Explanations of Probabilistic Regression Models
We propose a post hoc saliency-based explanation framework for counterfactual reasoning in probabilistic multivariate time-series forecasting (regression) settings. Building upon Miller's framework of explanations derived from research in multiple social science disciplines, we establish a conceptual link between counterfactual reasoning and saliency-based explanation techniques. To address the lack of a principled notion of saliency, we leverage a unifying definition of information-theoretic saliency grounded in preattentive human visual cognition and extend it to forecasting settings. Specifically, we obtain a closed-form expression for commonly used density functions to identify which observed timesteps appear salient to an underlying model in making its probabilistic forecasts. We empirically validate our framework in a principled manner using synthetic data to establish ground-truth saliency that is unavailable for real-world data. Finally, using real-world data and forecasting models, we demonstrate how our framework can assist domain experts in forming new data-driven hypotheses about the causal relationships between features in the wild.
Improving Deep Learning Interpretability by Saliency Guided Training
Saliency methods have been widely used to highlight important input features in model predictions. Most existing methods use backpropagation on a modified gradient function to generate saliency maps. Thus, noisy gradients can result in unfaithful feature attributions. In this paper, we tackle this issue and introduce a {\it saliency guided training} procedure for neural networks to reduce noisy gradients used in predictions while retaining the predictive performance of the model. Our saliency guided training procedure iteratively masks features with small and potentially noisy gradients while maximizing the similarity of model outputs for both masked and unmasked inputs. We apply the saliency guided training procedure to various synthetic and real data sets from computer vision, natural language processing, and time series across diverse neural architectures, including Recurrent Neural Networks, Convolutional Networks, and Transformers. Through qualitative and quantitative evaluations, we show that saliency guided training procedure significantly improves model interpretability across various domains while preserving its predictive performance.
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
Mixup is a commonly adopted data augmentation technique for image classification. Recent advances in mixup methods primarily focus on mixing based on saliency. However, many saliency detectors require intense computation and are especially burdensome for parameter-heavy transformer models. To this end, we propose TokenMixup, an efficient attention-guided token-level data augmentation method that aims to maximize the saliency of a mixed set of tokens. TokenMixup provides 15 faster saliency-aware data augmentation compared to gradient-based methods. Moreover, we introduce a variant of TokenMixup which mixes tokens within a single instance, thereby enabling multi-scale feature augmentation. Experiments show that our methods significantly improve the baseline models' performance on CIFAR and ImageNet-1K, while being more efficient than previous methods. We also reach state-of-the-art performance on CIFAR-100 among from-scratch transformer models.
Video Token Merging for Long Video Understanding
As the scale of data and models for video understanding rapidly expand, handling long-form video input in transformer-based models presents a practical challenge. Rather than resorting to input sampling or token dropping, which may result in information loss, token merging shows promising results when used in collaboration with transformers. However, the application of token merging for long-form video processing is not trivial. We begin with the premise that token merging should not rely solely on the similarity of video tokens; the saliency of tokens should also be considered. To address this, we explore various video token merging strategies for long-form video classification, starting with a simple extension of image token merging, moving to region-concentrated merging, and finally proposing a learnable video token merging (VTM) algorithm that dynamically merges tokens based on their saliency. Extensive experimental results show that we achieve better or comparable performances on the LVU, COIN, and Breakfast datasets. Moreover, our approach significantly reduces memory costs by 84% and boosts throughput by approximately 6.89 times compared to baseline algorithms.
Saliency Guided Longitudinal Medical Visual Question Answering
Longitudinal medical visual question answering (Diff-VQA) requires comparing paired studies from different time points and answering questions about clinically meaningful changes. In this setting, the difference signal and the consistency of visual focus across time are more informative than absolute single-image findings. We propose a saliency-guided encoder-decoder for chest X-ray Diff-VQA that turns post-hoc saliency into actionable supervision. The model first performs a lightweight near-identity affine pre-alignment to reduce nuisance motion between visits. It then executes a within-epoch two-step loop: step 1 extracts a medically relevant keyword from the answer and generates keyword-conditioned Grad-CAM on both images to obtain disease-focused saliency; step 2 applies the shared saliency mask to both time points and generates the final answer. This closes the language-vision loop so that the terms that matter also guide where the model looks, enforcing spatially consistent attention on corresponding anatomy. On Medical-Diff-VQA, the approach attains competitive performance on BLEU, ROUGE-L, CIDEr, and METEOR while providing intrinsic interpretability. Notably, the backbone and decoder are general-domain pretrained without radiology-specific pretraining, highlighting practicality and transferability. These results support saliency-conditioned generation with mild pre-alignment as a principled framework for longitudinal reasoning in medical VQA.
- North America > United States > California > San Diego County > San Diego (0.05)
- Europe > Switzerland (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (3 more...)
- Workflow (0.69)
- Research Report (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.73)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
MASE: Interpretable NLP Models via Model-Agnostic Saliency Estimation
Yang, Zhou, Luo, Shunyan, Zhu, Jiazhen, Jin, Fang
Abstract--Deep neural networks (DNNs) have made significant strides in Natural Language Processing (NLP), yet their interpretability remains elusive, particularly when evaluating their intricate decision-making processes. Traditional methods often rely on post-hoc interpretations, such as saliency maps or feature visualization, which might not be directly applicable to the discrete nature of word data in NLP . Addressing this, we introduce the Model-agnostic Saliency Estimation (MASE) framework. MASE offers local explanations for text-based predictive models without necessitating in-depth knowledge of a model's internal architecture. By leveraging Normalized Linear Gaussian Perturbations (NLGP) on the embedding layer instead of raw word inputs, MASE efficiently estimates input saliency. Our results indicate MASE's superiority over other model-agnostic interpretation methods, especially in terms of Delta Accuracy, positioning it as a promising tool for elucidating the operations of text-based models in NLP . Deep learning models are becoming increasingly prevalent in Natural Language Processing (NLP) systems. V arious efficient deep neural networks (DNNs) have been proposed, ranging from classic models such as the Recurrent Neural Network (RNN) [1] and Long-Short Term Memory (LSTM) [2] to more recent Bidirectional Encoder Representations from Transformers (BERT) [3]. Although these DNNs have shown great success in natural language processing tasks, their lack of explain-ability and/or interpretability remains a major weakness when comparing and deploying these black-box models for NLP tasks such as text classification and resume recommendation. Moreover, as NLP models become increasingly complex with millions or even billions of parameters, explaining the precise decision-making process of a DNN is becoming increasingly challenging.
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > California (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
Interpreting GFlowNets for Drug Discovery: Extracting Actionable Insights for Medicinal Chemistry
S, Amirtha Varshini A, Ranasinghe, Duminda S., Tam, Hok Hei
Generative Flow Networks, or GFlowNets, offer a promising framework for molecular design, but their internal decision policies remain opaque. This limits adoption in drug discovery, where chemists require clear and interpretable rationales for proposed structures. We present an interpretability framework for SynFlowNet, a GFlowNet trained on documented chemical reactions and purchasable starting materials that generates both molecules and the synthetic routes that produce them. Our approach integrates three complementary components. Gradient based saliency combined with counterfactual perturbations identifies which atomic environments influence reward and how structural edits change molecular outcomes. Sparse autoencoders reveal axis aligned latent factors that correspond to physicochemical properties such as polarity, lipophilicity, and molecular size. Motif probes show that functional groups including aromatic rings and halogens are explicitly encoded and linearly decodable from the internal embeddings. Together, these results expose the chemical logic inside SynFlowNet and provide actionable and mechanistic insight that supports transparent and controllable molecular design.
- Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.47)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.37)