Probabilistic Stability Guarantees for Feature Attributions

Jin, Helen, Xue, Anton, You, Weiqiu, Goel, Surbhi, Wong, Eric

Aug-8-2025–arXiv.org Artificial Intelligence

Stability guarantees have emerged as a principled way to evaluate feature attributions, but existing certification methods rely on heavily smoothed classifiers and often produce conservative guarantees. To address these limitations, we introduce soft stability and propose a simple, model-agnostic, sample-efficient stability certification algorithm (SCA) that yields non-trivial and interpretable guarantees for any attribution method. Moreover, we show that mild smoothing achieves a more favorable trade-off between accuracy and stability, avoiding the aggressive compromises made in prior certification methods. To explain this behavior, we use Boolean function analysis to derive a novel characterization of stability under smoothing. We evaluate SCA on vision and language tasks and demonstrate the effectiveness of soft stability in measuring the robustness of explanation methods.
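The abstract does not spell out how SCA works, so the following is only a minimal sketch of what a sampling-based stability estimate could look like. It assumes that "soft stability" means the probability that the classifier's prediction on the masked input stays unchanged when up to `radius` additional features are revealed, and that perturbed masks are drawn uniformly from that set. The function names (`hoeffding_sample_size`, `sample_perturbed_mask`, `estimate_stability_rate`), the mask-by-multiplication convention, and the use of Hoeffding's inequality to pick the sample count are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Samples needed so the empirical rate is within eps of the true
    rate with probability at least 1 - delta (Hoeffding's inequality)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def sample_perturbed_mask(mask, radius, rng):
    """Draw a mask that keeps every selected feature and reveals at most
    `radius` extra ones, uniformly over all such masks (assumed distribution)."""
    off = [i for i, bit in enumerate(mask) if not bit]
    max_k = min(radius, len(off))
    # There are C(len(off), k) masks that reveal exactly k extra features,
    # so weighting k by that count makes the overall draw uniform.
    weights = [math.comb(len(off), k) for k in range(max_k + 1)]
    k = rng.choices(range(max_k + 1), weights=weights)[0]
    new_mask = list(mask)
    for i in rng.sample(off, k):
        new_mask[i] = 1
    return new_mask


def estimate_stability_rate(predict, x, mask, radius, eps=0.1, delta=0.1, seed=0):
    """Monte Carlo estimate of how often the prediction on the masked
    input is unchanged under sampled mask perturbations.

    `predict` is any black-box function from a feature list to a label
    (hypothetical interface); features are masked by multiplying by 0 or 1.
    """
    rng = random.Random(seed)
    n = hoeffding_sample_size(eps, delta)
    base = predict([xi * mi for xi, mi in zip(x, mask)])
    hits = 0
    for _ in range(n):
        m = sample_perturbed_mask(mask, radius, rng)
        if predict([xi * mi for xi, mi in zip(x, m)]) == base:
            hits += 1
    return hits / n, n


if __name__ == "__main__":
    # Toy classifier: the label depends only on whether feature 3 is visible.
    toy_predict = lambda z: int(z[3] != 0)
    rate, n = estimate_stability_rate(toy_predict, [1, 1, 1, 1], [1, 1, 0, 0], radius=1)
    print(f"estimated stability rate {rate:.2f} from {n} samples")
```

Because the estimator only calls `predict` as a black box, a sketch like this is model-agnostic in the sense the abstract describes, and the sample count depends only on `eps` and `delta`, not on the number of features.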

data mining, machine learning, natural language

Country:
- North America > United States (0.92)

Genre:
- Overview (0.67)
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine (1.00)
- Government
  - Military (0.67)
  - Regional Government > North America Government
    - United States Government (0.67)

Technology:
- Information Technology
  - Data Science > Data Mining (0.68)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language (1.00)
    - Representation & Reasoning (0.89)
    - Machine Learning > Neural Networks
      - Deep Learning (0.68)
