The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

Apr-25-2026, 00:30:04 GMT–Neural Information Processing Systems

Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time. For example, in the standard Sufficiency metric, only the top-k most important tokens are kept. In this paper, we study several under-explored dimensions of FI explanations, providing conceptual and empirical improvements for this form of explanation. First, we advance a new argument for why it can be problematic to remove features from an input when creating or evaluating explanations: the fact that these counterfactual inputs are out-of-distribution (OOD) to models implies that the resulting explanations are socially misaligned. The crux of the problem is that the model prior and random weight initialization influence the explanations (and explanation metrics) in unintended ways.
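As a rough illustration of the Sufficiency metric mentioned above, the following minimal sketch keeps only the top-k most important tokens and reports the change in model confidence. The function names (`sufficiency`, `toy_confidence`), the `[MASK]` replacement token, and the sign convention (full-input confidence minus reduced-input confidence) are assumptions made for illustration, not the paper's implementation. Replacing dropped tokens with a mask produces exactly the kind of counterfactual input that the abstract argues is out-of-distribution for the model.

```python
from typing import Callable, List, Sequence


def sufficiency(
    tokens: Sequence[str],
    importances: Sequence[float],
    confidence: Callable[[List[str]], float],
    k: int,
    mask_token: str = "[MASK]",  # assumed replacement for removed tokens
) -> float:
    """Confidence change when only the top-k most important tokens are kept."""
    # Indices of the k tokens with the highest importance scores.
    ranked = sorted(range(len(tokens)), key=lambda i: importances[i], reverse=True)
    keep = set(ranked[:k])
    # Replace every other token with the mask token (a counterfactual input).
    reduced = [tok if i in keep else mask_token for i, tok in enumerate(tokens)]
    # Assumed convention: lower values mean the kept tokens are more sufficient.
    return confidence(list(tokens)) - confidence(reduced)


def toy_confidence(tokens: List[str]) -> float:
    """Hypothetical stand-in for a classifier's confidence in the predicted class."""
    positive_words = {"great", "fun"}
    return min(1.0, 0.5 + 0.25 * sum(tok in positive_words for tok in tokens))


tokens = ["a", "great", "fun", "movie"]
importances = [0.0, 0.9, 0.6, 0.1]  # hypothetical FI scores, one per token

print(sufficiency(tokens, importances, toy_confidence, k=1))  # 0.25
print(sufficiency(tokens, importances, toy_confidence, k=2))  # 0.0
```

In this toy setup, keeping the two highest-scored tokens recovers the full-input confidence, so the metric is 0.0 at k=2. With a real model, the value would also depend on how the model responds to masked inputs it never saw in training.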

explanation, machine learning, natural language


Country:
- North America > United States > Minnesota (0.28)

Genre:
- Research Report
  - New Finding (0.93)
  - Experimental Study (0.67)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Search (1.00)
  - Natural Language (1.00)
  - Machine Learning (1.00)
