The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
Neural Information Processing Systems
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time. For example, in the standard Sufficiency metric, only the top-k most important tokens are kept. In this paper, we study several under-explored dimensions of FI explanations, providing conceptual and empirical improvements for this form of explanation. First, we advance a new argument for why it can be problematic to remove features from an input when creating or evaluating explanations: the fact that these counterfactual inputs are out-of-distribution (OOD) to models implies that the resulting explanations are socially misaligned. The crux of the problem is that the model prior and random weight initialization influence the explanations (and explanation metrics) in unintended ways.
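The Sufficiency computation described above can be illustrated with a minimal sketch. It assumes one common formulation, in which the score is the model's confidence on the full input minus its confidence when only the top-k tokens are kept. The helper names (`keep_top_k`, `sufficiency`), the `[MASK]` replacement token, and the toy confidence function are all hypothetical stand-ins for illustration, not the paper's implementation.

```python
from typing import Callable, List, Sequence

MASK = "[MASK]"  # hypothetical placeholder for a removed token (an assumption)


def keep_top_k(tokens: Sequence[str], importance: Sequence[float],
               k: int, mask: str = MASK) -> List[str]:
    """Keep the k highest-scoring tokens and replace all others with `mask`."""
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    keep = set(ranked[:k])
    return [t if i in keep else mask for i, t in enumerate(tokens)]


def sufficiency(confidence: Callable[[Sequence[str]], float],
                tokens: Sequence[str], importance: Sequence[float], k: int) -> float:
    """Confidence on the full input minus confidence on the top-k-only input.

    Under this formulation, a smaller value means the kept tokens were closer
    to sufficient for the original prediction.
    """
    return confidence(tokens) - confidence(keep_top_k(tokens, importance, k))


def toy_confidence(tokens: Sequence[str]) -> float:
    """Toy stand-in for a classifier's confidence in a 'positive' label."""
    cues = {"great": 0.3, "loved": 0.2}
    return min(1.0, 0.5 + sum(cues.get(t, 0.0) for t in tokens))


tokens = ["i", "loved", "this", "great", "movie"]
importance = [0.0, 0.6, 0.1, 0.9, 0.2]  # made-up FI scores, one per token

kept = keep_top_k(tokens, importance, k=2)
score_k2 = sufficiency(toy_confidence, tokens, importance, k=2)
score_k1 = sufficiency(toy_confidence, tokens, importance, k=1)
```

Note that the masked sequence passed to the model is exactly the kind of counterfactual input the abstract flags as out-of-distribution: a real model was typically never trained on inputs that are mostly `[MASK]` tokens, so its confidence on them may reflect the model prior rather than the kept features.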