Model-agnostic and Scalable Counterfactual Explanations via Reinforcement Learning
Robert-Florian Samoilescu, Arnaud Van Looveren, Janis Klaise
Counterfactual instances are a powerful tool to obtain valuable insights into automated decision processes, describing the necessary minimal changes in the input space to alter the prediction towards a desired target. Most previous approaches require a separate, computationally expensive optimization procedure per instance, making them impractical for both large amounts of data and high-dimensional data. Moreover, these methods are often restricted to certain subclasses of machine learning models (e.g. differentiable or tree-based models). In this work, we propose a deep reinforcement learning approach that transforms the optimization procedure into an end-to-end learnable process, allowing us to generate batches of counterfactual instances in a single forward pass. Our experiments on real-world data show that our method i) is model-agnostic (does not assume differentiability), relying only on feedback from model predictions; ii) allows for generating target-conditional counterfactual instances; iii) allows for flexible feature range constraints for numerical and categorical attributes, including the immutability of protected features (e.g. gender, race); iv) is easily extended to other data modalities such as images.
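Two properties named in the abstract can be illustrated with a minimal sketch: feature constraints (ranges and immutability) enforced on a whole batch at once, and feedback that uses only the model's predictions. This is not the authors' implementation. The abstract does not specify how constraints are applied or how the reward is defined, so the function names `project_constraints` and `validity_reward`, the toy classifier `predict`, and the hard-coded `candidates` array (a stand-in for the output of a trained generator) are all assumptions made for illustration.

```python
import numpy as np

def project_constraints(x, x_cf, immutable, low, high):
    """Enforce feature constraints on a batch of candidate counterfactuals.

    x, x_cf: arrays of shape (batch, features)
    immutable: boolean mask of shape (features,)
    low, high: per-feature bounds of shape (features,)
    """
    x_cf = np.clip(x_cf, low, high)      # keep every feature inside its allowed range
    return np.where(immutable, x, x_cf)  # copy immutable features from the original input

def validity_reward(predict, x_cf, target):
    """Return 1.0 where the black-box prediction equals the target, else 0.0.

    Only predictions are used, so no gradients of the model are needed.
    """
    return (predict(x_cf) == target).astype(float)

# Hypothetical black-box classifier: class 1 if the feature sum exceeds 1.
def predict(x):
    return (x.sum(axis=1) > 1.0).astype(int)

x = np.array([[0.2, 0.3, 0.0],
              [0.9, 0.4, 1.0]])
# Stand-in for generator output (assumption: a trained model would produce this).
candidates = np.array([[0.9, 1.5, 1.0],
                       [0.1, -0.2, 0.0]])
immutable = np.array([False, False, True])  # e.g. a protected attribute
low, high = np.zeros(3), np.ones(3)
target = np.array([1, 0])                   # desired class per instance

x_cf = project_constraints(x, candidates, immutable, low, high)
reward = validity_reward(predict, x_cf, target)
```

In this toy batch the first candidate reaches its target class, while the second does not, because its immutable third feature is restored to the original value. A signal of this kind could serve as feedback when training a generator, although the sketch does not show any training.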
Jun-4-2021
- Country:
- South America (0.04)
- North America
- Central America (0.04)
- United States
- New York > New York County
- New York City (0.04)
- Massachusetts > Plymouth County
- Hanover (0.04)
- California > San Francisco County
- San Francisco (0.14)
- Puerto Rico > San Juan
- San Juan (0.04)
- Europe > United Kingdom
- England > Greater London > London (0.04)
- Genre:
- Research Report (0.64)
- Industry:
- Health & Medicine > Therapeutic Area (0.31)
- Technology: