Locally Pareto-Optimal Interpretations for Black-Box Machine Learning Models

Joshi, Aniruddha, Chakraborty, Supratik, Akshay, S, Shah, Shetal, Torfah, Hazem, Seshia, Sanjit

Aug-22-2025–arXiv.org Artificial Intelligence

Creating meaningful interpretations for black-box machine learning models involves balancing two often conflicting objectives: accuracy and explainability. Exploring the trade-off between these objectives is essential for developing trustworthy interpretations. While many techniques for multi-objective interpretation synthesis have been developed, they typically lack formal guarantees on the Pareto-optimality of the results. Methods that do provide such guarantees, on the other hand, often face severe scalability limitations when exploring the Pareto-optimal space. To address this, we develop a framework based on local optimality guarantees that enables more scalable synthesis of interpretations. Specifically, we consider the problem of synthesizing a set of Pareto-optimal interpretations with local optimality guarantees, within the immediate neighborhood of each solution. Our approach begins with a multi-objective learning or search technique, such as Multi-Objective Monte Carlo Tree Search, to generate a best-effort set of Pareto-optimal candidates with respect to accuracy and explainability. We then verify local optimality for each candidate as a Boolean satisfiability problem, which we solve using a SAT solver. We demonstrate the efficacy of our approach on a set of benchmarks, comparing it against previous methods for exploring the Pareto-optimal front of interpretations. In particular, we show that our approach yields interpretations that closely match those synthesized by methods offering global guarantees.

artificial intelligence, interpretation, machine learning, (14 more...)

arXiv.org Artificial Intelligence

Aug-22-2025

arXiv.org PDF

Add feedback

Country:
- Europe (0.92)
- North America > United States
  - New York (0.28)
  - California (0.28)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Health & Medicine > Therapeutic Area (1.00)
- Transportation > Air (0.73)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Search (1.00)
  - Machine Learning (1.00)