AITopics

Genre: Research Report (0.60)

Technology: Information Technology > Artificial Intelligence (1.00)

David Alvarez Melis, Tommi Jaakkola

Towards Robust Interpretability with Self-Explaining Neural Networks

Neural Information Processing SystemsMar-15-2026, 02:31:39 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, explanation, machine learning, (17 more...)

Country: North America (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsFeb-12-2026, 06:28:23 GMT

46bae562da84d63269673808e8eff777-Paper-Datasets_and_Benchmarks_Track.pdf

artificial intelligence, data mining, machine learning, (17 more...)

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Bristol (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (0.46)
Education (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing SystemsFeb-8-2026, 08:25:19 GMT

2 ProbabilisticLinearSolvers

Machine learning poses specific challenges for the solution of such systems due to their scale, characteristic structure, stochasticity and the central role of uncertainty in thefield.

artificial intelligence, calibration, machine learning, (17 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > Panama (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.47)
Information Technology > Artificial Intelligence > Machine Learning (0.36)

Neural Information Processing SystemsFeb-8-2026, 05:36:40 GMT

ExplainabilityViaCausalSelf-Talk

Asmodern machinelearning systems become morepowerfulandembedded inourlives,theneed to have these systems explain their behavior becomes increasingly urgent.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Alameda County > Livermore (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Neural Information Processing SystemsDec-24-2025, 00:12:21 GMT

Explainability Via Causal Self-Talk

Explaining the behavior of AI systems is an important problem that, in practice, is generally avoided. While the XAI community has been developing an abundance of techniques, most incur a set of costs that the wider deep learning community has been unwilling to pay in most situations. We take a pragmatic view of the issue, and define a set of desiderata that capture both the ambitions of XAI and the practical constraints of deep learning. We describe an effective way to satisfy all the desiderata: train the AI system to build a causal model of itself. We develop an instance of this solution for Deep RL agents: Causal Self-Talk. CST operates by training the agent to communicate with itself across time. We implement this method in a simulated 3D environment, and show how it enables agents to generate faithful and semantically-meaningful explanations of their own behavior. Beyond explanations, we also demonstrate that these learned models provide new ways of building semantic control interfaces to AI systems.

ai system, causal self-talk, name change, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Kumar, Shubham, Ahuja, Narendra

Measuring the (Un)Faithfulness of Concept-Based Explanations

arXiv.org Artificial IntelligenceNov-20-2025

Deep vision models perform input-output computations that are hard to interpret. Concept-based explanation methods (CBEMs) increase interpretability by re-expressing parts of the model with human-understandable semantic units, or concepts. Checking if the derived explanations are faithful -- that is, they represent the model's internal computation -- requires a surrogate that combines concepts to compute the output. Simplifications made for interpretability inevitably reduce faithfulness, resulting in a tradeoff between the two. State-of-the-art unsupervised CBEMs (U-CBEMs) have reported increasingly interpretable concepts, while also being more faithful to the model. However, we observe that the reported improvement in faithfulness artificially results from either (1) using overly complex surrogates, which introduces an unmeasured cost to the explanation's interpretability, or (2) relying on deletion-based approaches that, as we demonstrate, do not properly measure faithfulness. We propose Surrogate Faithfulness (SURF), which (1) replaces prior complex surrogates with a simple, linear surrogate that measures faithfulness without changing the explanation's interpretability and (2) introduces well-motivated metrics that assess loss across all output classes, not just the predicted class. We validate SURF with a measure-over-measure study by proposing a simple sanity check -- explanations with random concepts should be less faithful -- which prior surrogates fail. SURF enables the first reliable faithfulness benchmark of U-CBEMs, revealing that many visually compelling U-CBEMs are not faithful. Code to be released.

artificial intelligence, explanation, machine learning, (16 more...)

2504.10833

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Vision (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceNov-3-2025

Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning

Tudor, Alexis R., Zeng, Yankai, Wang, Huaduo, Arias, Joaquin, Gupta, Gopal

Current advances in AI and its applicability have highlighted the need to ensure its trustworthiness for legal, ethical, and even commercial reasons. Sub-symbolic machine learning algorithms, such as the LLMs, simulate reasoning but hallucinate and their decisions cannot be explained or audited (crucial aspects for trustworthiness). On the other hand, rule-based reasoners, such as Cyc, are able to provide the chain of reasoning steps but are complex and use a large number of reasoners. We propose a middle ground using s(CASP), a goal-directed constraint-based answer set programming reasoner that employs a small number of mechanisms to emulate reliable and explainable human-style commonsense reasoning. In this paper, we explain how s(CASP) supports the 16 desiderata for trustworthy AI introduced by Doug Lenat and Gary Marcus (2023), and two additional ones: inconsistency detection and the assumption of alternative worlds. To illustrate the feasibility and synergies of s(CASP), we present a range of diverse applications, including a conversational chatbot and a virtually embodied reasoner.

casp, logic & formal reasoning, machine learning, (21 more...)

2506.12667

Country:

Europe (0.93)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
(3 more...)

arXiv.org Artificial IntelligenceOct-31-2025

Plasticity as the Mirror of Empowerment

Abel, David, Bowling, Michael, Barreto, André, Dabney, Will, Dong, Shi, Hansen, Steven, Harutyunyan, Anna, Khetarpal, Khimya, Lyle, Clare, Pascanu, Razvan, Piliouras, Georgios, Precup, Doina, Richens, Jonathan, Rowland, Mark, Schaul, Tom, Singh, Satinder

Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a vital framing concept across artificial intelligence and cognitive science. This former capacity, however, is equally foundational: In what ways, and to what extent, can an agent be influenced by what it observes? In this paper, we ground this concept in a universal agent-centric measure that we refer to as plasticity, and reveal a fundamental connection to empowerment. Following a set of desiderata on a suitable definition, we define plasticity using a new information-theoretic quantity we call the generalized directed information. We show that this new quantity strictly generalizes the directed information introduced by Massey (1990) while preserving all of its desirable properties. Under this definition, we find that plasticity is well thought of as the mirror of empowerment: The two concepts are defined using the same measure, with only the direction of influence reversed. Our main result establishes a tension between the plasticity and empowerment of an agent, suggesting that agent design needs to be mindful of both characteristics. We explore the implications of these findings, and suggest that plasticity, empowerment, and their relationship are essential to understanding agency

artificial intelligence, machine learning, plasticity, (16 more...)

2505.10361

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Amad, Harry, Qian, Zhaozhi, Frauen, Dennis, Piskorz, Julianna, Feuerriegel, Stefan, van der Schaar, Mihaela

Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference

arXiv.org Artificial IntelligenceOct-22-2025

Causal inference is essential for developing and evaluating medical interventions, yet real-world medical datasets are often difficult to access due to regulatory barriers. This makes synthetic data a potentially valuable asset that enables these medical analyses, along with the development of new inference methods themselves. Generative models can produce synthetic data that closely approximate real data distributions, yet existing methods do not consider the unique challenges that downstream causal inference tasks, and specifically those focused on treatments, pose. We establish a set of desiderata that synthetic data containing treatments should satisfy to maximise downstream utility: preservation of (i) the covariate distribution, (ii) the treatment assignment mechanism, and (iii) the outcome generation mechanism. Based on these desiderata, we propose a set of evaluation metrics to assess such synthetic data. Finally, we present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine that mimics the data-generating process of data containing treatments and optimises for our desiderata. We empirically demonstrate that STEAM achieves state-of-the-art performance across our metrics as compared to existing generative models, particularly as the complexity of the true data-generating process increases.

artificial intelligence, machine learning, natural language, (18 more...)