model behaviour
- Banking & Finance (0.71)
- Media > Film (0.48)
- Leisure & Entertainment (0.48)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
Lessons from a Chimp: AI "Scheming" and the Quest for Ape Language
Summerfield, Christopher, Luettgau, Lennart, Dubois, Magda, Kirk, Hannah Rose, Hackenburg, Kobi, Fist, Catherine, Slama, Katarina, Ding, Nicola, Anselmetti, Rebecca, Strait, Andrew, Giulianelli, Mario, Ududec, Cozmin
We examine recent research that asks whether current AI systems may be developing a capacity for "scheming" (covertly and strategically pursuing misaligned goals). We compare current research practices in this field to those adopted in the 1970s to test whether non-human primates could master natural language. We argue that there are lessons to be learned from that historical research endeavour, which was characterised by an overattribution of human traits to other agents, an excessive reliance on anecdote and descriptive analysis, and a failure to articulate a strong theoretical framework for the research. We recommend that research into AI scheming actively seeks to avoid these pitfalls. We outline some concrete steps that can be taken for this research programme to advance in a productive and scientifically rigorous fashion.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- Overview (1.00)
- Research Report > New Finding (0.46)
Aggregating Local Saliency Maps for Semi-Global Explainable Image Classification
Deep learning dominates image classification tasks, yet understanding how models arrive at predictions remains a challenge. Much research focuses on local explanations of individual predictions, such as saliency maps, which visualise the influence of specific pixels on a model's prediction. However, reviewing many of these explanations to identify recurring patterns is infeasible, while global methods often oversimplify and miss important local behaviours. To address this, we propose Segment Attribution Tables (SATs), a method for summarising local saliency explanations into (semi-)global insights. SATs take image segments (such as "eyes" in Chihuahuas) and leverage saliency maps to quantify their influence. These segments highlight concepts the model relies on across instances and reveal spurious correlations, such as reliance on backgrounds or watermarks, even when out-of-distribution test performance sees little change. SATs can explain any classifier for which a form of saliency map can be produced, using segmentation maps that provide named segments. SATs bridge the gap between oversimplified global summaries and overly detailed local explanations, offering a practical tool for analysing and debugging image classifiers.
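A minimal sketch of the aggregation step the abstract describes, assuming per-image saliency and segmentation maps are already available; the function name and interface below are illustrative, not the paper's API:

```python
# Sketch of the Segment Attribution Table (SAT) idea: aggregate per-pixel
# saliency into per-segment scores, then average across images. Names and
# the mean-absolute-attribution choice are assumptions, not the paper's code.
from collections import defaultdict
import numpy as np

def segment_attribution_table(saliency_maps, segment_maps, segment_names):
    """saliency_maps: list of (H, W) float arrays (any local attribution).
    segment_maps: list of (H, W) int arrays labelling each pixel's segment.
    segment_names: dict mapping segment id -> human-readable name."""
    scores = defaultdict(list)
    for sal, seg in zip(saliency_maps, segment_maps):
        for seg_id, name in segment_names.items():
            mask = seg == seg_id
            if mask.any():
                # Mean absolute attribution of the pixels in this segment.
                scores[name].append(np.abs(sal[mask]).mean())
    # One row per concept: average influence across all instances.
    return {name: float(np.mean(vals)) for name, vals in scores.items()}

# Toy example: two 4x4 images with "eyes" (1) and "background" (0) segments.
rng = np.random.default_rng(0)
sals = [rng.normal(size=(4, 4)) for _ in range(2)]
segs = [np.array([[1, 1, 0, 0]] * 4) for _ in range(2)]
print(segment_attribution_table(sals, segs, {0: "background", 1: "eyes"}))
```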
- Health & Medicine > Diagnostic Medicine > Imaging (0.68)
- Health & Medicine > Therapeutic Area > Dermatology (0.46)
In Defence of Post-hoc Explainability
The widespread adoption of machine learning in scientific research has created a fundamental tension between model opacity and scientific understanding. Whilst some advocate for intrinsically interpretable models, we introduce Computational Interpretabilism (CI) as a philosophical framework for post-hoc interpretability in scientific AI. Drawing parallels with human expertise, where post-hoc rationalisation coexists with reliable performance, CI establishes that scientific knowledge emerges through structured model interpretation when properly bounded by empirical validation. Through mediated understanding and bounded factivity, we demonstrate how post-hoc methods achieve epistemically justified insights without requiring complete mechanical transparency, resolving tensions between model complexity and scientific comprehension.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Law (0.68)
- Health & Medicine (0.67)
Measuring Cross-Modal Interactions in Multimodal Models
Wenderoth, Laura, Hemker, Konstantin, Simidjievski, Nikola, Jamnik, Mateja
Integrating AI in healthcare can greatly improve patient care and system efficiency. However, the lack of explainability in AI systems hinders their clinical adoption, especially in multimodal settings that use increasingly complex model architectures. Most existing explainable AI (XAI) methods focus on unimodal models, which fail to capture cross-modal interactions crucial for understanding the combined impact of multiple data sources. Existing methods for quantifying cross-modal interactions are limited to two modalities, rely on labelled data, and depend on model performance. This is problematic in healthcare, where XAI must handle multiple data sources and provide individualised explanations. This paper introduces InterSHAP, a cross-modal interaction score that addresses the limitations of existing approaches. InterSHAP uses the Shapley interaction index to precisely separate and quantify the contributions of the individual modalities and their interactions without approximations. By integrating an open-source implementation with the SHAP package, we enhance reproducibility and ease of use. We show that InterSHAP accurately measures the presence of cross-modal interactions, can handle multiple modalities, and provides detailed explanations at a local level for individual samples. Furthermore, we apply InterSHAP to multimodal medical datasets and demonstrate its applicability for individualised explanations.
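For intuition, the two-modality case of the Shapley interaction index that InterSHAP builds on reduces to a second difference; the masking-by-baseline scheme and the `model` signature below are assumptions for illustration, not the paper's implementation:

```python
# Two-modality Shapley interaction index: f(a,b) - f(a,base) - f(base,b)
# + f(base,base). A positive value indicates the modalities contribute
# more together than the sum of their individual contributions.
def shapley_interaction_two_modalities(model, x_a, x_b, base_a, base_b):
    return (model(x_a, x_b)
            - model(x_a, base_b)
            - model(base_a, x_b)
            + model(base_a, base_b))

# Toy model with a genuine cross-modal (multiplicative) interaction.
toy = lambda a, b: a + b + 2.0 * a * b
print(shapley_interaction_two_modalities(toy, 1.0, 1.0, 0.0, 0.0))  # -> 2.0
```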
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- (4 more...)
- Health & Medicine > Health Care Providers & Services (0.68)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI
Nguyen, Elisa, Bertram, Johannes, Kortukov, Evgenii, Song, Jean Y., Oh, Seong Joon
While Explainable AI (XAI) aims to make AI understandable and useful to humans, it has been criticised for relying too much on formalism and solutionism, focusing more on mathematical soundness than user needs. Inspired by design thinking, we propose an alternative to this bottom-up approach: the XAI research community should adopt a top-down, user-focused perspective to ensure user relevance. We illustrate this with a relatively young subfield of XAI, Training Data Attribution (TDA). With the surge in TDA research and growing competition, the field risks repeating the same patterns of solutionism. We conducted a needfinding study with a diverse group of AI practitioners to identify potential user needs related to TDA. Through interviews (N=10) and a systematic survey (N=31), we uncovered new TDA tasks that are currently largely overlooked. We invite the TDA and XAI communities to consider these novel tasks and improve the user relevance of their research outcomes.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (10 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Overview (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.71)
- (2 more...)
MASALA: Model-Agnostic Surrogate Explanations by Locality Adaptation
Anwar, Saif, Griffiths, Nathan, Bhalerao, Abhir, Popham, Thomas
Existing local Explainable AI (XAI) methods, such as LIME, select a region of the input space in the vicinity of a given input instance, for which they approximate the behaviour of a model using a simpler and more interpretable surrogate model. The size of this region is often controlled by a user-defined locality hyperparameter. In this paper, we demonstrate the difficulties associated with defining a suitable locality size to capture impactful model behaviour, as well as the inadequacy of using a single locality size to explain all predictions. We propose MASALA, a novel method for generating explanations that automatically determines the appropriate local region of impactful model behaviour for each individual instance being explained. MASALA approximates the local behaviour used by a complex model to make a prediction by fitting a linear surrogate model to a set of points that experience similar model behaviour. These points are found by clustering the input space into regions of linear behavioural trends exhibited by the model. We compare the fidelity and consistency of explanations generated by our method with existing local XAI methods, namely LIME and CHILLI. Experiments on the PHM08 and MIDAS datasets show that our method produces more faithful and consistent explanations than existing methods, without the need to define any sensitive locality hyperparameters.
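A rough sketch of the behaviour-based locality idea, under the assumption that clustering jointly on inputs and predictions approximates regions of locally linear model behaviour; the paper's actual clustering criterion may differ:

```python
# Instead of a fixed locality radius (as in LIME), group points whose model
# behaviour looks locally linear, then fit a linear surrogate to the cluster
# containing the instance being explained. Illustrative, not MASALA's code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def explain_instance(model_fn, X, x, n_clusters=4):
    preds = model_fn(X)
    # Cluster jointly on inputs and predictions so each cluster tends to
    # cover a region where the model behaves roughly linearly.
    feats = np.column_stack([X, preds])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    x_pred = model_fn(x.reshape(1, -1))
    x_label = labels[np.argmin(np.linalg.norm(feats - np.append(x, x_pred), axis=1))]
    # Linear surrogate fitted only to the behaviourally similar region.
    region = labels == x_label
    surrogate = LinearRegression().fit(X[region], preds[region])
    return surrogate.coef_  # local feature attributions for x

X = np.random.default_rng(1).uniform(-2, 2, size=(500, 2))
f = lambda Z: np.where(Z[:, 0] > 0, 3 * Z[:, 0], -Z[:, 1])  # piecewise-linear
print(explain_instance(f, X, np.array([1.0, 0.5])))
```

Because the surrogate is fitted per cluster rather than within a fixed radius, no locality hyperparameter has to be tuned per instance, which is the design point the abstract emphasises.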
- Oceania > Australia (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)