
Collaborating Authors: Su, Shiye


Explaining Hypergraph Neural Networks: From Local Explanations to Global Concepts

arXiv.org Artificial Intelligence

Hypergraph neural networks are a class of powerful models that leverage the message passing paradigm to learn over hypergraphs, a generalization of graphs well-suited to describing relational data with higher-order interactions. However, such models are not naturally interpretable, and their explainability has received very limited attention. We introduce SHypX, the first model-agnostic post-hoc explainer for hypergraph neural networks that provides both local and global explanations. At the instance level, it performs input attribution by discretely sampling explanation subhypergraphs optimized to be faithful and concise. At the model level, it produces global explanation subhypergraphs using unsupervised concept extraction. Extensive experiments across four real-world and four novel, synthetic hypergraph datasets demonstrate that our method finds high-quality explanations that can target a user-specified balance between faithfulness and concision, improving over baselines by 25 percentage points in fidelity on average.
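The instance-level idea can be pictured with a minimal sketch: sample discrete hyperedge masks, score each candidate subhypergraph by how well it preserves the model's prediction (faithfulness) minus a size penalty (concision), and keep the best candidate. The code below is not the authors' implementation; `predict`, `explain_instance`, the uniform random sampling, and the scoring weights are illustrative assumptions.

```python
# Minimal sketch (not the SHypX code): pick a concise, faithful explanation
# sub-hypergraph for one node by discretely sampling hyperedge masks.
import numpy as np

def explain_instance(predict, incidence, node_idx, n_samples=200,
                     keep_prob=0.3, size_weight=0.1, seed=None):
    """Return the best-scoring hyperedge mask among random candidates.

    predict(incidence) -> (num_nodes, num_classes) probability matrix;
    incidence: (num_nodes, num_hyperedges) 0/1 array.
    """
    rng = np.random.default_rng(seed)
    full_probs = predict(incidence)[node_idx]
    target = int(np.argmax(full_probs))                   # class to stay faithful to
    num_edges = incidence.shape[1]
    best_mask, best_score = None, -np.inf
    for _ in range(n_samples):
        mask = rng.random(num_edges) < keep_prob          # discrete hyperedge sample
        sub_probs = predict(incidence * mask)[node_idx]   # masked hyperedges removed
        fidelity = sub_probs[target]                      # keep the original prediction
        concision = -size_weight * mask.mean()            # penalise large explanations
        score = fidelity + concision
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score
```

Here `size_weight` plays the role of the user-specified faithfulness/concision trade-off mentioned in the abstract; the paper optimizes the sampled subhypergraphs, whereas this sketch simply draws candidates at random for clarity.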


Commute-Time-Optimised Graphs for GNNs

arXiv.org Artificial Intelligence

We explore graph rewiring methods that optimise commute time. Recent graph rewiring approaches facilitate long-range interactions in sparse graphs, making such rewirings commute-time-optimal on average. However, when an expert prior exists on which node pairs should or should not interact, a superior rewiring would favour short commute times between these privileged node pairs. We construct two synthetic datasets with known priors reflecting realistic settings, and use these to motivate two bespoke rewiring methods that incorporate the known prior. We investigate the regimes where our rewiring improves test performance on the synthetic datasets. Finally, we perform a case study on a real-world citation graph to investigate the practical implications of our work.
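For concreteness, commute time has a closed form in terms of the Moore-Penrose pseudoinverse of the graph Laplacian: C(u, v) = 2|E| (L+_uu + L+_vv - 2 L+_uv). The sketch below pairs this identity with a simple greedy heuristic that connects high-commute-time privileged pairs; it illustrates the idea of prior-aware rewiring rather than the paper's actual methods, and `rewire_for_pairs` and its greedy rule are assumptions.

```python
# Sketch: commute times from the Laplacian pseudoinverse, plus an
# illustrative prior-aware rewiring heuristic (not the paper's procedure).
import networkx as nx
import numpy as np

def commute_times(G):
    """Pairwise commute times: C(u, v) = 2|E| * (L+_uu + L+_vv - 2 L+_uv)."""
    nodes = list(G.nodes())
    L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    R = d[:, None] + d[None, :] - 2 * Lp            # effective resistances
    return 2 * G.number_of_edges() * R, nodes

def rewire_for_pairs(G, privileged_pairs, budget=5):
    """Greedily connect the privileged pair with the largest commute time,
    skipping pairs that are already adjacent, for `budget` rounds."""
    G = G.copy()
    for _ in range(budget):
        C, nodes = commute_times(G)
        idx = {n: i for i, n in enumerate(nodes)}
        candidates = [(u, v) for u, v in privileged_pairs if not G.has_edge(u, v)]
        if not candidates:
            break
        u, v = max(candidates, key=lambda p: C[idx[p[0]], idx[p[1]]])
        G.add_edge(u, v)
    return G
```

Because commute time is proportional to effective resistance, adding an edge between a privileged pair directly lowers their resistance, which is the intuition behind favouring short commute times for pairs the prior says should interact.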


ALMANACS: A Simulatability Benchmark for Language Model Explainability

arXiv.org Machine Learning

How do we measure the efficacy of language model explainability methods? While many explainability methods have been developed, they are typically evaluated on bespoke tasks, preventing an apples-to-apples comparison. To help fill this gap, we present ALMANACS, a language model explainability benchmark. ALMANACS scores explainability methods on simulatability, i.e., how well the explanations improve behavior prediction on new inputs. The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations. By using another language model to predict behavior based on the explanations, ALMANACS is a fully automated benchmark. We use ALMANACS to evaluate counterfactuals, rationalizations, attention, and Integrated Gradients explanations. Our results are sobering: when averaged across all topics, no explanation method outperforms the explanation-free control. We conclude that despite modest successes in prior work, developing an explanation method that aids simulatability in ALMANACS remains an open challenge.

Understanding the behavior of deep neural networks is critical for their safe deployment. While deep neural networks are a black box by default, a wide variety of interpretability methods are being developed to explain their behavior (Räuker et al., 2023; Nauta et al., 2022). Some approaches, such as LIME (Ribeiro et al., 2016) and MUSE (Lakkaraju et al., 2019), try to approximate output behavior. Other approaches try to mechanistically explain the circuits inside a network (Nanda et al., 2023; Wang et al., 2023). Some approaches imitate explanations in the training data (Camburu et al., 2018; Narang et al., 2020; Marasović et al., 2022). Other approaches study the network's activations, such as a transformer's attention over its input (Serrano & Smith, 2019; Wiegreffe & Pinter, 2019). Others aim to create neural networks that are intrinsically explainable (Jain et al., 2020).

With so many interpretability methods to choose from, how can we tell which one works best? Despite years of work in the field, there is no consistent evaluation standard. New interpretability papers generally test their methods on bespoke tasks, making it difficult to assess their true effectiveness. To solve this issue, Doshi-Velez & Kim (2017), Nauta et al. (2022), and Räuker et al. (2023) argue that we need standard interpretability benchmarks. Just as benchmarks have driven progress in computer vision (Deng et al., 2009), natural language processing (Wang et al., 2019b;a), and reinforcement learning (Brockman et al., 2016; Tunyasuvunakool et al., 2020), we seek to drive progress in interpretability by enabling apples-to-apples comparisons across diverse methods.
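The simulatability protocol can be summarised in a few lines: a predictor language model is shown an explanation plus a held-out input, guesses the explained model's answer, and the benchmark measures how much closer those guesses get compared with an explanation-free control. The sketch below is a simplification of that loop; `predictor`, the mean-absolute-error metric, and the empty-string control are illustrative assumptions rather than ALMANACS's exact scoring.

```python
# Illustrative simulatability loop (a simplification of the ALMANACS setup).
# `predictor` stands in for the LLM that guesses the explained model's
# answer probability from an explanation plus a new input.
from statistics import mean

def simulatability_error(examples, explanation, predictor):
    """examples: list of (input_text, model_prob) pairs recorded from the
    explained model on held-out inputs; returns mean absolute error of the
    predictor's guesses."""
    errors = []
    for text, model_prob in examples:
        guess = predictor(explanation, text)    # predicted P(yes), in [0, 1]
        errors.append(abs(guess - model_prob))
    return mean(errors)

def improvement_over_control(examples, explanation, predictor):
    """Compare against an explanation-free control (empty explanation):
    positive values mean the explanation helped behavior prediction."""
    with_expl = simulatability_error(examples, explanation, predictor)
    control = simulatability_error(examples, "", predictor)
    return control - with_expl
```

The key design point is the comparison to the explanation-free control: an explanation only counts as useful if the predictor's behavior estimates improve relative to what it could already infer from the inputs alone, which is exactly the comparison on which no evaluated method succeeded on average.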