A Learning Theoretic Perspective on Local Explainability
Jeffrey Li, Vaishnavh Nagarajan, Gregory Plumb, Ameet Talwalkar
In this paper, we explore connections between interpretable machine learning and learning theory through the lens of local approximation explanations. First, we tackle the traditional problem of performance generalization and bound the test-time accuracy of a model using a notion of how locally explainable it is. Second, we explore the novel problem of explanation generalization, which is an important concern for a growing class of finite-sample-based local approximation explanations. Finally, we validate our theoretical results empirically and show that they reflect what can be seen in practice.

There has been growing interest in interpretable machine learning, which seeks to help people understand their models. While interpretable machine learning encompasses a wide range of problems, it is a fairly uncontroversial hypothesis that there exists a tradeoff between a model's complexity and general notions of interpretability. This hypothesis suggests a seemingly natural connection to the field of learning theory, which has thoroughly explored relationships between a function class's complexity and generalization. However, formal connections between interpretability and learning theory remain relatively unstudied.
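For concreteness, a finite-sample local approximation explanation of the kind referred to above can be sketched as fitting a simple surrogate to the model's predictions in a small neighborhood of a query point, and measuring how well it fits. The sketch below is a minimal LIME-style illustration, not the paper's exact construction: the Gaussian sampling scheme, the neighborhood width `sigma`, the sample count `n_samples`, and the black-box callable `model` are all illustrative assumptions.

```python
import numpy as np

def local_linear_explanation(model, x0, sigma=0.1, n_samples=500, seed=None):
    """Fit an affine surrogate g(x) = w.x + b to `model` near `x0`.

    Illustrative sketch only: the neighborhood distribution and the
    fidelity measure here are assumptions, not the paper's definitions.
    """
    rng = np.random.default_rng(seed)
    d = x0.shape[0]

    # Sample points in a Gaussian neighborhood of x0 and query the black box.
    X = x0 + sigma * rng.standard_normal((n_samples, d))
    y = model(X)

    # Least-squares fit of the affine surrogate on the sampled neighborhood.
    A = np.hstack([X, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w, b = coef[:-1], coef[-1]

    # Finite-sample local fidelity: mean squared gap between the model and
    # its surrogate, one way to quantify how "locally explainable" f is at x0.
    mse = float(np.mean((y - (X @ w + b)) ** 2))
    return w, b, mse
```

The quantities of interest in the paper (how well such explanations generalize beyond the sampled neighborhood, and how local explainability relates to test-time accuracy) concern exactly this kind of finite-sample surrogate and its fidelity.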
Nov-2-2020