A learning theoretic perspective on local explainability
[Figure caption: Going from left to right, we consider increasingly complex functions. Correspondingly, the neighborhoods need to become more and more disjoint as the function becomes more complex. We quantify this "disjointedness" of the neighborhoods and relate it to the complexity of the function class and, subsequently, its generalization properties.]

There has been growing interest in interpretable machine learning (IML), aimed at helping users better understand how their ML models behave. IML has become a particularly relevant concern as practitioners aim to apply ML in important domains such as healthcare [Caruana et al., '15], financial services [Chen et al., '18], and scientific discovery [Karpatne et al., '17]. While much of the work in IML has been qualitative and empirical, in our recent ICLR 2021 paper, we study how concepts in interpretability can be formally related to learning theory.