Goto

Collaborating Authors

 linearity


GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation

arXiv.org Machine Learning

The main XAI attribution methods for deep neural networks -- GradCAM, SHAP, LIME, Integrated Gradients -- operate on separate theoretical foundations and are not formally comparable. We present GRALIS (Gradient-Riesz Averaged Locally-Integrated Shapley), a mathematical framework establishing a representation theory for attributions: every additive, linear, and continuous attribution functional on L^2(Q,mu) admits a unique canonical representation (Q, w, Delta), proved necessary by the Riesz Representation Theorem. This class encompasses SHAP, IG, LIME and linearized GradCAM, but excludes nonlinear functionals such as standard GradCAM or attention maps. Seven formal theorems provide simultaneous guarantees absent in any individual method: (T1) necessary canonical form; (T2) exact completeness; (T3) Monte Carlo convergence O(1/sqrt(m))+O(1/k); (T4) exact Shapley Interaction Values; (T5) Hoeffding ANOVA decomposition; (T6) Sobol sensitivity generalization; (T7) multi-scale extension (MS-GRALIS) with minimum-variance weights. An algebraic appendix justifies the GRALIS-SIV correspondence via the Mobius transform without circularity. GRALIS satisfies 13.5/14 axiomatic properties vs. 2.5-6/14 for individual methods, including completeness, sensitivity, locality, order-k interactions and optimal multi-scale aggregation simultaneously. Preliminary validation on BreaKHis (1,187 histology images, DenseNet-121) reports deletion faithfulness AUC +0.015 (malignant), 96% class-conditional consistency, SAL = 0.762+/-0.109 and sparsity index 0.39. Extended comparison with baseline XAI methods is planned for a companion paper.


Appendix

Neural Information Processing Systems

We extra define the following notations for the proof. In Assumption 3.2, we assume the Lipschitz continuity and smoothness for all the activation functions. In the proof of lemmas, e.g., Lemma B.1 and B.2, we only use the fact that they are Lipschitz continuous and smooth, as well as bounded by a constant 0 > 0 at point 0, hence we use () to denote all the activation functions like what we do in Assumption 3.2 for simplicity. Additionally, in the following we introduce notations of the derivatives, mainly used in the proof of Lemma B.1 and Lemma B.2. By definition of feedforward neural networks in Section 2, different from the standard neural networks such as FCNs and CNNs in which the connection between neurons are generally only in adjacent layers, the neurons in feedforward neural networks can be arbitrarily connected as long as there is no loop.


Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture

Neural Information Processing Systems

In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity. The width of these general networks is characterized by the minimum indegree of their neurons, except for the input and first layers. Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.




bbc92a647199b832ec90d7cf57074e9e-Supplemental.pdf

Neural Information Processing Systems

Before defining our algorithm at each iterationt we first lighten our notation with a shorthandba(X) = b(ห†p(t 1)(X),a) (at different iterationt, ba denotes different functions), andb(X) is the vector of (b1(X),,bK(X)). For the intuition of the algorithm, consider the t-th iteration where the current prediction function is ห†p(t 1). Thestatement of the theorem is identical; the proof is also essentially the same except for the use of some new technicaltools. Conversely, if ห†p is LB decision calibrated, then kE[p (X) ห†p(X)|U]k1 = 0 almost surely (because if the expectation of a non-negative random variable is zero, the random variable must be zero almost surely), which implies thatห†p is distributioncalibrated. For BKa we use the VC dimension approach.





SHAP-IQ: Unified Approximation of any-order Shapley Interactions

Neural Information Processing Systems

Predominately in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature attributions for any black box model. Shapley interaction indices extend the SV to define any-order feature interactions. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e. interaction indices that satisfy the linearity, symmetry and dummy axiom.