Explaining Neural Networks without Access to Training Data

Marton, Sascha, Lüdtke, Stefan, Bartelt, Christian, Tschalzev, Andrej, Stuckenschmidt, Heiner

arXiv.org Artificial Intelligence 

Artificial neural networks achieve impressive results for various modeling tasks [LeCun et al., 2015, Wang et al., 2020]. However, a downside of their superior performance and sophisticated structure is the limited comprehensibility of the learned models. In many domains, it is crucial to understand the function learned by a neural network, especially when it comes to decisions that affect people [Samek et al., 2019, Molnar, 2020]. A common approach to tackling the problem of interpretability without sacrificing predictive performance is to use a surrogate model as a gateway to interpretability [Molnar, 2020]. Most existing global surrogate approaches use a distillation procedure to learn the surrogate model from the predictions of the neural network [Molnar, 2020, Frosst and Hinton, 2017]: they query the neural network on a representative set of samples, and the resulting input-output pairs are then used to train the surrogate model. This representative set usually comprises the training data of the original model, or at least follows its distribution [Molnar, 2020, Lopes et al., 2017]. However, in many cases the training data cannot easily be exposed due to privacy or safety concerns [Lopes et al., 2017, Bhardwaj et al., 2019, Nayak et al., 2019].
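The distillation procedure described above can be sketched in a few lines. The following is a minimal, self-contained illustration, not the method of this paper: the "neural network" is replaced by a hypothetical fixed decision function that we may only query, the representative sample is drawn uniformly (standing in for the training-data distribution), and the surrogate is a depth-1 decision stump fitted by exhaustive search to maximize fidelity, i.e., agreement with the black box on the queried pairs. All names here are illustrative assumptions.

```python
import random

def black_box(x):
    # Stand-in for a trained neural network (hypothetical example):
    # a fixed decision function that can only be queried, not inspected.
    return 1 if 2.0 * x[0] - x[1] > 0.5 else 0

def sample_inputs(n, dim=2, seed=0):
    # Representative samples; in practice these would come from (or
    # follow the distribution of) the original training data.
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

def fit_stump(X, y):
    # Surrogate model: a depth-1 decision stump chosen to maximize
    # fidelity (agreement with the black-box labels) on the query set.
    best = None
    for f in range(len(X[0])):
        for t in sorted(x[f] for x in X):
            for pos in (0, 1):
                pred = [pos if x[f] > t else 1 - pos for x in X]
                acc = sum(p == yi for p, yi in zip(pred, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, pos)
    return best  # (fidelity, feature index, threshold, label if above)

X = sample_inputs(500)
y = [black_box(x) for x in X]              # query the network
fidelity, feat, thr, pos = fit_stump(X, y)  # train the surrogate
print(f"surrogate: x[{feat}] > {thr:.2f} -> {pos}, fidelity = {fidelity:.2f}")
```

Because the true decision boundary is not axis-aligned, the stump only approximates the black box; richer surrogate classes (e.g., deeper trees) trade interpretability for fidelity. The key dependence this paper targets is visible in `sample_inputs`: without access to the training distribution, the query set may not be representative.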