The remarkable advancements in Deep Learning (DL) algorithms have fueled enthusiasm for using Artificial Intelligence (AI) technologies in almost every domain; however, the opaqueness of these algorithms put a question mark on their applications in safety-critical systems. In this regard, the `explainability' dimension is not only essential to both explain the inner workings of black-box algorithms, but it also adds accountability and transparency dimensions that are of prime importance for regulators, consumers, and service providers. eXplainable Artificial Intelligence (XAI) is the set of techniques and methods to convert the so-called black-box AI algorithms to white-box algorithms, where the results achieved by these algorithms and the variables, parameters, and steps taken by the algorithm to reach the obtained results, are transparent and explainable. To complement the existing literature on XAI, in this paper, we take an `engineering' approach to illustrate the concepts of XAI. We discuss the stakeholders in XAI and describe the mathematical contours of XAI from engineering perspective. Then we take the autonomous car as a use-case and discuss the applications of XAI for its different components such as object detection, perception, control, action decision, and so on. This work is an exploratory study to identify new avenues of research in the field of XAI.
The rapidly developing AI systems and applications still require human involvement in practically all parts of the analytics process. Human decisions are largely based on visualizations, providing data scientists details of data properties and the results of analytical procedures. Different visualizations are used in the different steps of the Machine Learning (ML) process. The decision which visualization to use depends on factors, such as the data domain, the data model and the step in the ML process. In this chapter, we describe the seven steps in the ML process and review different visualization techniques that are relevant for the different steps for different types of data, models and purposes.
Artificial Intelligence techniques powered by deep neural nets have achieved much success in several application domains, most significantly and notably in the Computer Vision applications and Natural Language Processing tasks. Surpassing human-level performance propelled the research in the applications where different modalities amongst language, vision, sensory, text play an important role in accurate predictions and identification. Several multimodal fusion methods employing deep learning models are proposed in the literature. Despite their outstanding performance, the complex, opaque and black-box nature of the deep neural nets limits their social acceptance and usability. This has given rise to the quest for model interpretability and explainability, more so in the complex tasks involving multimodal AI methods. This paper extensively reviews the present literature to present a comprehensive survey and commentary on the explainability in multimodal deep neural nets, especially for the vision and language tasks. Several topics on multimodal AI and its applications for generic domains have been covered in this paper, including the significance, datasets, fundamental building blocks of the methods and techniques, challenges, applications, and future trends in this domain
We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions -- or visual explanations. Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM), uses class-specific gradient information to localize important regions. These localizations are combined with existing pixel-space visualizations to create a novel high-resolution and class-discriminative visualization called Guided Grad-CAM. These methods help better understand CNN-based models, including image captioning and visual question answering (VQA) models. We evaluate our visual explanations by measuring their ability to discriminate between classes, to inspire trust in humans, and their correlation with occlusion maps. Grad-CAM provides a new way to understand CNN-based models. We have released code, an online demo hosted on CloudCV, and a full version of this extended abstract.
Deep learning's great success motivates many practitioners and students to learn about this exciting technology. However, it is often challenging for beginners to take their first step due to the complexity of understanding and applying deep learning. We present CNN Explainer, an interactive visualization tool designed for non-experts to learn and examine convolutional neural networks (CNNs), a foundational deep learning model architecture. Our tool addresses key challenges that novices face while learning about CNNs, which we identify from interviews with instructors and a survey with past students. CNN Explainer tightly integrates a model overview that summarizes a CNN's structure, and on-demand, dynamic visual explanation views that help users understand the underlying components of CNNs. Through smooth transitions across levels of abstraction, our tool enables users to inspect the interplay between low-level mathematical operations and high-level model structures. A qualitative user study shows that CNN Explainer helps users more easily understand the inner workings of CNNs, and is engaging and enjoyable to use. We also derive design lessons from our study. Developed using modern web technologies, CNN Explainer runs locally in users' web browsers without the need for installation or specialized hardware, broadening the public's education access to modern deep learning techniques.