Bouaynaya, Nidhal
Dynamic Continual Learning: Harnessing Parameter Uncertainty for Improved Network Adaptation
Angelini, Christopher, Bouaynaya, Nidhal
When Deep Neural Networks (DNNs) are fine-tuned on new data, they are prone to overwriting network parameters required for task-specific functionality on previously learned tasks, resulting in a loss of performance on those tasks. We propose using parameter-based uncertainty to determine which parameters are relevant to a network's learned function and to regularize training to prevent change in these important parameters. We approach this regularization in two ways: (1) we constrain critical parameters from significant changes by associating more critical parameters with lower learning rates, thereby limiting alterations in those parameters; (2) we restrict important parameters from changing by imposing a higher regularization weighting, causing parameters to revert to their states prior to the learning of subsequent tasks. We leverage a Bayesian Moment Propagation framework, which learns network parameters concurrently with their associated uncertainties while allowing each parameter to contribute uncertainty to the network's predictive distribution, avoiding the pitfalls of existing sampling-based methods. The proposed approach is evaluated on common sequential benchmark datasets and compared to existing published approaches from the Continual Learning community. Ultimately, we show improved Continual Learning performance on the Average Test Accuracy and Backward Transfer metrics compared to sampling-based methods and other non-uncertainty-based approaches.
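The two regularization routes described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a parameter with a small learned standard deviation is an important one, that learning rates scale linearly with that standard deviation, and that the penalty is weighted by inverse variance. All function names are hypothetical.

```python
def scaled_learning_rates(sigmas, base_lr=0.1):
    """Route (1): give low-uncertainty (assumed important) parameters smaller learning rates.

    Assumption: the rate is proportional to sigma, normalized by the largest sigma.
    """
    max_sigma = max(sigmas)
    return [base_lr * s / max_sigma for s in sigmas]


def uncertainty_guided_step(params, grads, sigmas, base_lr=0.1):
    """One gradient-descent step with a separate learning rate per parameter."""
    lrs = scaled_learning_rates(sigmas, base_lr)
    return [p - lr * g for p, lr, g in zip(params, lrs, grads)]


def uncertainty_weighted_penalty(params, prev_params, sigmas, strength=1.0):
    """Route (2): penalize drift from the previous task's parameters.

    Assumption: the weight is 1 / sigma^2, so confident parameters are pulled
    back toward their earlier values more strongly.
    """
    return sum(
        strength * (p - q) ** 2 / (s ** 2)
        for p, q, s in zip(params, prev_params, sigmas)
    )


if __name__ == "__main__":
    # The first parameter is confident (sigma = 0.1), so it moves ten times less.
    print(uncertainty_guided_step([1.0, 1.0], [1.0, 1.0], [0.1, 1.0]))
    print(uncertainty_weighted_penalty([1.5, 1.5], [1.0, 1.0], [0.5, 1.0]))
```

In this sketch both routes read the same uncertainty estimate; they differ only in whether it throttles the update itself or adds a restoring term to the loss.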
Variational Density Propagation Continual Learning
Angelini, Christopher, Bouaynaya, Nidhal, Rasool, Ghulam
Deep Neural Networks (DNNs) deployed to the real world are regularly subject to out-of-distribution (OoD) data, various types of noise, and shifting conceptual objectives. This paper proposes a framework for adapting to data distribution drift modeled by benchmark Continual Learning datasets. We develop and evaluate a method of Continual Learning that leverages uncertainty quantification from Bayesian Inference to mitigate catastrophic forgetting. We expand on previous approaches by removing the need for Monte Carlo sampling of the model weights to sample the predictive distribution. We optimize a closed-form Evidence Lower Bound (ELBO) objective approximating the predictive distribution by propagating the first two moments of a distribution, i.e., mean and covariance, through all network layers. Catastrophic forgetting is mitigated by using the closed-form ELBO to approximate the Minimum Description Length (MDL) Principle, inherently penalizing changes in the model likelihood by minimizing the KL Divergence between the variational posterior for the current task and the previous task's variational posterior, which acts as the prior. Leveraging the approximation of the MDL principle, we aim to initially learn a sparse variational posterior and then minimize additional model complexity learned for subsequent tasks. Our approach is evaluated in the task-incremental learning scenario using density-propagated versions of fully-connected and convolutional neural networks across multiple sequential benchmark datasets with varying task sequence lengths. Ultimately, this procedure produces a minimally complex network over a series of tasks, mitigating catastrophic forgetting.
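As a rough illustration of sampling-free moment propagation and the KL term between consecutive posteriors, the sketch below pushes a mean and a variance through one linear layer in closed form. It is a simplification and not the paper's method: it assumes fully independent Gaussian weights and inputs and tracks only per-unit variances rather than a full covariance matrix. The function names are hypothetical.

```python
import math


def linear_moments(x_mean, x_var, w_mean, w_var):
    """Propagate mean and variance through y = W x without sampling.

    Assumes all weights and inputs are mutually independent, so for each term
    Var(w * x) = var_w * var_x + var_w * mean_x^2 + mean_w^2 * var_x.
    w_mean and w_var are lists of rows, one row per output unit.
    """
    out_mean, out_var = [], []
    for wm_row, wv_row in zip(w_mean, w_var):
        mean = sum(wm * xm for wm, xm in zip(wm_row, x_mean))
        var = sum(
            wv * xv + wv * xm ** 2 + wm ** 2 * xv
            for wm, wv, xm, xv in zip(wm_row, wv_row, x_mean, x_var)
        )
        out_mean.append(mean)
        out_var.append(var)
    return out_mean, out_var


def kl_diag_gaussians(q_mean, q_var, p_mean, p_var):
    """KL(q || p) for factorized Gaussians, summed over parameters.

    Here q stands for the current task's posterior and p for the previous
    task's posterior, which plays the role of the prior.
    """
    return sum(
        0.5 * (math.log(pv / qv) + (qv + (qm - pm) ** 2) / pv - 1.0)
        for qm, qv, pm, pv in zip(q_mean, q_var, p_mean, p_var)
    )


if __name__ == "__main__":
    mean, var = linear_moments([1.0, 2.0], [0.0, 0.0], [[0.5, -1.0]], [[0.1, 0.2]])
    print(mean, var)
    print(kl_diag_gaussians([1.0], [1.0], [0.0], [1.0]))
```

The KL term is zero when the posterior has not moved from the previous task's posterior and grows as the means shift or the variances change, which is the sense in which it penalizes added model complexity.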
EvalAttAI: A Holistic Approach to Evaluating Attribution Maps in Robust and Non-Robust Models
Nielsen, Ian E., Ramachandran, Ravi P., Bouaynaya, Nidhal, Fathallah-Shaykh, Hassan M., Rasool, Ghulam
The expansion of explainable artificial intelligence as a field of research has generated numerous methods of visualizing and understanding the black box of a machine learning model. Attribution maps are generally used to highlight the parts of the input image that influence the model to make a specific decision. On the other hand, the robustness of machine learning models to natural noise and adversarial attacks is also being actively explored. This paper focuses on evaluating methods of attribution mapping to find whether robust neural networks are more explainable. We explore this problem within the application of classification for medical imaging. Explainability research is at an impasse: there are many methods of attribution mapping, but no current consensus on how to evaluate them and determine which are best. Our experiments on multiple datasets (natural and medical imaging) and various attribution methods reveal that two popular evaluation metrics, Deletion and Insertion, have inherent limitations and yield contradictory results. We propose a new explainability faithfulness metric (called EvalAttAI) that addresses the limitations of prior metrics. Using our novel evaluation, we found that Bayesian deep neural networks using the Variational Density Propagation technique were consistently more explainable when used with the best-performing attribution method, the Vanilla Gradient. However, in general, various types of robust neural networks may not be more explainable, despite these models producing more visually plausible attribution maps.
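For readers unfamiliar with the Deletion metric mentioned in this abstract, the sketch below shows the idea as it is commonly described: features are removed in order of decreasing attribution and the model score is recorded after each removal, with a faster drop (smaller area under the curve) read as a more faithful attribution. The toy linear model, the zero baseline, and the function names are assumptions for illustration. EvalAttAI itself is not reproduced here, since the abstract does not specify it.

```python
def deletion_curve(model, x, attributions, baseline=0.0):
    """Record the model score as the most-attributed features are removed one by one."""
    order = sorted(range(len(x)), key=lambda i: attributions[i], reverse=True)
    current = list(x)
    scores = [model(current)]
    for i in order:
        current[i] = baseline  # "delete" the feature by replacing it with the baseline
        scores.append(model(current))
    return scores


def area_under_curve(scores):
    """Trapezoidal area with the x-axis normalized to [0, 1]."""
    steps = len(scores) - 1
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(steps)) / steps


if __name__ == "__main__":
    def toy_model(v):
        # Hypothetical linear scorer standing in for a classifier's class score.
        return 3 * v[0] + 1 * v[1] + 2 * v[2]

    x = [1.0, 1.0, 1.0]
    faithful = deletion_curve(toy_model, x, [3.0, 1.0, 2.0])
    misleading = deletion_curve(toy_model, x, [1.0, 3.0, 2.0])
    print(area_under_curve(faithful), area_under_curve(misleading))
```

In this toy case the attribution that matches the true weights yields the smaller area; the abstract's point is that on real models this metric and its Insertion counterpart can disagree.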
Robust Explainability: A Tutorial on Gradient-Based Attribution Methods for Deep Neural Networks
Nielsen, Ian E., Dera, Dimah, Rasool, Ghulam, Bouaynaya, Nidhal, Ramachandran, Ravi P.
With the rise of deep neural networks, the challenge of explaining the predictions of these networks has become increasingly recognized. While many methods for explaining the decisions of deep neural networks exist, there is currently no consensus on how to evaluate them. On the other hand, robustness is a popular topic in deep learning research; however, until very recently it was hardly discussed in the context of explainability. In this tutorial paper, we start by presenting gradient-based interpretability methods. These techniques use gradient signals to assign the burden of the decision to the input features. Later, we discuss how gradient-based methods can be evaluated for their robustness and the role that adversarial robustness plays in having meaningful explanations. We also discuss the limitations of gradient-based methods. Finally, we present the best practices and attributes that should be examined before choosing an explainability method. We conclude with future directions for research in the area at the convergence of robustness and explainability.
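The simplest gradient-based attribution, often called the vanilla gradient, takes the derivative of the model's score with respect to each input feature as that feature's attribution. The sketch below is illustrative only: it approximates the gradient with central finite differences instead of the automatic differentiation a deep learning framework would provide, and the scoring function and names are hypothetical.

```python
def vanilla_gradient(score_fn, x, eps=1e-5):
    """Approximate d(score)/d(x_i) for every input feature by central differences.

    In practice a framework's autodiff would compute this exactly in one
    backward pass; finite differences keep the sketch dependency-free.
    """
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((score_fn(up) - score_fn(down)) / (2 * eps))
    return grads


if __name__ == "__main__":
    def toy_score(v):
        # Hypothetical class score: quadratic in the first feature, linear in the second.
        return v[0] ** 2 + 3 * v[1]

    # The attribution of each feature is the local sensitivity of the score to it.
    print(vanilla_gradient(toy_score, [2.0, 1.0]))
```

Because the result is a local sensitivity, it can change sharply under small input perturbations, which is one reason the robustness of the underlying model matters for the explanation.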