Weng, Tsui-Wei
Label-Free Concept Bottleneck Models
Oikarinen, Tuomas, Das, Subhro, Nguyen, Lam M., Weng, Tsui-Wei
Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time-consuming and labor-intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier for adopting CBMs in practical real-world applications. Motivated by these challenges, we propose Label-free CBM, a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining high accuracy. Our Label-free CBM has many advantages: it is scalable (we present the first CBM scaled to ImageNet), efficient (creating a CBM takes only a few hours even for very large datasets), and automated (training it for a new dataset requires minimal human effort). Finally, in Appendix B we conduct a large-scale user evaluation of the interpretability of our method.

Deep neural networks (DNNs) have demonstrated unprecedented success in a wide range of machine learning tasks such as computer vision, natural language processing, and speech recognition. However, due to their complex and deep structures, they are often regarded as black-box models that are difficult to understand and interpret. Interpretable models are important for many reasons, such as creating calibrated trust in models, which means understanding when we should trust them. Making deep learning models more interpretable is an active yet challenging research topic.

Figure 1 (caption): Our proposed Label-free CBM has many desired features which existing CBMs lack, and it can transform any neural network backbone into an interpretable Concept Bottleneck Model.

One approach to make deep learning more interpretable is through Concept Bottleneck Models (CBMs) (Koh et al., 2020). CBMs typically have a concept bottleneck layer before the (last) fully connected layer of the neural network. The concept bottleneck layer is trained to have each neuron correspond to a single human-understandable concept. This makes the final decision a linear function of interpretable concepts, greatly increasing our understanding of the decision making.
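To make the concept bottleneck architecture concrete, the following minimal PyTorch sketch shows the generic structure: a feature backbone, a concept layer whose units are meant to align with named concepts, and a final linear layer over concept activations. This is only an illustration; the backbone, concept count, and class count are hypothetical, and the label-free concept-alignment training that is the paper's actual contribution is omitted.

# Minimal sketch of a concept-bottleneck architecture (illustrative only).
# The backbone choice and the concept/class sizes below are hypothetical.
import torch
import torch.nn as nn
import torchvision.models as models


class ConceptBottleneckModel(nn.Module):
    def __init__(self, num_concepts=170, num_classes=200):
        super().__init__()
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()          # keep only the feature extractor
        self.backbone = backbone
        # Concept bottleneck: each output unit is meant to align with one concept.
        self.concept_layer = nn.Linear(feat_dim, num_concepts)
        # Final decision is a linear function of the concept activations.
        self.classifier = nn.Linear(num_concepts, num_classes)

    def forward(self, x):
        concepts = self.concept_layer(self.backbone(x))
        return self.classifier(concepts), concepts


model = ConceptBottleneckModel()
logits, concepts = model(torch.randn(2, 3, 224, 224))
print(logits.shape, concepts.shape)  # torch.Size([2, 200]) torch.Size([2, 170])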
CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
Oikarinen, Tuomas, Weng, Tsui-Wei
In this paper, we propose CLIP-Dissect, a new technique to automatically describe the function of individual hidden neurons inside vision networks. CLIP-Dissect leverages recent advances in multimodal vision/language models to label internal neurons with open-ended concepts without the need for any labeled data or human examples. We show that CLIP-Dissect provides more accurate descriptions than existing methods for last-layer neurons, where the ground truth is available, as well as qualitatively good descriptions for hidden-layer neurons. In addition, our method is very flexible: it is model-agnostic, can easily handle new concepts, and can be extended to take advantage of better multimodal models in the future. Finally, CLIP-Dissect is computationally efficient and can label all neurons from five layers of ResNet-50 in just 4 minutes, which is more than 10 times faster than existing methods. Our code is available at https://github.com/Trustworthy-ML-Lab/CLIP-dissect. Crowdsourced user study results are available in Appendix B to further support the effectiveness of our method.
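The core idea of matching a neuron's activation pattern against CLIP-derived concept scores can be illustrated roughly as below, using plain correlation as the matching function (the paper uses a more refined similarity measure). The concept list and probe data are hypothetical stand-ins; only the clip package API (https://github.com/openai/CLIP) is assumed.

# Illustrative sketch of CLIP-based neuron description (not the exact
# CLIP-Dissect similarity function). Concepts and probe images are stand-ins.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

concepts = ["stripes", "dog face", "sky", "text", "wheel"]   # hypothetical concept set
probe_images = torch.randn(64, 3, 224, 224, device=device)   # stand-in for real probe data

with torch.no_grad():
    img_feat = model.encode_image(probe_images).float()
    txt_feat = model.encode_text(clip.tokenize(concepts).to(device)).float()
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    P = img_feat @ txt_feat.T          # [num_images, num_concepts] concept scores

# acts: activations of one target neuron on the same probe images, e.g.
# recorded with a forward hook on the network being dissected.
acts = torch.rand(64, device=device)
acts = (acts - acts.mean()) / acts.std()
scores = (P - P.mean(0)) / P.std(0)
match = (acts[:, None] * scores).mean(0)   # correlation of the neuron with each concept
print("neuron description:", concepts[match.argmax().item()])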
Concept-Monitor: Understanding DNN training through individual neurons
Khan, Mohammad Ali, Oikarinen, Tuomas, Weng, Tsui-Wei
In this work, we propose a general framework called Concept-Monitor to help demystify black-box DNN training processes automatically, using a novel unified embedding space and concept diversity metric. Concept-Monitor enables human-interpretable visualization and indicators of the DNN training process and facilitates transparency as well as a deeper understanding of how DNNs develop during training. Inspired by these findings, we also propose a new training regularizer that incentivizes hidden neurons to learn diverse concepts, which we show to improve training performance. Finally, we apply Concept-Monitor to conduct several case studies on different training paradigms including adversarial training, fine-tuning, and network pruning via the Lottery Ticket Hypothesis.
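In spirit, a concept-diversity regularizer of the kind mentioned above could be sketched as follows: penalize pairwise similarity between the concept-space embeddings assigned to hidden neurons. This is only an illustration under assumed interfaces (the embedding source and the weight are hypothetical), not the paper's exact formulation.

# Illustrative sketch of a concept-diversity regularizer (not the paper's
# exact formulation): penalize pairwise cosine similarity between the
# concept-space embeddings of hidden neurons.
import torch
import torch.nn.functional as F


def diversity_penalty(neuron_concept_emb: torch.Tensor) -> torch.Tensor:
    """neuron_concept_emb: [num_neurons, emb_dim] embedding of each neuron
    in a shared concept space (e.g. obtained with a CLIP-like model)."""
    z = F.normalize(neuron_concept_emb, dim=-1)
    sim = z @ z.T                                    # pairwise cosine similarities
    off_diag = sim - torch.diag_embed(sim.diagonal())
    return off_diag.abs().mean()                     # small value => diverse neurons


# Hypothetical usage inside a training loop:
# loss = task_loss + 0.1 * diversity_penalty(neuron_embeddings)
emb = torch.randn(32, 512, requires_grad=True)
print(diversity_penalty(emb))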
Constructive Assimilation: Boosting Contrastive Learning Performance through View Generation Strategies
Han, Ligong, Han, Seungwook, Sudalairaj, Shivchander, Loh, Charlotte, Dangovski, Rumen, Deng, Fei, Agrawal, Pulkit, Metaxas, Dimitris, Karlinsky, Leonid, Weng, Tsui-Wei, Srivastava, Akash
Transformations based on domain expertise (expert transformations), such as random-resized-crop and color-jitter, have proven critical to the success of contrastive learning techniques such as SimCLR. Recently, several attempts have been made to replace such domain-specific, human-designed transformations with generated views that are learned. However, for imagery data, none of these view-generation methods has so far been able to outperform expert transformations. In this work, we tackle a different question: instead of replacing expert transformations with generated views, can we constructively assimilate generated views with expert transformations? We answer this question in the affirmative and propose a view generation method and a simple, effective assimilation method that together improve the state-of-the-art by up to ~3.6% on three different datasets. Importantly, we conduct a detailed empirical study that systematically analyzes a range of view generation and assimilation methods and provides a holistic picture of the efficacy of learned views in contrastive representation learning.
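The kind of assimilation described above can be sketched roughly as below: a learned view generator is used alongside, rather than instead of, the expert SimCLR augmentations when forming positive pairs. The view_generator interface and the mixing probability are hypothetical; this is not the paper's exact recipe.

# Rough sketch of assimilating learned views with expert transformations
# when building SimCLR-style positive pairs (illustrative only).
import random
from torchvision import transforms

expert_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])


def make_positive_pair(img, view_generator=None, p_generated=0.5):
    """Return two views of img: one always from expert transforms, the second
    either expert-transformed or produced by a learned view generator."""
    view1 = expert_transform(img)
    if view_generator is not None and random.random() < p_generated:
        # Learned view (e.g. from a generative model), assimilated with the
        # expert pipeline instead of replacing it.
        view2 = view_generator(view1.unsqueeze(0)).squeeze(0)
    else:
        view2 = expert_transform(img)
    return view1, view2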
Certified Interpretability Robustness for Class Activation Mapping
Gu, Alex, Weng, Tsui-Wei, Chen, Pin-Yu, Liu, Sijia, Daniel, Luca
Interpreting machine learning models is challenging but crucial for ensuring the safety of deep networks in autonomous driving systems, where deep learning based perception models are now prevalent. While a variety of interpretability methods have been proposed, most have been shown to lack robustness, yet little has been done to provide certificates for interpretability robustness. Taking a step in this direction, we present CORGI, short for Certifiably prOvable Robustness Guarantees for Interpretability mapping. CORGI is an algorithm that takes in an input image and gives a certifiable lower bound for the robustness of the top k pixels of its CAM interpretability map. We show the effectiveness of CORGI via a case study on traffic sign data, certifying lower bounds on the minimum adversarial perturbation that are within a modest factor (4-5x) of state-of-the-art attack methods.
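For reference, the object being certified is a standard Class Activation Map, which can be computed as in the sketch below; the certification itself (bounding how perturbations can change the top-k pixel set) relies on neural network robustness verification and is not reproduced here. The model and input are hypothetical stand-ins.

# Standard Class Activation Mapping (CAM), whose top-k pixels CORGI certifies.
# The certification step is not shown; model and input are stand-ins.
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()
features = {}
model.layer4.register_forward_hook(
    lambda m, i, o: features.__setitem__("maps", o))   # [B, C, H, W] feature maps

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
cls = logits.argmax(dim=1).item()

fc_w = model.fc.weight[cls]                                 # [C] weights of predicted class
cam = (fc_w[:, None, None] * features["maps"][0]).sum(0)    # [H, W] activation map
cam = torch.relu(cam)
topk_idx = cam.flatten().topk(k=10).indices                 # top-k pixels whose ranking is certified
print(cam.shape, topk_idx)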
Quantifying Safety of Learning-based Self-Driving Control Using Almost-Barrier Functions
Qin, Zhizhen, Weng, Tsui-Wei, Gao, Sicun
Safe path-tracking control is crucial for reliable autonomous driving. Widely-adopted control methods [1], [2] have inherent difficulty with nonlinearity and uncertainty that cannot be ignored when the vehicles are operating at relatively high speed or under adverse road conditions. Controllers obtained from deep learning methods have shown great promise in a variety of applications [3], [4]. However, neural networks are known to be highly nonlinear and complex, preventing them from being easily analyzed as classical controllers such as Stanley [5] or Model Predictive Control (MPC) [2], [6]. In this paper, we propose methods for the quantitative safety analysis of learning-based neural controllers by synthesizing and certifying neural almost-barrier functions, for path-tracking with only black-box access to high-fidelity simulations of nonlinear vehicle dynamics.

We will describe the sampling-based learning procedures for constructing candidate neural barrier functions, and certification procedures that utilize robustness analysis for neural networks to certify regions where the barrier conditions are fully satisfied. Our approach is built on recent advances in robustness certification of neural networks [10], [11], which allows us to rigorously bound the Lie derivative values of the learned neural barrier function. With these methods, we are able to train and certify barrier functions with small regions of boundary violations: 99% of the barrier region is fully certified for the kinematic vehicle model, 86% for the highly nonlinear dynamic model with inertial effects and lateral slip, and 91% in the TORCS environment with high-fidelity vehicle dynamics simulation (Fig. 1). We visualize the certified regions (in blue contour) and the sparse uncertified regions.
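A generic sketch of the sampling-based learning step is given below: a small network is trained so that it is negative on sampled safe states, positive on sampled unsafe states, and non-increasing along simulated one-step transitions (a discrete-time surrogate of the Lie-derivative condition). The state dimension, margins, and sampling are hypothetical, and the certification step via neural network robustness analysis is not reproduced.

# Generic sketch of learning a neural barrier function from sampled states
# (illustrative; not the paper's exact procedure or certification step).
import torch
import torch.nn as nn

barrier = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(barrier.parameters(), lr=1e-3)


def barrier_loss(safe_x, unsafe_x, x_t, x_next, margin=0.1):
    """safe_x/unsafe_x: sampled safe/unsafe states; (x_t, x_next): one-step
    transitions from a black-box simulator of the closed-loop dynamics."""
    loss_safe = torch.relu(barrier(safe_x) + margin).mean()      # want B(safe) <= -margin
    loss_unsafe = torch.relu(margin - barrier(unsafe_x)).mean()  # want B(unsafe) >= margin
    # Discrete-time surrogate of the Lie-derivative condition: B should not
    # increase along trajectories, so the zero sublevel set stays invariant.
    loss_decrease = torch.relu(barrier(x_next) - barrier(x_t)).mean()
    return loss_safe + loss_unsafe + loss_decrease


for _ in range(100):
    # Hypothetical sampled batches (4-dimensional vehicle state, e.g. pose + velocity).
    safe_x, unsafe_x = torch.randn(256, 4), 3.0 + torch.randn(256, 4)
    x_t = torch.randn(256, 4)
    x_next = x_t + 0.05 * torch.randn(256, 4)      # stand-in for a simulator rollout
    opt.zero_grad()
    barrier_loss(safe_x, unsafe_x, x_t, x_next).backward()
    opt.step()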
Evaluating Robustness of Cooperative MARL: A Model-based Approach
Pham, Nhan H., Nguyen, Lam M., Chen, Jie, Lam, Hoang Thanh, Das, Subhro, Weng, Tsui-Wei
In recent years, a proliferation of methods has been developed for cooperative multi-agent reinforcement learning (c-MARL). However, the robustness of c-MARL agents against adversarial attacks has rarely been explored. In this paper, we propose to evaluate the robustness of c-MARL agents via a model-based approach. Our proposed formulation can craft stronger adversarial state perturbations of the c-MARL agent(s) to lower total team rewards more than existing model-free approaches. In addition, we propose the first victim-agent selection strategy, which allows us to develop even stronger adversarial attacks. Numerical experiments on multi-agent MuJoCo benchmarks illustrate the advantage of our approach: the proposed model-based attack consistently outperforms other baselines in all tested environments.
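In spirit, a model-based state-perturbation attack can be sketched as below: a differentiable surrogate of the predicted team return (learned dynamics and reward model with fixed policies) is used to optimize a bounded perturbation of the victim agent's observation. All components, names, and hyperparameters are hypothetical stand-ins, not the paper's exact formulation.

# Sketch of a model-based adversarial state perturbation (illustrative only).
# A differentiable surrogate of the team return is used to optimize a bounded
# perturbation of the victim agent's observation via projected gradient steps.
import torch


def attack_observation(obs, predicted_return_fn, eps=0.1, steps=20, lr=0.02):
    """obs: victim observation; predicted_return_fn: differentiable surrogate
    mapping a perturbed observation to the predicted total team reward."""
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        loss = predicted_return_fn(obs + delta)    # attacker minimizes team return
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= lr * grad.sign()
            delta.clamp_(-eps, eps)                # keep perturbation within the budget
    return (obs + delta).detach()


# Hypothetical usage with a toy surrogate model:
surrogate = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
obs = torch.randn(8)
adv_obs = attack_observation(obs, lambda o: surrogate(o).sum())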
On the Equivalence between Neural Network and Support Vector Machine
Chen, Yilan, Huang, Wei, Nguyen, Lam M., Weng, Tsui-Wei
Recent research shows that the dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by Neural Tangent Kernel (NTK) \citep{jacot2018neural}. Under the squared loss, the infinite-width NN trained by gradient descent with an infinitely small learning rate is equivalent to kernel regression with NTK \citep{arora2019exact}. However, the equivalence is currently only known for ridge regression \citep{arora2019harnessing}, while the equivalence between NN and other kernel machines (KMs), e.g. support vector machine (SVM), remains unknown. Therefore, in this work, we propose to establish the equivalence between NN and SVM, and specifically, between the infinitely wide NN trained by soft margin loss and the standard soft margin SVM with NTK trained by subgradient descent. Our main theoretical results include establishing the equivalence between NN and a broad family of $\ell_2$ regularized KMs with finite-width bounds, which cannot be handled by prior work, and showing that every finite-width NN trained by such regularized loss functions is approximately a KM. Furthermore, we demonstrate our theory can enable three practical applications, including (i) \textit{non-vacuous} generalization bound of NN via the corresponding KM; (ii) \textit{non-trivial} robustness certificate for the infinite-width NN (while existing robustness verification methods would provide vacuous bounds); (iii) intrinsically more robust infinite-width NNs than those from previous kernel regression. Our code for the experiments is available at \url{https://github.com/leslie-CH/equiv-nn-svm}.
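Schematically, the two objectives being related can be written roughly as follows, with $f(x;\theta)$ the network, $\theta_0$ its initialization, and $k_{\mathrm{NTK}}$ the neural tangent kernel; regularization constants and exact scaling follow the paper, not this sketch.

% Schematic restatement of the two objectives (see the paper for the precise
% statement; constants and scaling here are indicative only).
\begin{align*}
\text{NN, soft margin loss:}\quad
  &\min_{\theta}\ \frac{\lambda}{2}\,\lVert \theta-\theta_0\rVert_2^2
   \;+\; \sum_{i=1}^{n}\max\bigl(0,\,1-y_i\, f(x_i;\theta)\bigr),\\[4pt]
\text{soft margin SVM with NTK:}\quad
  &\min_{\beta}\ \frac{\lambda}{2}\sum_{i,j=1}^{n}\beta_i\beta_j\,k_{\mathrm{NTK}}(x_i,x_j)
   \;+\; \sum_{i=1}^{n}\max\Bigl(0,\,1-y_i\sum_{j=1}^{n}\beta_j\,k_{\mathrm{NTK}}(x_i,x_j)\Bigr).
\end{align*}

The paper's equivalence states that, in the infinite-width limit, (sub)gradient descent on the first objective and on the second yield the same predictor.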
On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning
Wang, Ren, Xu, Kaidi, Liu, Sijia, Chen, Pin-Yu, Weng, Tsui-Wei, Gan, Chuang, Wang, Meng
Model-agnostic meta-learning (MAML) has emerged as one of the most successful meta-learning techniques in few-shot learning. It enables us to learn a meta-initialization of model parameters (that we call a meta-model) to rapidly adapt to new tasks using a small amount of labeled training data. Despite the generalization power of the meta-model, it remains elusive how adversarial robustness can be maintained by MAML in few-shot learning. In addition to generalization, robustness is also desired for a meta-model to defend against adversarial examples (attacks). Toward promoting adversarial robustness in MAML, we first study WHEN a robustness-promoting regularization should be incorporated, given the fact that MAML adopts a bi-level (fine-tuning vs. meta-update) learning procedure. We show that robustifying the meta-update stage is sufficient to make robustness adapt to the task-specific fine-tuning stage even if the latter uses a standard training protocol. We also provide additional justification for the acquired robustness adaptation by peering into the interpretability of neurons' activation maps. Furthermore, we investigate HOW robust regularization can be efficiently designed in MAML. We propose a general but easily-optimized robustness-regularized meta-learning framework, which allows the use of unlabeled data augmentation, fast adversarial attack generation, and computationally-light fine-tuning. In particular, we show for the first time that an auxiliary contrastive learning task can enhance the adversarial robustness of MAML. Finally, extensive experiments are conducted to demonstrate the effectiveness of our proposed methods in robust few-shot learning.
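In spirit, robustifying only the meta-update stage can be sketched as below: the inner-loop fine-tuning stays standard, while the outer (meta) loss on query data is evaluated on adversarially perturbed inputs. First-order MAML and a one-step FGSM attack are used for brevity; the paper's regularizer, fast attack generation, and contrastive auxiliary task are not reproduced, and meta_model/tasks are hypothetical.

# Sketch of robustifying only the meta-update (outer) stage of MAML, with
# standard inner-loop fine-tuning (illustrative; first-order approximation).
import copy
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps=8 / 255):
    """One-step attack, used only for the meta-update (outer) loss."""
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()


def robust_meta_step(meta_model, tasks, meta_opt, inner_lr=0.01):
    """tasks: iterable of (support_x, support_y, query_x, query_y) batches."""
    meta_opt.zero_grad()
    for support_x, support_y, query_x, query_y in tasks:
        task_model = copy.deepcopy(meta_model)       # task-specific copy
        # Standard (non-adversarial) fine-tuning on the support set.
        inner_loss = F.cross_entropy(task_model(support_x), support_y)
        grads = torch.autograd.grad(inner_loss, task_model.parameters())
        with torch.no_grad():
            for p, g in zip(task_model.parameters(), grads):
                p -= inner_lr * g
        # Robustness enters only at the meta-update: the outer loss is
        # evaluated on adversarially perturbed query inputs.
        adv_query = fgsm(task_model, query_x, query_y)
        F.cross_entropy(task_model(adv_query), query_y).backward()
        with torch.no_grad():                        # first-order MAML: reuse task grads
            for mp, tp in zip(meta_model.parameters(), task_model.parameters()):
                mp.grad = tp.grad.clone() if mp.grad is None else mp.grad + tp.grad
    meta_opt.step()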
Fast Training of Provably Robust Neural Networks by SingleProp
Boopathy, Akhilan, Weng, Tsui-Wei, Liu, Sijia, Chen, Pin-Yu, Zhang, Gaoyuan, Daniel, Luca
Recent works have developed several methods of defending neural networks against adversarial attacks with certified guarantees. However, these techniques can be computationally costly due to the use of certification during training. We develop a new regularizer that is both more efficient than existing certified defenses, requiring only one additional forward propagation through a network, and able to train networks with similar certified accuracy. Through experiments on MNIST and CIFAR-10 we demonstrate improvements in training speed and comparable certified accuracy compared to state-of-the-art certified defenses.