Banff
Resisting Adversarial Attacks in Deep Neural Networks using Diverse Decision Boundaries
Alam, Manaar, Datta, Shubhajit, Mukhopadhyay, Debdeep, Mondal, Arijit, Chakrabarti, Partha Pratim
The security of deep learning (DL) systems is an extremely important field of study as they are being deployed in several applications due to their ever-improving performance to solve challenging tasks. Despite overwhelming promises, the deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify. Protections against adversarial perturbations on ensemble-based techniques have either been shown to be vulnerable to stronger adversaries or shown to lack an end-to-end evaluation. In this paper, we attempt to develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model. The ensemble of classifiers constructed by (1) transformation of the input by a method called Split-and-Shuffle, and (2) restricting the significant features by a method called Contrast-Significant-Features are shown to result in diverse gradients with respect to adversarial attacks, which reduces the chance of transferring adversarial examples from the original to the defender model targeting the same class. We present extensive experimentations using standard image classification datasets, namely MNIST, CIFAR-10 and CIFAR-100 against state-of-the-art adversarial attacks to demonstrate the robustness of the proposed ensemble-based defense. We also evaluate the robustness in the presence of a stronger adversary targeting all the models within the ensemble simultaneously. Results for the overall false positives and false negatives have been furnished to estimate the overall performance of the proposed methodology.
Out-of-distribution Detection via Frequency-regularized Generative Models
Modern deep generative models can assign high likelihood to inputs drawn from outside the training distribution, posing threats to models in open-world deployments. While much research attention has been placed on defining new test-time measures of OOD uncertainty, these methods do not fundamentally change how deep generative models are regularized and optimized in training. In particular, generative models are shown to overly rely on the background information to estimate the likelihood. To address the issue, we propose a novel frequency-regularized learning FRL framework for OOD detection, which incorporates high-frequency information into training and guides the model to focus on semantically relevant features. FRL effectively improves performance on a wide range of generative architectures, including variational auto-encoder, GLOW, and PixelCNN++. On a new large-scale evaluation task, FRL achieves the state-of-the-art performance, outperforming a strong baseline Likelihood Regret by 10.7% (AUROC) while achieving 147$\times$ faster inference speed. Extensive ablations show that FRL improves the OOD detection performance while preserving the image generation quality. Code is available at https://github.com/mu-cai/FRL.
The SVD of Convolutional Weights: A CNN Interpretability Framework
Praggastis, Brenda, Brown, Davis, Marrero, Carlos Ortiz, Purvine, Emilie, Shapiro, Madelyn, Wang, Bei
Deep neural networks used for image classification often use convolutional filters to extract distinguishing features before passing them to a linear classifier. Most interpretability literature focuses on providing semantic meaning to convolutional filters to explain a model's reasoning process and confirm its use of relevant information from the input domain. Fully connected layers can be studied by decomposing their weight matrices using a singular value decomposition, in effect studying the correlations between the rows in each matrix to discover the dynamics of the map. In this work we define a singular value decomposition for the weight tensor of a convolutional layer, which provides an analogous understanding of the correlations between filters, exposing the dynamics of the convolutional map. We validate our definition using recent results in random matrix theory. By applying the decomposition across the linear layers of an image classification network we suggest a framework against which interpretability methods might be applied using hypergraphs to model class separation. Rather than looking to the activations to explain the network, we use the singular vectors with the greatest corresponding singular values for each linear layer to identify those features most important to the network. We illustrate our approach with examples and introduce the DeepDataProfiler library, the analysis tool used for this study.
An Analytic Framework for Robust Training of Artificial Neural Networks
Barati, Ramin, Safabakhsh, Reza, Rahmati, Mohammad
The reliability of a learning model is key to the successful deployment of machine learning in various industries. Creating a robust model, particularly one unaffected by adversarial attacks, requires a comprehensive understanding of the adversarial examples phenomenon. However, it is difficult to describe the phenomenon due to the complicated nature of the problems in machine learning. Consequently, many studies investigate the phenomenon by proposing a simplified model of how adversarial examples occur and validate it by predicting some aspect of the phenomenon. While these studies cover many different characteristics of the adversarial examples, they have not reached a holistic approach to the geometric and analytic modeling of the phenomenon. This paper propose a formal framework to study the phenomenon in learning theory and make use of complex analysis and holomorphicity to offer a robust learning rule for artificial neural networks. With the help of complex analysis, we can effortlessly move between geometric and analytic perspectives of the phenomenon and offer further insights on the phenomenon by revealing its connection with harmonic functions. Using our model, we can explain some of the most intriguing characteristics of adversarial examples, including transferability of adversarial examples, and pave the way for novel approaches to mitigate the effects of the phenomenon.
A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck
Attention-based deep learning models, such as Transformers (Vaswani et al., 2017; Devlin et al., 2019), have achieved unprecedented empirical success in a wide range of cognitive tasks, in particular in natural language processing (NLP). On the other hand, deep variational Bayesian approaches to representation learning, such as variational autoencoders (VAEs) (Kingma and Welling, 2014), have also been very influential, especially due to their variational information bottleneck (VIB) (Alemi et al., 2017; Kingma and Welling, 2014) for regularising the induced latent representations. Previous VIB methods only apply to a vector space, and Transformers crucially do not use a single vector as their latent representation, instead using a set of vectors (Lin et al., 2020; Fang et al., 2021; Park and Lee, 2021). This allows the number of vectors in a Transformer embedding to grow with the size of the input, which is essential for embedding natural language text (Bahdanau et al., 2015), where the size of the input can range from a single word to thousands of words. In this paper, we propose a variational information bottleneck regulariser for set-of-vector latent representations, and use it to regularise the induced latent representation of a Transformer encoder-decoder variational autoencoder.
What's on your mind? A Mental and Perceptual Load Estimation Framework towards Adaptive In-vehicle Interaction while Driving
Gomaa, Amr, Alles, Alexandra, Meiser, Elena, Rupp, Lydia Helene, Molz, Marco, Reyes, Guillermo
Several researchers have focused on studying driver cognitive behavior and mental load for in-vehicle interaction while driving. Adaptive interfaces that vary with mental and perceptual load levels could help in reducing accidents and enhancing the driver experience. In this paper, we analyze the effects of mental workload and perceptual load on psychophysiological dimensions and provide a machine learning-based framework for mental and perceptual load estimation in a dual task scenario for in-vehicle interaction (https://github.com/amrgomaaelhady/MWL-PL-estimator). We use off-the-shelf non-intrusive sensors that can be easily integrated into the vehicle's system. Our statistical analysis shows that while mental workload influences some psychophysiological dimensions, perceptual load shows little effect. Furthermore, we classify the mental and perceptual load levels through the fusion of these measurements, moving towards a real-time adaptive in-vehicle interface that is personalized to user behavior and driving conditions. We report up to 89% mental workload classification accuracy and provide a real-time minimally-intrusive solution.
Hierarchical Interpretation of Neural Text Classification
Yan, Hanqi, Gui, Lin, He, Yulan
Recent years have witnessed increasing interests in developing interpretable models in Natural Language Processing (NLP). Most existing models aim at identifying input features such as words or phrases important for model predictions. Neural models developed in NLP however often compose word semantics in a hierarchical manner and text classification requires hierarchical modelling to aggregate local information in order to deal with topic and label shifts more effectively. As such, interpretation by words or phrases only cannot faithfully explain model decisions in text classification. This paper proposes a novel Hierarchical INTerpretable neural text classifier, called Hint, which can automatically generate explanations of model predictions in the form of label-associated topics in a hierarchical manner. Model interpretation is no longer at the word level, but built on topics as the basic semantic unit. Experimental results on both review datasets and news datasets show that our proposed approach achieves text classification results on par with existing state-of-the-art text classifiers, and generates interpretations more faithful to model predictions and better understood by humans than other interpretable neural text classifiers.
Integrating connection search in graph queries
Anadiotis, Angelos Christos, Manolescu, Ioana, Mohanty, Madhulika
Graph data management and querying has many practical applications. When graphs are very heterogeneous and/or users are unfamiliar with their structure, they may need to find how two or more groups of nodes are connected in a graph, even when users are not able to describe the connections. This is only partially supported by existing query languages, which allow searching for paths, but not for trees connecting three or more node groups. The latter is related to the NP-hard Group Steiner Tree problem, and has been previously considered for keyword search in databases. In this work, we formally show how to integrate connecting tree patterns (CTPs, in short) within a graph query language such as SPARQL or Cypher, leading to an Extended Query Language (or EQL, in short). We then study a set of algorithms for evaluating CTPs; we generalize prior keyword search work, most importantly by (i) considering bidirectional edge traversal and (ii) allowing users to select any score function for ranking CTP results. To cope with very large search spaces, we propose an efficient pruning technique and formally establish a large set of cases where our algorithm, MOLESP, is complete even with pruning. Our experiments validate the performance of our CTP and EQL evaluation algorithms on a large set of synthetic and real-world workloads.
Sampling Is All You Need on Modeling Long-Term User Behaviors for CTR Prediction
Cao, Yue, Zhou, XiaoJiang, Feng, Jiaqi, Huang, Peihao, Xiao, Yao, Chen, Dayao, Chen, Sheng
Rich user behavior data has been proven to be of great value for Click-Through Rate (CTR) prediction applications, especially in industrial recommender, search, or advertising systems. However, it's non-trivial for real-world systems to make full use of long-term user behaviors due to the strict requirements of online serving time. Most previous works adopt the retrieval-based strategy, where a small number of user behaviors are retrieved first for subsequent attention. However, the retrieval-based methods are sub-optimal and would cause more or less information losses, and it's difficult to balance the effectiveness and efficiency of the retrieval algorithm. In this paper, we propose SDIM (Sampling-based Deep Interest Modeling), a simple yet effective sampling-based end-to-end approach for modeling long-term user behaviors. We sample from multiple hash functions to generate hash signatures of the candidate item and each item in the user behavior sequence, and obtain the user interest by directly gathering behavior items associated with the candidate item with the same hash signature. We show theoretically and experimentally that the proposed method performs on par with standard attention-based models on modeling long-term user behaviors, while being sizable times faster. We also introduce the deployment of SDIM in our system. Specifically, we decouple the behavior sequence hashing, which is the most time-consuming part, from the CTR model by designing a separate module named BSE (behavior Sequence Encoding). BSE is latency-free for the CTR server, enabling us to model extremely long user behaviors. Both offline and online experiments are conducted to demonstrate the effectiveness of SDIM. SDIM now has been deployed online in the search system of Meituan APP.
Graph neural networks for materials science and chemistry
Reiser, Patrick, Neubert, Marlen, Eberhard, André, Torresi, Luca, Zhou, Chen, Shao, Chen, Metni, Houssam, van Hoesel, Clint, Schopmans, Henrik, Sommer, Timo, Friederich, Pascal
Machine learning plays an increasingly important role in many areas of chemistry and materials science, e.g. to predict materials properties, to accelerate simulations, to design new materials, and to predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this review article, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.