Visualizing Deep Neural Networks for Speech Recognition with Learned Topographic Filter Maps
Krug, Andreas, Stober, Sebastian
The uninformative ordering of artificial neurons in Deep Neural Networks complicates visualizing activations in deeper layers. This is one reason why the internal structure of such models is hard to interpret. In neuroscience, the activity of real brains can be visualized by highlighting active regions. Inspired by those techniques, we train a convolutional speech recognition model in which filters are arranged in a 2D grid and neighboring filters are similar to each other. We show how these topographic filter maps visualize artificial neuron activations more intuitively. Moreover, we investigate whether this causes phoneme-responsive neurons to be grouped in certain regions of the topographic map.
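One plausible way to make neighboring filters in a 2D grid similar to each other is to add a smoothness penalty over the grid to the training loss. The abstract does not give the exact loss, so the sketch below is an assumption for illustration: `topographic_penalty` is a hypothetical helper that sums squared Euclidean distances between each (flattened) filter and its right and lower neighbors.

```python
from typing import List

Filter = List[float]  # one filter's weights, flattened


def topographic_penalty(grid: List[List[Filter]]) -> float:
    """Hypothetical topographic regularizer: sum of squared L2 distances
    between each filter and its right and lower neighbors in the grid."""
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            f = grid[r][c]
            # Only look right and down so each neighboring pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    g = grid[rr][cc]
                    total += sum((a - b) ** 2 for a, b in zip(f, g))
    return total


smooth = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]]
shuffled = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
print(topographic_penalty(smooth), topographic_penalty(shuffled))
```

The same four filters score lower when similar filters sit next to each other, which is the property a topographic map is meant to encourage; weighting this term against the recognition loss would be a further design choice.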
Does Interpretability of Neural Networks Imply Adversarial Robustness?
Noack, Adam, Ahern, Isaac, Dou, Dejing, Li, Boyang
The success of deep neural networks is clouded by two issues that largely remain open to this day: the abundance of adversarial attacks that fool neural networks with small perturbations and the lack of interpretation for the predictions they make. Empirical evidence in the literature as well as theoretical analysis on simple models suggest these two seemingly disparate issues may actually be connected, as robust models tend to be more interpretable than non-robust models. In this paper, we provide evidence for the claim that this relationship is bidirectional. Specifically, models that are forced to have interpretable gradients are more robust to adversarial examples than models trained in a standard manner. With further analysis and experiments, we identify two factors behind this phenomenon, namely the suppression of the gradient and the selective use of features guided by high-quality interpretations, which together explain model behavior under various regularization and target-interpretation settings.
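The "suppression of the gradient" factor can be illustrated on the simplest possible model. For a linear logit z = w·x + b, the input gradient is exactly w, and the L-infinity step x + eps·sign(w) moves z by eps·||w||_1, the largest change any perturbation of that size can cause. The sketch below is only that linear illustration under our own assumptions (`linear_logit` and `fgsm_logit_shift` are hypothetical helpers); it is not the deep-network training method described above.

```python
from typing import List


def linear_logit(w: List[float], x: List[float], b: float = 0.0) -> float:
    """z = w . x + b; its gradient with respect to x is w."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm_logit_shift(w: List[float], x: List[float], eps: float) -> float:
    """Increase of a linear logit after the step x + eps * sign(grad_x z).

    Because grad_x z = w, the shift equals eps * ||w||_1: a smaller input
    gradient leaves the attacker less room to move the prediction.
    """
    x_adv = [xi + eps * sign(wi) for wi, xi in zip(w, x)]
    return linear_logit(w, x_adv) - linear_logit(w, x)


x = [1.0, 1.0, 1.0]
print(fgsm_logit_shift([0.5, -2.0, 1.0], x, eps=0.1))     # larger gradient
print(fgsm_logit_shift([0.05, -0.2, 0.1], x, eps=0.1))    # suppressed gradient
```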
Principal Component Properties of Adversarial Samples
Jere, Malhar, Herbig, Sandro, Lind, Christine, Koushanfar, Farinaz
Deep Neural Networks for image classification have been found to be vulnerable to adversarial samples, which consist of sub-perceptual noise added to a benign image that can easily fool trained neural networks, posing a significant risk to their commercial deployment. In this work, we analyze adversarial samples through the lens of their contributions to the principal components of each image, which differs from prior works in which authors performed PCA on the entire dataset. We investigate a number of state-of-the-art deep neural networks trained on ImageNet, as well as several attacks for each network. Our results demonstrate empirically that adversarial samples across several attacks have similar properties in their contributions to the principal components of neural network inputs. We propose a new metric, termed the (k,p) point, to measure the robustness of neural networks to adversarial samples. Using this metric, we achieve 93.36% accuracy in detecting adversarial samples independent of architecture and attack type for models trained on ImageNet.
Robust Deep Graph Based Learning for Binary Classification
Ye, Minxiang, Stankovic, Vladimir, Stankovic, Lina, Cheung, Gene
Convolutional neural network (CNN)-based feature learning has become the state of the art because, given sufficient training data, CNNs can significantly outperform traditional methods on various classification tasks. However, feature learning becomes more difficult when some training labels are noisy. With traditional regularization techniques, CNNs often overfit to the noisy training labels, resulting in sub-par classification performance. In this paper, we propose a robust CNN-based binary classifier that learns deep metric functions, which are then used to construct an optimal underlying graph structure for cleaning noisy labels via graph Laplacian regularization (GLR). GLR is posed as a convex maximum a posteriori (MAP) problem and solved via convex quadratic programming (QP). To penalize samples around the decision boundary, we propose two regularized loss functions for semi-supervised learning. Binary classification experiments on three datasets, varying in the number and type of features, demonstrate that, given a noisy training dataset, our proposed networks outperform several state-of-the-art classifiers, including a label-noise-robust support vector machine, CNNs with three different robust loss functions, model-based GLR, and dynamic graph CNN classifiers.
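To show what graph Laplacian regularization does to noisy labels, the sketch below uses a simplified, unconstrained form under our own assumptions: a fixed, hand-written edge-weight matrix `W` instead of learned deep metrics, and the objective ||x - y||² + mu·xᵀLx with L = D - W, whose minimizer solves (I + mu·L)x = y. The paper's MAP formulation is solved as a constrained QP; the helpers `laplacian`, `solve`, and `glr_denoise` here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(W: Matrix) -> Matrix:
    """Combinatorial graph Laplacian L = D - W (W symmetric, zero diagonal)."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def glr_denoise(W: Matrix, y: List[float], mu: float) -> List[float]:
    """Minimize ||x - y||^2 + mu * x^T L x by solving (I + mu L) x = y."""
    L = laplacian(W)
    n = len(y)
    A = [[(1.0 if i == j else 0.0) + mu * L[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, y)


# Two clusters: {0, 1, 2} (node 2 carries a flipped label) and {3, 4}.
W = [[0.0, 1.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.0, 0.0]]
y = [1.0, 1.0, -1.0, -1.0, -1.0]
x = glr_denoise(W, y, mu=2.0)
print([1 if v > 0 else -1 for v in x])
```

With mu = 2 the smoothed scores for the first cluster are 3/7, 3/7 and 1/7, so the noisy label on node 2 is pulled back to +1 while the separate cluster keeps its labels. The quality of `W` drives the outcome, which is why learning a good graph matters in the approach described above.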
Recent advances in deep learning applied to skin cancer detection
Pacheco, Andre G. C., Krohling, Renato A.
Skin cancer is a major public health problem around the world. Its early detection is very important for improving patient prognosis. However, the shortage of qualified professionals and medical instruments is a significant issue in this field. In this context, deep learning models applied to automated skin cancer detection have become a trend over the past few years. In this paper, we present an overview of the recent advances reported in this field as well as a discussion of the challenges and opportunities for improvement in the current models. In addition, we present some important aspects regarding the use of these models in smartphones and indicate future directions we believe the field will take.
Differentially Private Mixed-Type Data Generation For Unsupervised Learning
Tantipongpipat, Uthaipon, Waites, Chris, Boob, Digvijay, Siva, Amaresh Ankit, Cummings, Rachel
In this work, we introduce the DP-auto-GAN framework for synthetic data generation, which combines the low-dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can take in raw sensitive data and privately train a model for generating synthetic data that has the same statistical properties as the original data. This learned model can be used to generate arbitrary amounts of publicly available synthetic data, which can then be freely shared due to the post-processing guarantees of differential privacy. Our framework is applicable to unlabeled mixed-type data, which may include binary, categorical, and real-valued data. We apply this framework to both unlabeled binary data (MIMIC-III) and unlabeled mixed-type data (ADULT). We also introduce new metrics for evaluating the quality of synthetic mixed-type data, particularly in unsupervised settings.
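Privately training a model usually means replacing the ordinary gradient step with a differentially private one. The sketch below shows the standard DP-SGD-style aggregation (clip each per-example gradient to an L2 bound, sum, add Gaussian noise scaled to that bound, average). It is an assumption that DP-auto-GAN's private training follows this pattern; the abstract does not say how the mechanism is applied to the autoencoder and GAN, and privacy accounting is omitted. `clip_to_norm` and `dp_average_gradient` are hypothetical names.

```python
import math
import random
from typing import List


def clip_to_norm(g: List[float], max_norm: float) -> List[float]:
    """Scale g down (never up) so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in g]


def dp_average_gradient(per_example_grads: List[List[float]],
                        max_norm: float,
                        noise_multiplier: float,
                        rng: random.Random) -> List[float]:
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        for i, v in enumerate(clip_to_norm(g, max_norm)):
            total[i] += v
    # Clipping bounds each example's contribution (its sensitivity) by max_norm,
    # so the noise standard deviation is set proportional to that bound.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


grads = [[3.0, 4.0], [0.0, 0.5]]
print(dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1,
                          rng=random.Random(0)))
```

Setting `noise_multiplier` to 0 recovers the plain average of clipped gradients, which is a convenient sanity check; the privacy guarantee itself comes only from the noise and from tracking it across training steps.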
Gaussian Process Priors for View-Aware Inference
Hou, Yuxin, Heljakka, Ari, Solin, Arno
We derive a principled framework for encoding prior knowledge of information coupling between views or camera poses (translation and orientation) of a single scene. While deep neural networks have become the dominant solution to many tasks in computer vision, some important problems not well suited to deep models have received less attention. These include uncertainty quantification, auxiliary data fusion, and real-time processing, which are instrumental for delivering practical methods with robust inference. While these are central goals in probabilistic machine learning, there is a tangible gap between the theory and practice of applying probabilistic methods to many modern vision problems. To this end, we derive a novel parametric kernel (covariance function) in the pose space, $\mathrm{SE}(3)$, that encodes information about input pose relationships into larger models. We show how this soft-prior knowledge can be applied to improve performance on several real vision tasks, such as feature tracking, human face encoding, and view synthesis.
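To make the idea of a covariance function over camera poses concrete, the sketch below builds an illustrative kernel on poses given as a translation and a unit quaternion. It is our own assumption, not the kernel derived in the paper: a squared-exponential term on translation multiplied by exp(-(1 - (q1·q2)²)/ell_r²) on orientation, where (q1·q2)² = cos²(θ/2) for relative rotation angle θ. `pose_kernel` and its length-scale parameters are hypothetical.

```python
import math
from typing import List, Sequence


def _unit(q: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(v * v for v in q))
    return [v / n for v in q]


def pose_kernel(t1: Sequence[float], q1: Sequence[float],
                t2: Sequence[float], q2: Sequence[float],
                ell_t: float = 1.0, ell_r: float = 0.5,
                sigma2: float = 1.0) -> float:
    """Illustrative covariance between two poses (translation t, quaternion q)."""
    # Squared-exponential term on the translation part.
    d2 = sum((a - b) ** 2 for a, b in zip(t1, t2))
    k_t = math.exp(-d2 / (2.0 * ell_t ** 2))
    # (q1 . q2)^2 = cos^2(theta / 2) for the relative rotation angle theta;
    # squaring makes the term invariant to the sign ambiguity q ~ -q.
    dot = sum(a * b for a, b in zip(_unit(q1), _unit(q2)))
    k_r = math.exp(-(1.0 - dot ** 2) / ell_r ** 2)
    return sigma2 * k_t * k_r


identity = [1.0, 0.0, 0.0, 0.0]       # (w, x, y, z) ordering assumed
half_turn_z = [0.0, 0.0, 0.0, 1.0]    # 180-degree rotation about z
origin = [0.0, 0.0, 0.0]
print(pose_kernel(origin, identity, origin, half_turn_z))
print(pose_kernel(origin, identity, [1.0, 0.0, 0.0], identity))
```

Both factors are positive semi-definite ((q1·q2)² is a degree-2 polynomial kernel, and exponentiating a PSD kernel keeps it PSD), so their product is a valid GP covariance; coupling translation and orientation more tightly would require a different construction.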
VALAN: Vision and Language Agent Navigation
Lansing, Larry, Jain, Vihan, Mehta, Harsh, Huang, Haoshuo, Ie, Eugene
VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture. The framework facilitates the development and evaluation of embodied agents for solving grounded language understanding tasks, such as Vision-and-Language Navigation and Vision-and-Dialog Navigation, in photo-realistic environments, such as Matterport3D and Google StreetView. We have added a minimal set of abstractions on top of SEED RL allowing us to generalize the architecture to solve a variety of other RL problems. In this article, we will describe VALAN's software abstraction and architecture, and also present an example of using VALAN to design agents for instruction-conditioned indoor navigation.
What Do You Mean I'm Funny? Personalizing the Joke Skill of a Voice-Controlled Virtual Assistant
Mottini, Alejandro, Chowdhury, Amber Roy
A considerable part of the success experienced by voice-controlled virtual assistants (VVAs) is due to the emotional and personalized experience they deliver, with humor being a key component in providing an engaging interaction. In this paper, we describe methods used to improve the joke skill of a VVA through personalization. The first method, based on traditional NLP techniques, is robust and scalable. The others combine self-attentional networks and multi-task learning to obtain better results, at the cost of added complexity. A significant challenge facing these systems is the lack of the explicit user feedback needed to provide labels for the models. Instead, we explore two labelling strategies based on implicit feedback. All models were evaluated on real production data. Online results show that models trained on any of the considered labels outperform a heuristic method, presenting a positive real-world impact on user satisfaction. Offline results suggest that the deep-learning approaches can improve the joke experience with respect to the other considered methods.
Tree bark re-identification using a deep-learning feature descriptor
Robert, Martin, Dallaire, Patrick, Giguère, Philippe
The ability to visually re-identify objects is a fundamental capability in vision systems. Oftentimes, it relies on collections of visual signatures based on descriptors, such as Scale Invariant Feature Transform (SIFT) or Speeded Up Robust Features (SURF). However, these traditional descriptors were designed for a certain domain of surface appearances and geometries (limited relief). Consequently, highly-textured surfaces such as tree bark pose a challenge to them. In turn, this makes it more difficult to use trees as identifiable landmarks for navigational purposes (robotics) or to track felled lumber along a supply chain (logistics). We thus propose to use data-driven descriptors trained on bark images for tree surface re-identification. To this effect, we collected a large dataset containing 2,400 bark images with strong illumination changes, annotated by surface and with the ability to pixel-align them. We used this dataset to sample more than 2 million 64x64-pixel patches to train our novel local descriptors DeepBark and SqueezeBark. Our DeepBark method shows a clear advantage over the hand-crafted descriptors SIFT and SURF. Furthermore, we demonstrate that DeepBark can reach a Precision@1 of 99.8% in a database of 7,900 images with only 11 relevant images. Our work thus suggests that re-identifying tree surfaces in a challenging context is possible, and we make a new dataset publicly available.
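Precision@1 is the fraction of queries whose single top-ranked database item is relevant (here, shows the same bark surface). The sketch below computes it with nearest-neighbor search over generic descriptor vectors under L2 distance. This is a simplification we assume for illustration: DeepBark is a local patch descriptor, and the paper's image-level matching pipeline is not reproduced. `precision_at_1` is a hypothetical helper.

```python
import math
from typing import Hashable, List, Sequence


def precision_at_1(queries: List[Sequence[float]], query_ids: List[Hashable],
                   database: List[Sequence[float]], db_ids: List[Hashable]) -> float:
    """Fraction of queries whose nearest database descriptor (L2) has the same surface id."""
    hits = 0
    for q, qid in zip(queries, query_ids):
        nearest = min(range(len(database)), key=lambda i: math.dist(q, database[i]))
        hits += db_ids[nearest] == qid
    return hits / len(queries)


database = [[0.1, 0.0], [5.0, 5.1], [9.0, 9.0]]
db_ids = ["a", "b", "c"]
queries = [[0.0, 0.0], [5.0, 5.0], [0.0, 1.0]]
query_ids = ["a", "b", "b"]  # the third query's nearest neighbor belongs to "a"
print(precision_at_1(queries, query_ids, database, db_ids))
```

Because only the top-ranked result counts, Precision@1 is demanding when each query has few relevant images (11 out of 7,900 in the setting above); a retrieval system can rank every relevant image second and still score zero.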