Bounding Singular Values of Convolution Layers
In deep neural networks, the spectral norm of the Jacobian of a layer bounds the factor by which the norm of a signal changes during forward or backward propagation. Spectral norm regularization has also been shown to improve the generalization and robustness of deep networks. However, existing methods to compute the spectral norm of the Jacobian of convolution layers either rely on heuristics (but are efficient to compute) or are exact (but too computationally expensive to be used during training). In this work, we resolve these issues by deriving an upper bound on the spectral norm of a standard 2D multi-channel convolution layer. Our method provides a provable bound that is differentiable and can be computed efficiently during training with negligible overhead. We show that our spectral bound is an effective regularizer and can be used to bound the Lipschitz constant and the curvature (eigenvalues of the Hessian) of a neural network. Through experiments on MNIST and CIFAR-10, we demonstrate the effectiveness of our spectral bound in improving the generalization and provable robustness of deep networks against adversarial examples. Our code is available at https://github.com/singlasahil14/CONV-SV.
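To make the opening claim concrete, here is a minimal sketch that estimates the spectral norm of a small dense matrix by power iteration and checks that it bounds how much the matrix can stretch a vector. It is not the bound derived in this work. The helper names (`matvec`, `spectral_norm`) and the 2x2 example Jacobian are assumptions chosen for illustration.

```python
import math

def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def spectral_norm(a, iters=200):
    """Estimate the largest singular value of a by power iteration on a^T a.

    Assumption: the all-ones start vector is not orthogonal to the top
    right-singular vector; a random start would be the safer choice.
    """
    at = transpose(a)
    v = [1.0] * len(a[0])
    for _ in range(iters):
        w = matvec(at, matvec(a, v))
        n = norm(w)
        if n == 0.0:
            return 0.0
        v = [x / n for x in w]
    return norm(matvec(a, v))

# Hypothetical Jacobian of a tiny linear layer.
jacobian = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(jacobian)
signal = [1.0, 2.0]
# The output norm never exceeds sigma times the input norm.
print(norm(matvec(jacobian, signal)) <= sigma * norm(signal))
```

For a real convolution layer the Jacobian is far too large to materialize this way, which is the cost the abstract refers to when it calls exact methods expensive.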
Approaching Small Molecule Prioritization as a Cross-Modal Information Retrieval Task through Coordinated Representation Learning
Finlayson, Samuel G., McDermott, Matthew B. A., Pickering, Alex V., Lipnick, Scott L., Yuan, William, Kohane, Isaac S.
Modeling the relationship between chemical structure and molecular activity is a key task in drug development and precision medicine. In this paper, we utilize a novel deep learning architecture to jointly train coordinated embeddings of chemical structures and transcriptional signatures. We do so by training neural networks in a coordinated manner such that learned chemical representations correlate most highly with the encodings of the transcriptional patterns they induce. We then test this approach by using held-out gene expression signatures as queries into embedding space to recover their corresponding compounds. We evaluate these embeddings' utility for small molecule prioritization on this new benchmark task. Our method outperforms a series of baselines, successfully generalizing to unseen transcriptional experiments, but still struggles to generalize to entirely unseen chemical structures.
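The retrieval step can be sketched as a nearest-neighbor ranking in the shared embedding space. The snippet below is only an illustration: it assumes cosine similarity as the ranking score, and the compound names and two-dimensional embeddings are hypothetical placeholders for the outputs of the trained encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_compounds(query, compound_embeddings):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query, emb))
              for name, emb in compound_embeddings.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical chemical-structure embeddings.
compounds = {
    "compound_a": [1.0, 0.0],
    "compound_b": [0.0, 1.0],
    "compound_c": [0.7, 0.7],
}
# Hypothetical embedding of a held-out gene expression signature.
signature = [0.9, 0.1]

for name, score in rank_compounds(signature, compounds):
    print(name, round(score, 3))
```

A benchmark of this kind would then check how high the compound that actually induced the signature appears in the ranking.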
SWAG: Item Recommendations using Convolutions on Weighted Graphs
Pande, Amit, Ni, Kai, Kini, Venkataramani
Data Sciences, Target Corporation
Abstract: Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. In this work, we present a Graph Convolutional Network (GCN) algorithm, SWAG (Sample Weight and AGgregate), which combines efficient random walks and graph convolutions on weighted graphs to generate embeddings for nodes (items) that incorporate both graph structure and node feature information such as item descriptions and item images. The three important SWAG operations that enable us to efficiently generate node embeddings based on graph structures are (a) Sampling of the graph to a homogeneous structure, (b) Weighting the sampling, walks and convolution operations, and (c) using AGgregation functions for generating convolutions. The work is an adaptation of graphSAGE to weighted graphs. We deploy SWAG at Target and train it on a graph of more than 500K products sold online with over 50M edges. Offline and online evaluations reveal the benefit of using a graph-based approach and the benefits of weighting to produce high-quality embeddings and product recommendations.
Introduction: Convolutional Neural Networks (CNNs) are used to establish state-of-the-art performance on many Computer Vision applications [2]. CNNs consist of a series of parameterized convolutional layers operating locally (around neighboring pixels of an image) to obtain a hierarchy of features about an image. The first layer learns simple edge-oriented detectors. Higher layers build on the learning of lower layers to learn more complex features and objects. The success of CNNs in Computer Vision has inspired efforts to extend the convolutional operation from regular grids (2D images) to graph-structured data [9].
Graphs, such as social networks, word co-occurrence networks, guest purchasing behavior, protein-protein interactions and communication networks, occur naturally in various real-world applications. Analyzing them yields insights into the structure of society, language, and different patterns of communication.
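As a rough illustration of aggregation on a weighted graph, the sketch below computes a weighted mean of neighbor feature vectors, with edge weights acting as the averaging weights. This is an assumption made for illustration, not the aggregator used in SWAG. The function name, the item identifiers and the edge weights are all hypothetical.

```python
def weighted_mean_aggregate(node, features, edges):
    """Average the neighbors' feature vectors, weighted by edge weight.

    features: dict mapping node -> feature vector (list of floats)
    edges:    dict mapping node -> {neighbor: positive edge weight}
    """
    neighbors = edges.get(node, {})
    total = sum(neighbors.values())
    dim = len(features[node])
    if total == 0:
        # Isolated node: nothing to aggregate.
        return [0.0] * dim
    agg = [0.0] * dim
    for nbr, weight in neighbors.items():
        for i, value in enumerate(features[nbr]):
            agg[i] += weight * value / total
    return agg

# Hypothetical items with 2-d features; weights could be co-purchase counts.
features = {"item_a": [1.0, 0.0], "item_b": [0.0, 1.0], "item_c": [1.0, 1.0]}
edges = {"item_a": {"item_b": 3.0, "item_c": 1.0}}

print(weighted_mean_aggregate("item_a", features, edges))
```

In an unweighted setting every neighbor would contribute equally; here the neighbor connected by the heavier edge dominates the aggregated vector.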
SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors
Gong, Zhangxiaowen, Ji, Houxiang, Fletcher, Christopher, Hughes, Christopher, Torrellas, Josep
Our community has greatly improved the efficiency of deep learning applications, including by exploiting sparsity in inputs. Most of that work, though, is for inference, where weight sparsity is known statically, and/or for specialized hardware. We propose a scheme to leverage dynamic sparsity during training. In particular, we exploit zeros introduced by the ReLU activation function to both feature maps and their gradients. This is challenging because the sparsity degree is moderate and the locations of zeros change over time. We also rely purely on software. We identify zeros in a dense data representation without transforming the data and perform conventional vectorized computation. Variations of the scheme are applicable to all major components of training: forward propagation, backward propagation by inputs, and backward propagation by weights. Our method significantly outperforms a highly optimized dense direct convolution on several popular deep neural networks. At realistic sparsity, we speed up the training of the non-initial convolutional layers in VGG16, ResNet-34, ResNet-50, and Fixup ResNet-50 by 2.19x, 1.37x, 1.31x, and 1.51x respectively on an Intel Skylake-X CPU.
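The core idea of skipping work for zero activations while keeping the data dense can be sketched in scalar Python. This is only a toy model under stated assumptions: a matrix-vector product stands in for the convolution, the inner loop stands in for the vectorized multiply-accumulate, and the function names are hypothetical.

```python
def relu(xs):
    return [x if x > 0.0 else 0.0 for x in xs]

def sparse_matvec(x, w):
    """Compute y[k] = sum_i x[i] * w[i][k], skipping rows where x[i] == 0.

    x stays in its dense layout; zeros are detected on the fly, and the
    inner loop (the part a SIMD unit would execute) is skipped for them.
    Returns the output vector and the number of skipped rows.
    """
    out = [0.0] * len(w[0])
    skipped = 0
    for xi, row in zip(x, w):
        if xi == 0.0:
            skipped += 1
            continue
        for k, wk in enumerate(row):
            out[k] += xi * wk
    return out, skipped

# Hypothetical pre-activations and weights.
activations = relu([-1.0, 2.0, -3.0, 0.5])
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

result, skipped = sparse_matvec(activations, weights)
print(result, skipped)
```

The result matches the dense computation exactly, since every skipped term would have contributed zero.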
Privacy-preserving parametric inference: a case for robust statistics
Differential privacy is a cryptographically-motivated approach to privacy that has become a very active field of research over the last decade in theoretical computer science and machine learning. In this paradigm one assumes there is a trusted curator who holds the data of individuals in a database and the goal of privacy is to simultaneously protect individual data while allowing the release of global characteristics of the database. In this setting we introduce a general framework for parametric inference with differential privacy guarantees. We first obtain differentially private estimators based on bounded influence M-estimators by leveraging their gross-error sensitivity in the calibration of a noise term added to them in order to ensure privacy. We then show how a similar construction can also be applied to construct differentially private test statistics analogous to the Wald, score and likelihood ratio tests. We provide statistical guarantees for all our proposals via an asymptotic analysis. An interesting consequence of our results is to further clarify the connection between differential privacy and robust statistics. In particular, we demonstrate that differential privacy is a weaker stability requirement than infinitesimal robustness, and show that robust M-estimators can be easily randomized in order to guarantee both differential privacy and robustness towards the presence of contaminated data. We illustrate our results both on simulated and real data.
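The general pattern of calibrating noise to a bounded-influence statistic can be illustrated with a much simpler construction than the one in the paper. The sketch below swaps in a clipped mean released through the standard Laplace mechanism; it does not use M-estimators or gross-error sensitivity. The function names, the clipping bound and the data are assumptions, and neighboring datasets are assumed to differ by replacing one record.

```python
import random

def clipped_mean(data, bound):
    """Mean after clipping each value to [-bound, bound] (bounded influence)."""
    clipped = [max(-bound, min(bound, x)) for x in data]
    return sum(clipped) / len(clipped)

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_clipped_mean(data, bound, epsilon, rng):
    """Release the clipped mean with epsilon-differential privacy.

    Replacing one record moves the clipped mean by at most 2 * bound / n,
    so Laplace noise with scale sensitivity / epsilon suffices.
    """
    sensitivity = 2.0 * bound / len(data)
    return clipped_mean(data, bound) + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data with one gross outlier.
data = [0.1, -0.3, 0.2, 50.0, 0.0]
rng = random.Random(0)
print(private_clipped_mean(data, bound=1.0, epsilon=1.0, rng=rng))
```

Clipping limits how far a single contaminated record can move the estimate, and the same bound determines how much noise is needed for privacy.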
On an Optimal Solution to the Film Scheduling and Showtime Staggering Problem
Kohli, Ikjyot Singh, Inglis, Katherine Goff
In an era of data-driven digital transformation, a customer-driven business strategy is essential for success. In the motion picture industry, movie exhibitors must compete to win a share of consumers' entertainment time (and wallet) against digital entertainment alternatives offered by mammoth-sized, digitally focused competitors like Netflix, Amazon and Disney [1]. Customer loyalty, point-of-sale and digital payment platforms produce rich insights that can be leveraged to inform business operations and automate the decision-making process, effectively enabling movie exhibitors to compete using analytics and artificial intelligence. This study presents a new, customer-driven, quantitative approach to movie scheduling that can be utilized by movie exhibitors to increase attendance and market share. The role of the exhibitor is to show films that are produced by movie studios (see [2] for more details on the roles of the stakeholders in the movie industry). Exhibitors do not have decision-making authority over the movies that are produced by the studios.
PointPainting: Sequential Fusion for 3D Object Detection
Vora, Sourabh, Lang, Alex H., Helou, Bassam, Beijbom, Oscar
Camera and lidar are important sensor modalities for robotics in general and self-driving cars in particular. The sensors provide complementary information, offering an opportunity for tight sensor fusion. Surprisingly, lidar-only methods outperform fusion methods on the main benchmark datasets, suggesting a gap in the literature. In this work, we propose PointPainting: a sequential fusion method to fill this gap. PointPainting works by projecting lidar points into the output of an image-only semantic segmentation network and appending the class scores to each point. The appended (painted) point cloud can then be fed to any lidar-only method. Experiments show large improvements on three different state-of-the-art methods, PointRCNN, VoxelNet and PointPillars, on the KITTI and nuScenes datasets. The painted version of PointRCNN represents a new state of the art on the KITTI leaderboard for the bird's-eye view detection task. In ablation, we study how the effects of Painting depend on the quality and format of the semantic segmentation output, and demonstrate how latency can be minimized through pipelining.
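The painting step described above can be sketched with a pinhole camera model. The snippet is a simplified illustration under several assumptions: points are already expressed in the camera frame, the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and points that fall behind the camera or outside the image receive all-zero scores (a choice made here, not taken from the paper).

```python
import math

def paint_points(points, seg_scores, fx, fy, cx, cy):
    """Append per-pixel class scores to each 3D point.

    points:     list of (x, y, z) in the camera frame, z pointing forward
    seg_scores: seg_scores[row][col] is a list of class scores for a pixel
    """
    height = len(seg_scores)
    width = len(seg_scores[0])
    num_classes = len(seg_scores[0][0])
    painted = []
    for x, y, z in points:
        scores = [0.0] * num_classes
        if z > 0.0:
            # Pinhole projection to integer pixel coordinates.
            u = math.floor(fx * x / z + cx)
            v = math.floor(fy * y / z + cy)
            if 0 <= u < width and 0 <= v < height:
                scores = list(seg_scores[v][u])
        painted.append([x, y, z] + scores)
    return painted

# Hypothetical 2x2 segmentation output with two classes per pixel.
seg = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.2, 0.8]],
]
cloud = [(0.0, 0.0, 1.0), (-1.0, -1.0, 1.0), (0.0, 0.0, -1.0)]

for point in paint_points(cloud, seg, fx=1.0, fy=1.0, cx=1.0, cy=1.0):
    print(point)
```

Because the output is still a list of points, only with extra channels, it can be passed to a lidar-only detector whose input layer accepts the wider feature vector.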
Technical report: supervised training of convolutional spiking neural networks with PyTorch
Zimmer, Romain, Pellegrini, Thomas, Singh, Srisht Fateh, Masquelier, Timothée
Recently, it has been shown that spiking neural networks (SNNs) can be trained efficiently, in a supervised manner, using backpropagation through time. Indeed, the most commonly used spiking neuron model, the leaky integrate-and-fire neuron, obeys a differential equation which can be approximated using discrete time steps, leading to a recurrence relation for the potential. The firing threshold causes optimization issues, but they can be overcome using a surrogate gradient. Here, we extend previous approaches in two ways. Firstly, we show that the approach can be used to train convolutional layers. Convolutions can be done in space, time (which simulates conduction delays), or both. Secondly, we include fast horizontal connections à la Denève: when a neuron N fires, we subtract from the potentials of all the neurons with the same receptive field the dot product between their weight vectors and that of neuron N. As Denève et al. showed, this is useful to represent a dynamic multidimensional analog signal in a population of spiking neurons. Here we demonstrate that, in addition, such connections also allow implementing a multidimensional send-on-delta coding scheme. We validate our approach on one speech classification benchmark: the Google speech command dataset. We reach nearly state-of-the-art accuracy (94%) while maintaining low firing rates (about 5 Hz). Our code is based on PyTorch and is available in open source at http://github.com/romainzimmer/s2net
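The discrete-time recurrence and the surrogate gradient mentioned above can be sketched in plain Python. This is not the code from the linked repository. The leak factor `beta`, the reset-by-subtraction rule and the sigmoid-shaped surrogate are assumptions chosen for illustration.

```python
import math

def simulate_lif(inputs, beta=0.5, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete steps.

    Recurrence: v[t] = beta * v[t-1] + input[t]. When v exceeds the
    threshold the neuron spikes and the threshold is subtracted
    (assumed reset rule).
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = beta * v + current
        if v > threshold:
            spikes.append(1)
            v -= threshold
        else:
            spikes.append(0)
    return spikes

def surrogate_grad(v, threshold=1.0, scale=4.0):
    """Derivative of a sigmoid centered at the threshold.

    The true spike function is a step whose derivative is zero almost
    everywhere; this smooth stand-in is used in the backward pass only.
    """
    s = 1.0 / (1.0 + math.exp(-scale * (v - threshold)))
    return scale * s * (1.0 - s)

print(simulate_lif([0.6] * 6))
print(surrogate_grad(1.0))
```

In a PyTorch implementation, the same pairing would typically live in a custom autograd function whose forward pass applies the step and whose backward pass applies the surrogate.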
DL-Droid: Deep learning based android malware detection using real devices
Alzaylaee, Mohammed K., Yerima, Suleiman Y., Sezer, Sakir
The Android operating system has been the most popular for smartphones and tablets since 2012. This popularity has led to a rapid rise in Android malware in recent years. The sophistication of Android malware obfuscation and detection avoidance methods has significantly improved, making many traditional malware detection methods obsolete. In this paper, we propose DL-Droid, a deep learning system to detect malicious Android applications through dynamic analysis using stateful input generation. Experiments performed with over 30,000 applications (benign and malware) on real devices are presented. Furthermore, experiments were also conducted to compare the detection performance and code coverage of the stateful input generation method with the commonly used stateless approach using the deep learning system. Our study reveals that DL-Droid can achieve up to a 97.8% detection rate (with dynamic features only) and a 99.6% detection rate (with dynamic + static features), which outperforms traditional machine learning techniques. Furthermore, the results highlight the significance of enhanced input generation for dynamic analysis, as DL-Droid with state-based input generation is shown to outperform the existing state-of-the-art approaches.
Optimizing Data Usage via Differentiable Rewards
Wang, Xinyi, Pham, Hieu, Michel, Paul, Anastasopoulos, Antonios, Neubig, Graham, Carbonell, Jaime
To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems. Similarly, a machine learning model could potentially be trained better with a scorer that "adapts" to its current learning state and estimates the importance of each training data instance. Training such an adaptive scorer efficiently is a challenging problem; in order to precisely quantify the effect of a data instance at a given time during the training, it is typically necessary to first complete the entire training process. To efficiently optimize data usage, we propose a reinforcement learning approach called Differentiable Data Selection (DDS). In DDS, we formulate a scorer network as a learnable function of the training data, which can be efficiently updated along with the main model being trained. Specifically, DDS updates the scorer with an intuitive reward signal: it should up-weight the data whose gradient is similar to that of a dev set on which we would finally like to perform well. Without significant computing overhead, DDS delivers strong and consistent improvements over several strong baselines on two very different tasks: machine translation and image classification.
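The reward signal can be illustrated as a gradient-alignment score. The sketch below assumes a linear model with squared-error loss and uses the dot product between each training example's gradient and the dev gradient, both taken at the same weights, as the reward. The function names and the toy data are hypothetical, and the scorer update itself is omitted.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def example_gradient(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w."""
    err = dot(w, x) - y
    return [err * xi for xi in x]

def alignment_rewards(w, train_batch, dev_example):
    """Dot product of each training gradient with the dev gradient.

    A positive value means a step on that training example also moves
    the weights in a direction that lowers the dev loss (to first order).
    """
    g_dev = example_gradient(w, *dev_example)
    return [dot(example_gradient(w, x, y), g_dev) for x, y in train_batch]

# Hypothetical model weights and data.
weights = [0.0, 0.0]
dev = ([1.0, 0.0], 1.0)
train = [
    ([2.0, 0.0], 2.0),   # consistent with the dev example
    ([1.0, 0.0], -1.0),  # conflicts with the dev example
]

print(alignment_rewards(weights, train, dev))
```

A scorer trained on such rewards would learn to assign higher sampling probability to the first kind of example and lower probability to the second.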