Country
PoET-BiN: Power Efficient Tiny Binary Neurons
Chidambaram, Sivakumar, Langlois, J. M. Pierre, David, Jean Pierre
The success of neural networks in image classification has inspired various hardware implementations on embedded platforms such as Field Programmable Gate Arrays, embedded processors and Graphical Processing Units. These embedded platforms are constrained in terms of power, which is mainly consumed by the Multiply Accumulate operations and the memory accesses for weight fetching. Quantization and pruning have been proposed to address this issue. Though effective, these techniques do not take into account the underlying architecture of the embedded hardware. In this work, we propose PoET-BiN, a Look-Up Table based power efficient implementation on resource constrained embedded devices. A modified Decision Tree approach forms the backbone of the proposed implementation in the binary domain. A LUT access consumes far less power than the equivalent Multiply Accumulate operation it replaces, and the modified Decision Tree algorithm eliminates the need for memory accesses. We applied the PoET-BiN architecture to implement the classification layers of networks trained on MNIST, SVHN and CIFAR-10 datasets, with near state-of-the art results. The energy reduction for the classifier portion reaches up to six orders of magnitude compared to a floating point implementations and up to three orders of magnitude when compared to recent binary quantized neural networks.
Stochasticity in Neural ODEs: An Empirical Study
Oganesyan, Viktor, Volokhova, Alexandra, Vetrov, Dmitry
Despite its success, continuous-time models, such as neural ordinary differential equation (ODE), usually rely on a completely deterministic feed-forward operation. This work provides an empirical study of stochastically regularized neural ODE on several image-classification tasks (CIFAR-10, CIFAR-100, TinyImageNet). Building upon the formalism of stochastic differential equations (SDEs), we demonstrate that neural SDE is able to outperform its deterministic counterpart. Further, we show that data augmentation during the training improves the performance of both deterministic and stochastic versions of the same model. However, the improvements obtained by the data augmentation completely eliminate the empirical gains of the stochastic regularization, making the difference in the performance of neural ODE and neural SDE negligible.
Convex Duality of Deep Neural Networks
We study regularized deep neural networks and introduce an analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weight matrices for a norm regularized deep neural network training problem can be explicitly found as the extreme points of a convex set. For two-layer linear networks, we first formulate a convex dual program and prove that strong duality holds. We then extend our derivations to prove that strong duality also holds for certain deep networks. In particular, for linear deep networks, we show that each optimal layer weight matrix is rank-one and aligns with the previous layers when the network output is scalar. We also extend our analysis to the vector outputs and other convex loss functions. More importantly, we show that the same characterization can also be applied to deep ReLU networks with rank-one inputs, where we prove that strong duality still holds and optimal layer weight matrices are rank-one for scalar output networks. As a corollary, we prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets which was previously known only for two-layer networks. We then verify our theoretical results via several numerical experiments.
Non-Intrusive Detection of Adversarial Deep Learning Attacks via Observer Networks
Sivamani, Kirthi Shankar, Sahay, Rajeev, Gamal, Aly El
Recent studies have shown that deep learning models are vulnerable to specifically crafted adversarial inputs that are quasi-imperceptible to humans. In this letter, we propose a novel method to detect adversarial inputs, by augmenting the main classification network with multiple binary detectors (observer networks) which take inputs from the hidden layers of the original network (convolutional kernel outputs) and classify the input as clean or adversarial. During inference, the detectors are treated as a part of an ensemble network and the input is deemed adversarial if at least half of the detectors classify it as so. The proposed method addresses the trade-off between accuracy of classification on clean and adversarial samples, as the original classification network is not modified during the detection process. The use of multiple observer networks makes attacking the detection mechanism non-trivial even when the attacker is aware of the victim classifier. We achieve a 99.5% detection accuracy on the MNIST dataset and 97.5% on the CIFAR-10 dataset using the Fast Gradient Sign Attack in a semi-white box setup. The number of false positive detections is a mere 0.12% in the worst case scenario.
Improving the Tightness of Convex Relaxation Bounds for Training Certifiably Robust Classifiers
Zhu, Chen, Ni, Renkun, Chiang, Ping-yeh, Li, Hengduo, Huang, Furong, Goldstein, Tom
Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical robustness. In principle, convex relaxation can provide tight bounds if the solution to the relaxed problem is feasible for the original non-convex problem. We propose two regularizers that can be used to train neural networks that yield tighter convex relaxation bounds for robustness. In all of our experiments, the proposed regularizers result in higher certified accuracy than non-regularized baselines.
Longitudinal Support Vector Machines for High Dimensional Time Series
Pelckmans, Kristiaan, Zeng, Hong-Li
We consider the problem of learning a classifier from observed functional data. Here, each data-point takes the form of a single time-series and contains numerous features. Assuming that each such series comes with a binary label, the problem of learning to predict the label of a new coming time-series is considered. Hereto, the notion of {\em margin} underlying the classical support vector machine is extended to the continuous version for such data. The longitudinal support vector machine is also a convex optimization problem and its dual form is derived as well. Empirical results for specified cases with significance tests indicate the efficacy of this innovative algorithm for analyzing such long-term multivariate data.
Differentially Private Set Union
Gopi, Sivakanth, Gulhane, Pankaj, Kulkarni, Janardhan, Shen, Judy Hanwen, Shokouhi, Milad, Yekhanin, Sergey
We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting ensuring privacy is significantly delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
VFlow: More Expressive Generative Flows with Variational Data Augmentation
Chen, Jianfei, Lu, Cheng, Chenli, Biqi, Zhu, Jun, Tian, Tian
Generative flows are promising tractable models for density modeling that define probabilistic distributions with invertible transformations. However, tractability imposes architectural constraints on generative flows, making them less expressive than other types of generative models. In this work, we study a previously overlooked constraint that all the intermediate representations must have the same dimensionality with the original data due to invertibility, limiting the width of the network. We tackle this constraint by augmenting the data with some extra dimensions and jointly learning a generative flow for augmented data as well as the distribution of augmented dimensions under a variational inference framework. Our approach, VFlow, is a generalization of generative flows and therefore always performs better. Combining with existing generative flows, VFlow achieves a new state-of-the-art 2.98 bits per dimension on the CIFAR-10 dataset and is more compact than previous models to reach similar modeling quality.
Amortised Learning by Wake-Sleep
Wenliang, Li K., Moskovitz, Theodore, Kanagawa, Heishiro, Sahani, Maneesh
Models that employ latent variables to capture structure in observed data lie at the heart of many current unsupervised learning algorithms, but exact maximum-likelihood learning for powerful and flexible latent-variable models is almost always intractable. Thus, state-of-the-art approaches either abandon the maximum-likelihood framework entirely, or else rely on a variety of variational approximations to the posterior distribution over the latents. Here, we propose an alternative approach that we call amortised learning. Rather than computing an approximation to the posterior over latents, we use a wake-sleep Monte-Carlo strategy to learn a function that directly estimates the maximum-likelihood parameter updates. Amortised learning is possible whenever samples of latents and observations can be simulated from the generative model, treating the model as a "black box". We demonstrate its effectiveness on a wide range of complex models, including those with latents that are discrete or supported on non-Euclidean spaces.
Partially Observed Dynamic Tensor Response Regression
Zhou, Jie, Sun, Will Wei, Zhang, Jingfei, Li, Lexin
In modern data science, dynamic tensor data is prevailing in numerous applications. An important task is to characterize the relationship between such dynamic tensor and external covariates. However, the tensor data is often only partially observed, rendering many existing methods inapplicable. In this article, we develop a regression model with partially observed dynamic tensor as the response and external covariates as the predictor. We introduce the low-rank, sparsity and fusion structures on the regression coefficient tensor, and consider a loss function projected over the observed entries. We develop an efficient non-convex alternating updating algorithm, and derive the finite-sample error bound of the actual estimator from each step of our optimization algorithm. Unobserved entries in tensor response have imposed serious challenges. As a result, our proposal differs considerably in terms of estimation algorithm, regularity conditions, as well as theoretical properties, compared to the existing tensor completion or tensor response regression solutions. We illustrate the efficacy of our proposed method using simulations, and two real applications, a neuroimaging dementia study and a digital advertising study.