EvAn: Neuromorphic Event-based Anomaly Detection
Annamalai, Lakshmi, Chakraborty, Anirban, Thakur, Chetan Singh
Abstract--Event-based cameras are novel bio-inspired sensors that asynchronously record changes in illumination in the form of events, yielding significant advantages over conventional cameras: low power consumption, high dynamic range, and no motion blur. Moreover, such cameras, by design, encode only the relative motion between the scene and the sensor (and not the static background), yielding a very sparse data structure that can be used for various motion analytics tasks. We propose to model the motion dynamics in the event domain with a dual-discriminator conditional Generative Adversarial Network (cGAN) built on state-of-the-art architectures. To adapt event data for use as input to the cGAN, we also put forward a deep learning solution that learns a novel representation of event data, which retains the sparsity of the data and encodes the temporal information readily available from these sensors. Since there is no existing dataset for anomaly detection in the event domain, we also provide an anomaly detection event dataset with an exhaustive set of anomalies.
Index Terms--Neuromorphic Camera, Event Data, Anomaly Detection, Generative Adversarial Network.
1 INTRODUCTION
This paper focuses on anomaly detection using bio-inspired event-based cameras that efficiently and asynchronously register pixel-wise changes in brightness, which is radically different from how a conventional camera works. The asynchronous principle of operation enables event cameras [9] [10] [36] [41] to capture high-speed motion (with temporal resolution on the order of µs) with high dynamic range (140 dB) and sparse data. These low-latency sensors have paved the way for agile robotic applications [1] that were not feasible with conventional cameras.
Communication-Efficient and Byzantine-Robust Distributed Learning
Ghosh, Avishek, Maity, Raj Kumar, Kadhe, Swanand, Mazumdar, Arya, Ramchandran, Kannan
We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (such as coordinate-wise median or trimmed mean), and is thus optimal. Furthermore, for communication efficiency, we consider a generic class of δ-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients for aggregation and gradient norms for Byzantine removal. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss functions. We show that, in the regime where the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence we effectively get the compression for free. Moreover, we extend the compressed gradient descent algorithm with error feedback proposed in [KRSJ19] to the distributed setting. We validate our results experimentally and show good convergence on convex (least-squares regression) and non-convex (neural network training) problems.
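The aggregation step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the trimming fraction `beta`, and the choice of top-k as the compressor are all hypothetical choices made for the example. Each worker is assumed to send a compressed gradient together with the norm of its gradient; the server drops the workers with the largest norms and averages the compressed gradients of the rest.

```python
import math


def top_k(vec, k):
    """Top-k sparsification: keep the k largest-magnitude coordinates, zero the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def l2_norm(vec):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vec))


def robust_aggregate(gradients, beta, k):
    """Average compressed gradients after discarding the largest-norm workers.

    gradients: one gradient vector per worker (some may be Byzantine).
    beta:      assumed fraction of workers to discard (hypothetical parameter).
    k:         number of coordinates each worker keeps when compressing.
    """
    norms = [l2_norm(g) for g in gradients]          # used only for removal
    compressed = [top_k(g, k) for g in gradients]    # used only for aggregation
    n_keep = len(gradients) - math.ceil(beta * len(gradients))
    kept = sorted(range(len(gradients)), key=lambda i: norms[i])[:n_keep]
    dim = len(gradients[0])
    return [sum(compressed[i][j] for i in kept) / n_keep for j in range(dim)]


# Three honest workers and one Byzantine worker sending a huge gradient.
worker_gradients = [[1.0, 0.1], [0.9, -0.1], [1.1, 0.0], [100.0, -100.0]]
aggregate = robust_aggregate(worker_gradients, beta=0.25, k=1)
print(aggregate)
```

Norm-based removal only needs one scalar per worker, which is what keeps it cheaper than coordinate-wise median or trimmed-mean schemes.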
Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller
Sharma, Pratyusha, Pathak, Deepak, Gupta, Abhinav
We study a generalized setup for learning from demonstration to build an agent that can manipulate novel objects in unseen scenarios by looking at only a single video of human demonstration from a third-person perspective. To accomplish this goal, our agent should not only learn to understand the intent of the demonstrated third-person video in its context but also perform the intended task in its environment configuration. Our central insight is to enforce this structure explicitly during learning by decoupling what to achieve (intended task) from how to perform it (controller). We propose a hierarchical setup where a high-level module learns to generate a series of first-person sub-goals conditioned on the third-person video demonstration, and a low-level controller predicts the actions to achieve those sub-goals. Our agent acts from raw image observations without any access to the full state information. We show results on a real robotic platform using Baxter for the manipulation tasks of pouring and placing objects in a box. Project video and code are at https://pathak22.github.io/hierarchical-imitation/
Regularizing Neural Networks by Stochastically Training Layer Ensembles
Labach, Alex, Valaee, Shahrokh
Alex Labach and Shahrokh Valaee, University of Toronto, Department of Electrical and Computer Engineering, Toronto, Canada
ABSTRACT
Dropout and similar stochastic neural network regularization methods are often interpreted as implicitly averaging over a large ensemble of models. We propose STE (stochastically trained ensemble) layers, which enhance the averaging properties of such methods by training an ensemble of weight matrices with stochastic regularization while explicitly averaging outputs. This provides stronger regularization with no additional computational cost at test time. We show consistent improvement on various image classification tasks using standard network topologies.
Index Terms-- neural networks, regularization, dropout, model averaging, ensemble methods
1. INTRODUCTION
In order to generalize well to new inputs, modern deep neural networks require heavy regularization.
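One way to picture an STE layer is sketched below. It is an illustration only, under assumptions that the abstract does not spell out: the layer is taken to be linear, dropout is used as the stochastic regularizer, and the test-time layer is obtained by averaging the ensemble's weight matrices (which, for a linear layer, gives exactly the average of the members' outputs). All function names are hypothetical.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dropout(x, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale survivors."""
    return [0.0 if rng.random() < p else xi / (1.0 - p) for xi in x]


def ste_forward_train(weights, x, p, rng):
    """Training pass: each ensemble member sees its own dropout mask; outputs are averaged."""
    outputs = [matvec(W, dropout(x, p, rng)) for W in weights]
    return [sum(column) / len(outputs) for column in zip(*outputs)]


def ste_collapse(weights):
    """Test-time layer (assumed): average the ensemble into a single weight matrix."""
    k = len(weights)
    rows, cols = len(weights[0]), len(weights[0][0])
    return [[sum(W[i][j] for W in weights) / k for j in range(cols)] for i in range(rows)]


rng = random.Random(0)
ensemble = [[[1.0, 2.0], [0.0, 1.0]], [[3.0, 0.0], [1.0, 1.0]]]
x = [1.0, -2.0]
print(ste_forward_train(ensemble, x, p=0.5, rng=rng))  # noisy training output
print(matvec(ste_collapse(ensemble), x))               # single-matrix test output
```

Because the collapsed layer is a single matrix, inference costs the same as an ordinary layer of that size, which is consistent with the "no additional computational cost at test time" claim.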
Generalizing Information to the Evolution of Rational Belief
Duersch, Jed A., Catanach, Thomas A.
Information theory provides a mathematical foundation to measure uncertainty in belief. Belief is represented by a probability distribution that captures our understanding of an outcome's plausibility. Information measures based on Shannon's concept of entropy include realization information, Kullback-Leibler divergence, Lindley's information in experiment, cross entropy, and mutual information. We derive a general theory of information from first principles that accounts for evolving belief and recovers all of these measures. Rather than simply gauging uncertainty, information is understood in this theory to measure change in belief. We may then regard entropy as the information we expect to gain upon realization of a discrete latent random variable. This theory of information is compatible with the Bayesian paradigm in which rational belief is updated as evidence becomes available. Furthermore, this theory admits novel measures of information with well-defined properties, which we explore in both analysis and experiment. This view of information illuminates the study of machine learning by allowing us to quantify information captured by a predictive model and distinguish it from residual information contained in training data. We gain related insights regarding feature selection, anomaly detection, and novel Bayesian approaches.
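The statement that entropy is the information we expect to gain upon realization can be checked numerically. The sketch below is not taken from the paper; the function names are hypothetical, and it relies only on the standard identity that the Kullback-Leibler divergence from a belief p to a point mass on outcome x equals -log p(x). Averaging that quantity over p gives the Shannon entropy of p.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) in nats for discrete distributions given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def realization_information(p, x):
    """Information gained when belief p collapses onto outcome x: KL(point mass at x || p)."""
    point_mass = [1.0 if i == x else 0.0 for i in range(len(p))]
    return kl_divergence(point_mass, p)


def entropy(p):
    """Shannon entropy of p in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def expected_realization_information(p):
    """Average the realization information over outcomes drawn from p."""
    return sum(pi * realization_information(p, x) for x, pi in enumerate(p) if pi > 0)


belief = [0.5, 0.25, 0.25]
print(entropy(belief))
print(expected_realization_information(belief))
```

The two printed values agree up to floating-point error, which is the sense in which entropy measures an expected change in belief rather than a static amount of uncertainty.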
Continual Learning with Adaptive Weights (CLAW)
Adel, Tameem, Zhao, Han, Turner, Richard E.
Approaches to continual learning aim to successfully learn a set of related tasks that arrive in an online manner. Recently, several frameworks have been developed which enable deep learning to be deployed in this learning scenario. A key modelling decision is to what extent the architecture should be shared across tasks. On the one hand, separately modelling each task avoids catastrophic forgetting but it does not support transfer learning and leads to large models. On the other hand, rigidly specifying a shared component and a task-specific part enables task transfer and limits the model size, but it is vulnerable to catastrophic forgetting and restricts the form of task-transfer that can occur. Ideally, the network should adaptively identify which parts of the network to share in a data driven way. Here we introduce such an approach called Continual Learning with Adaptive Weights (CLAW), which is based on probabilistic modelling and variational inference. Experiments show that CLAW achieves state-of-the-art performance on six benchmarks in terms of overall continual learning performance, as measured by classification accuracy, and in terms of addressing catastrophic forgetting.
A Comparative Analysis of Forecasting Financial Time Series Using ARIMA, LSTM, and BiLSTM
Siami-Namini, Sima, Tavakoli, Neda, Namin, Akbar Siami
Machine learning and deep learning algorithms are emerging approaches to prediction problems in time series. These techniques have been shown to produce more accurate results than conventional regression-based modeling. It has been reported that Recurrent Neural Networks (RNNs) with memory, such as Long Short-Term Memory (LSTM), outperform Autoregressive Integrated Moving Average (ARIMA) by a large margin. LSTM-based models incorporate additional "gates" for the purpose of memorizing longer sequences of input data. The major question is whether the gates incorporated in the LSTM architecture already offer a good prediction, or whether additional training on the data is necessary to further improve the prediction. Bidirectional LSTMs (BiLSTMs) enable additional training by traversing the input data twice: (1) left-to-right and (2) right-to-left. The research question of interest is then whether BiLSTM, with its additional training capability, outperforms regular unidirectional LSTM. This paper reports a behavioral analysis and comparison of BiLSTM and LSTM models. The objective is to explore to what extent additional layers of training on the data would be beneficial for tuning the involved parameters. The results show that additional training on the data, and thus BiLSTM-based modeling, offers better predictions than regular LSTM-based models. More specifically, it was observed that BiLSTM models provide better predictions than ARIMA and LSTM models. It was also observed that BiLSTM models reach equilibrium much more slowly than LSTM-based models.
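The two traversals of a bidirectional model can be illustrated with a deliberately simplified sketch. It uses a scalar tanh recurrence in place of a real LSTM cell (an assumption made to keep the example short), and the function names and weights are hypothetical. The point is only the data flow: one pass reads the series left-to-right, a second pass reads it right-to-left, and the two hidden states are paired at every time step.

```python
import math


def run_rnn(xs, w_x, w_h):
    """Simple scalar recurrence (a stand-in for an LSTM cell): h_t = tanh(w_x*x_t + w_h*h_{t-1})."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def run_bidirectional(xs, forward_params, backward_params):
    """Pair a left-to-right pass with a right-to-left pass at every time step."""
    forward = run_rnn(xs, *forward_params)
    # Traverse the reversed series, then flip the states back into time order.
    backward = run_rnn(xs[::-1], *backward_params)[::-1]
    return list(zip(forward, backward))


series = [0.1, 0.2, 0.3]
for step, (fwd, bwd) in enumerate(run_bidirectional(series, (1.0, 0.5), (1.0, 0.5))):
    print(step, fwd, bwd)
```

At the first time step the forward state has seen only the first value, while the backward state already summarizes the whole series; this extra pass over the data is what the abstract refers to as additional training.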
Safe Linear Stochastic Bandits
We introduce the safe linear stochastic bandit framework (a generalization of linear stochastic bandits) where, in each stage, the learner is required to select an arm with an expected reward that is no less than a predetermined (safe) threshold with high probability. We assume that the learner initially has knowledge of an arm that is known to be safe, but not necessarily optimal. Leveraging this assumption, we introduce a learning algorithm that systematically combines known safe arms with exploratory arms to safely expand the set of safe arms over time, while facilitating safe greedy exploitation in subsequent stages. In addition to ensuring the satisfaction of the safety constraint at every stage of play, the proposed algorithm is shown to exhibit an expected regret that is no more than O(√T log(T)) after T stages of play.
1 Introduction
We investigate the role of safety in constraining the design of learning algorithms within the classical framework of linear stochastic bandits (Dani, Hayes, and Kakade 2008; Rusmevichientong and Tsitsiklis 2010; Abbasi-Yadkori, Pál, and Szepesvári 2011). Specifically, we introduce a family of safe linear stochastic bandit problems where, in addition to the typical goal of designing learning algorithms that minimize regret, we impose a constraint requiring that an algorithm's stagewise expected reward remains above a predetermined safety threshold with high probability at every stage of play. In the proposed framework, we assume that a "safe" baseline arm is initially known, and consider a class of safety thresholds that are defined as fixed cutbacks on the expected reward of the known baseline arm. Accordingly, an algorithm that is deemed to be safe cannot induce stagewise rewards that dip below the baseline reward by more than a fixed amount.
Critically, the assumption of a known baseline arm, together with the limited capacity for exploration implied by the class of safety thresholds considered, can be leveraged to initially guide the exploration of allowable arms by playing combinations of the baseline arm and exploratory arms in a manner that expands the set of safe arms over time, while simultaneously preserving safety at every stage of play. There are a variety of real-world applications that might benefit from the design of stagewise-safe online learning algorithms (Khezeli and Bitar 2017; Li et al. 2019; Sui et al. 2015). Most prominently, clinical trials have long been used as a motivating application for the multi-armed bandit (Berry and Pearson 1985) and linear bandit (Dani, Hayes, and Kakade 2008) frameworks.
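The idea of playing a combination of the baseline arm and an exploratory arm can be illustrated with a worst-case calculation. The sketch below is not the paper's algorithm. It assumes, for illustration only, that the baseline arm's expected reward is known, that rewards are linear in the arm, that the unknown parameter has Euclidean norm at most `theta_norm_bound`, and that exploratory arms have unit norm. Under those assumptions an exploratory arm earns at least `-theta_norm_bound`, so the mixture `(1 - rho) * baseline + rho * explore` stays above the threshold `baseline_reward - cutback` whenever `rho * (baseline_reward + theta_norm_bound) <= cutback`. All names are hypothetical.

```python
import math


def max_safe_mixing_weight(baseline_reward, cutback, theta_norm_bound):
    """Largest weight rho on a unit-norm exploratory arm that is safe in the worst case."""
    gap = baseline_reward + theta_norm_bound
    if gap <= 0:
        # The baseline is already as bad as any arm can be, so every mixture is safe.
        return 1.0
    return min(1.0, cutback / gap)


def mix_arms(baseline_arm, explore_arm, rho):
    """Convex combination of the baseline arm and an exploratory arm."""
    return [(1.0 - rho) * b + rho * e for b, e in zip(baseline_arm, explore_arm)]


def expected_reward(theta, arm):
    """Linear expected reward (theta is unknown to the learner; used here only to check safety)."""
    return sum(t * a for t, a in zip(theta, arm))


theta = [0.6, -0.8]                    # hidden parameter with norm 1
baseline_arm = [1.0, 0.0]
baseline_reward = expected_reward(theta, baseline_arm)
cutback = 0.2
rho = max_safe_mixing_weight(baseline_reward, cutback, theta_norm_bound=1.0)

for k in range(8):                     # exploratory directions around the unit circle
    angle = 2.0 * math.pi * k / 8
    arm = mix_arms(baseline_arm, [math.cos(angle), math.sin(angle)], rho)
    print(k, expected_reward(theta, arm) >= baseline_reward - cutback - 1e-12)
```

A conservative weight of this kind shrinks as the allowed cutback shrinks, which matches the limited capacity for exploration that the safety thresholds imply.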
Few Shot Network Compression via Cross Distillation
Bai, Haoli, Wu, Jiaxiang, King, Irwin, Lyu, Michael
Model compression has been widely adopted to obtain lightweight deep neural networks. Most prevalent methods, however, require fine-tuning with sufficient training data to ensure accuracy, which can be hindered by privacy and security issues. As a compromise between privacy and performance, in this paper we investigate few shot network compression: given only a few samples per class, how can we effectively compress the network with a negligible performance drop? The core challenge of few shot network compression lies in high estimation errors relative to the original network during inference, since the compressed network can easily overfit the few training instances. The estimation errors can propagate and accumulate layer by layer and finally deteriorate the network output. To address the problem, we propose cross distillation, a novel layer-wise knowledge distillation approach. By interweaving the hidden layers of the teacher and student networks, the estimation errors accumulated across layers can be effectively reduced. The proposed method offers a general framework compatible with prevalent network compression techniques such as pruning. Extensive experiments on benchmark datasets demonstrate that cross distillation can significantly improve the student network's accuracy when only a few training instances are available.
Approximated Orthonormal Normalisation in Training Neural Networks
Zhang, Guoqiang, Niwa, Kenta, Kleijn, W. B.
Generalisation of a deep neural network (DNN) is a major concern when employing the deep learning approach to solve practical problems. In this paper we propose a new technique, named approximated orthonormal normalisation (AON), to improve the generalisation capacity of a DNN model. Considering a weight matrix W from a particular neural layer in the model, our objective is to design a function h(W) such that its row vectors are approximately orthogonal to each other while allowing the DNN model to fit the training data sufficiently accurately. Doing so avoids co-adaptation among neurons of the same layer and thereby improves the network's generalisation capacity. Specifically, at each iteration, we first approximate (WW^T)^(-1/2) using its Taylor expansion and multiply it by the matrix W. The matrix product is then normalised by applying the spectral normalisation (SN) technique to obtain h(W). Conceptually speaking, AON is designed to turn orthonormal regularisation into orthonormal normalisation, which avoids manually balancing the original and penalty functions. Experimental results show that AON yields promising validation performance compared to orthonormal regularisation.
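The two steps of h(W) can be sketched as follows. This is an illustration under assumptions the abstract does not fix: the Taylor expansion is taken to first order around the identity, so that (WW^T)^(-1/2) is approximated by (3I - WW^T)/2 (reasonable only when WW^T is already close to I), and the spectral norm is estimated by power iteration. The function names are hypothetical, and W is assumed to be non-zero.

```python
import math


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def approx_orthonormalise(W):
    """First-order Taylor step (assumed order): (W W^T)^(-1/2) W ~= (1.5 I - 0.5 W W^T) W."""
    gram = matmul(W, transpose(W))
    m = len(W)
    A = [[(1.5 if i == j else 0.0) - 0.5 * gram[i][j] for j in range(m)] for i in range(m)]
    return matmul(A, W)


def spectral_norm(W, iters=50):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    Wt = transpose(W)
    v = [1.0 / (j + 1) for j in range(len(W[0]))]
    for _ in range(iters):
        u = [sum(w * x for w, x in zip(row, v)) for row in W]
        v = [sum(w * x for w, x in zip(row, u)) for row in Wt]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return math.sqrt(sum(sum(w * x for w, x in zip(row, v)) ** 2 for row in W))


def aon(W):
    """h(W): approximate orthonormalisation followed by spectral normalisation."""
    H = approx_orthonormalise(W)
    sigma = spectral_norm(H)
    return [[h / sigma for h in row] for row in H]


W = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.1]]
H = aon(W)
print(matmul(W, transpose(W)))  # Gram matrix of the original rows
print(matmul(H, transpose(H)))  # Gram matrix after AON, closer to the identity
```

Because the constraint is built into h(W) itself, there is no penalty weight to tune, which is the contrast with orthonormal regularisation drawn in the abstract.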