Machine Unlearning: Linear Filtration for Logit-based Classifiers
Baumhauer, Thomas, Schöttle, Pascal, Zeppelzauer, Matthias
Recently enacted legislation grants individuals certain rights to decide in what fashion their personal data may be used, and in particular a "right to be forgotten". This poses a challenge to machine learning: how to proceed when an individual retracts permission to use data which has been part of the training process of a model? From this question emerges the field of machine unlearning, which could be broadly described as the investigation of how to "delete training data from models". Our work complements this direction of research for the specific setting of class-wide deletion requests for classification models (e.g. deep neural networks). As a first step, we propose linear filtration as a computationally efficient sanitization method. Our experiments demonstrate benefits in an adversarial setting over naive deletion schemes.
Unsupervised non-parametric change point detection in quasi-periodic signals
Shvetsov, Nikolay, Buzun, Nazar, Dylov, Dmitry V.
We propose a new unsupervised and non-parametric method to detect change points in intricate quasi-periodic signals. The detection relies on optimal transport theory combined with topological analysis and the bootstrap procedure. The algorithm is designed to detect changes in virtually any harmonic or partially harmonic signal and is verified on three different sources of physiological data streams. We successfully find abnormal or irregular cardiac cycles in the waveforms for six of the most frequent types of clinical arrhythmias using a single algorithm. The validity and the efficiency of the method are shown on both synthetic and real time series. Our unsupervised approach reaches the level of performance of the supervised state-of-the-art techniques. We provide conceptual justification for the efficiency of the method and prove the convergence of the bootstrap procedure theoretically.
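As a rough illustration of the optimal transport ingredient only, the sketch below scores each time index by the 1-D Wasserstein-1 distance between the samples in the window before it and the window after it. The window comparison, the function names, and the toy signal are assumptions made for this example; the topological analysis and the bootstrap step described in the abstract are not reproduced here.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D samples.

    For equal-size empirical distributions on the line, the optimal
    coupling matches sorted values, so W1 is the mean absolute gap.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have equal length")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def change_scores(signal, window):
    """Score index t by the W1 distance between signal[t-window:t] and signal[t:t+window]."""
    scores = {}
    for t in range(window, len(signal) - window + 1):
        left = signal[t - window:t]
        right = signal[t:t + window]
        scores[t] = wasserstein_1d(left, right)
    return scores


# Toy signal (hypothetical): an oscillation whose level shifts at index 6.
signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0]
scores = change_scores(signal, window=4)
best = max(scores, key=scores.get)  # index with the largest distribution shift
```

On this toy input the score peaks at the index where the level shift occurs. A real detector would still need a threshold, which is where a resampling procedure such as the bootstrap could come in.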
DynamicPPL: Stan-like Speed for Dynamic Probabilistic Models
Tarek, Mohamed, Xu, Kai, Trapp, Martin, Ge, Hong, Ghahramani, Zoubin
We present the preliminary high-level design and features of DynamicPPL.jl, a modular library providing a lightning-fast infrastructure for probabilistic programming. Besides a computational performance that is often close to or better than Stan, DynamicPPL provides an intuitive DSL that allows the rapid development of complex dynamic probabilistic programs. Being entirely written in Julia, a high-level dynamic programming language for numerical computing, DynamicPPL inherits a rich set of features available through the Julia ecosystem. Since DynamicPPL is a modular, stand-alone library, any probabilistic programming system written in Julia, such as Turing.jl, can use DynamicPPL to specify models and trace their model parameters. The main features of DynamicPPL are: 1) a meta-programming based DSL for specifying dynamic models using an intuitive tilde-based notation; 2) a tracing data-structure for tracking RVs in dynamic probabilistic models; 3) a rich contextual dispatch system allowing tailored behaviour during model execution; and 4) a user-friendly syntax for probabilistic queries. Finally, we show in a variety of experiments that DynamicPPL, in combination with Turing.jl, achieves computational performance that is often close to or better than Stan.
A novel initialisation based on hospital-resident assignment for the k-modes algorithm
Wilde, Henry, Knight, Vincent, Gillard, Jonathan
This paper presents a new way of selecting an initial solution for the k-modes algorithm that allows for a notion of mathematical fairness and a leverage of the data that the common initialisations from literature do not. The method, which utilises the Hospital-Resident Assignment Problem to find the set of initial cluster centroids, is compared with the current initialisations on both benchmark datasets and a body of newly generated artificial datasets. Based on this analysis, the proposed method is shown to outperform the other initialisations in the majority of cases, especially when the number of clusters is optimised. In addition, we find that our method outperforms the leading established method specifically for low-density data.
Ready Policy One: World Building Through Active Learning
Ball, Philip, Parker-Holder, Jack, Pacchiano, Aldo, Choromanski, Krzysztof, Roberts, Stephen
Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample-efficient learning, often achieving state-of-the-art results for continuous control tasks. However, many existing MBRL methods rely on combining greedy policies with exploration heuristics, and even those which utilize principled exploration bonuses construct dual objectives in an ad hoc fashion. In this paper we introduce Ready Policy One (RP1), a framework that views MBRL as an active learning problem, where we aim to improve the world model in the fewest samples possible. RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization, allowing the algorithm to trade off reward vs. exploration at different stages of learning. In addition, we introduce a principled mechanism to terminate sample collection once we have a rich enough trajectory batch to improve the model. We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
Discretization and Machine Learning Approximation of BSDEs with a Constraint on the Gains-Process
Kharroubi, Idris, Lim, Thomas, Warin, Xavier
We study the approximation of backward stochastic differential equations (BSDEs for short) with a constraint on the gains process. We first discretize the constraint by applying a so-called facelift operator at the times of a grid. We show that this discretely constrained BSDE converges to the continuously constrained one as the mesh of the grid converges to zero. We then focus on the approximation of the discretely constrained BSDE, for which we adopt a machine learning approach. We show that the facelift can be approximated by an optimization problem over a class of neural networks under constraints on the neural network and its derivative. We then derive an algorithm converging to the discretely constrained BSDE as the number of neurons goes to infinity. We conclude with numerical experiments. Mathematics Subject Classification (2010): 65C30, 65M75, 60H35, 93E20, 49L25.
Memory Augmented Generative Adversarial Networks for Anomaly Detection
Yang, Ziyi, Zhang, Teng, Bozchalooi, Iman Soltani, Darve, Eric
In this paper, we present a memory-augmented algorithm for anomaly detection. Classical anomaly detection algorithms focus on learning to model and generate normal data, but typically guarantees for detecting anomalous data are weak. The proposed Memory Augmented Generative Adversarial Networks (MEMGAN) interacts with a memory module for both the encoding and generation processes. Our algorithm is such that most of the encoded normal data are inside the convex hull of the memory units, while the abnormal data are isolated outside. Such a remarkable property leads to good (resp. poor) reconstruction for normal (resp. abnormal) data and therefore provides a strong guarantee for anomaly detection. Decoded memory units in MEMGAN are more interpretable and disentangled than previous methods, which further demonstrates the effectiveness of the memory mechanism. Experimental results on twenty anomaly detection datasets of CIFAR-10 and MNIST show that MEMGAN demonstrates significant improvements over previous anomaly detection methods.
Short sighted deep learning
Koch, Ellen de Mello, Koch, Anita de Mello, Kastanos, Nicholas, Cheng, Ling
A theory explaining how deep learning works is yet to be developed. Previous work suggests that deep learning performs a coarse graining, similar in spirit to the renormalization group (RG). This idea has been explored in the setting of a local (nearest-neighbor interactions) Ising spin lattice. We extend the discussion to the setting of a long-range spin lattice. Markov Chain Monte Carlo (MCMC) simulations determine both the critical temperature and scaling dimensions of the system. The model is used to train both a single RBM (restricted Boltzmann machine) network and a stacked RBM network. Following earlier Ising model studies, the trained weights of a single-layer RBM network define a flow of lattice models. In contrast to results for nearest-neighbor Ising, the RBM flow for the long-range model does not converge to the correct values for the spin and energy scaling dimensions. Further, correlation functions between visible and hidden nodes exhibit key differences between the stacked RBM and RG flows. The stacked RBM flow appears to move towards low temperatures whereas the RG flow moves towards high temperatures. This again differs from results obtained for nearest-neighbor Ising.
The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks
Swiatkowski, Jakub, Roth, Kevin, Veeling, Bastiaan S., Tran, Linh, Dillon, Joshua V., Mandt, Stephan, Snoek, Jasper, Salimans, Tim, Jenatton, Rodolphe, Nowozin, Sebastian
Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work developing this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational distribution to a more compact parameterization. For a variety of deep Bayesian neural networks trained using Gaussian mean-field variational inference, we find that the posterior standard deviations consistently exhibit strong low-rank structure after convergence. This means that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance. Furthermore, we find that such factorized parameterizations improve the signal-to-noise ratio of stochastic gradient estimates of the variational lower bound, resulting in faster convergence.
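To make the compactness argument concrete, the sketch below assumes one reading of the low-rank factorization: the m x n matrix of posterior standard deviations of a layer is written as the product of an m x k factor and the transpose of an n x k factor. The function names, layer sizes, and factor values are hypothetical and chosen only for illustration.

```python
def k_tied_std(u, v):
    """Build an m x n matrix of standard deviations as u @ v^T (rank at most k).

    u is m x k and v is n x k, both given as lists of rows.
    """
    k = len(u[0])
    return [[sum(ui[r] * vj[r] for r in range(k)) for vj in v] for ui in u]


def param_counts(m, n, k):
    """Std-dev parameter counts: full mean-field (m*n) vs. factorized (k*(m+n))."""
    return m * n, k * (m + n)


# Hypothetical 3 x 2 layer with k = 1; positive factors keep every entry positive.
u = [[0.1], [0.2], [0.3]]  # m x k
v = [[1.0], [2.0]]         # n x k
sigma = k_tied_std(u, v)   # 3 x 2 matrix of standard deviations

# Hypothetical 512 x 256 layer with k = 2.
full, tied = param_counts(m=512, n=256, k=2)
```

For the hypothetical 512 x 256 layer, the factorized form stores 1,536 standard-deviation parameters instead of 131,072; the means are unaffected by this factorization.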
Accelerating Deep Learning Inference via Freezing
Kumar, Adarsh, Balasubramanian, Arjun, Venkataraman, Shivaram, Akella, Aditya
Over the last few years, Deep Neural Networks (DNNs) have become ubiquitous owing to their high accuracy on real-world tasks. However, this increase in accuracy comes at the cost of computationally expensive models, leading to higher prediction latencies. Prior efforts to reduce this latency, such as quantization, model distillation, and any-time prediction models, typically trade off accuracy for performance. In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests. We find that this can potentially reduce the number of effective layers by half for 91.58% of CIFAR-10 requests run on ResNet-18. We present Freeze Inference, a system that introduces approximate caching at each intermediate layer, and we discuss techniques to reduce the cache size and improve the cache hit rate. Finally, we discuss some of the open research challenges in realizing such a design.
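The idea of approximate caching at intermediate layers can be sketched roughly as follows. This is a toy illustration and not the Freeze Inference system: the class name, the rounding-based cache key, and the policy of storing the final label at every layer on a miss are all assumptions made for the example.

```python
def quantize(activation, step=0.5):
    """Map an activation vector to a coarse, hashable key (assumed scheme)."""
    return tuple(round(a / step) for a in activation)


class FreezingRunner:
    """Runs layers in order and stops early on an approximate cache hit."""

    def __init__(self, layers, classify):
        self.layers = layers                        # list of callables: vector -> vector
        self.classify = classify                    # callable: final vector -> label
        self.caches = [dict() for _ in layers]      # one cache per layer

    def predict(self, x):
        """Return (label, number_of_layers_executed)."""
        keys = []
        for i, layer in enumerate(self.layers):
            x = layer(x)
            key = quantize(x)
            if key in self.caches[i]:
                # A similar activation was seen before: reuse its label.
                return self.caches[i][key], i + 1
            keys.append(key)
        label = self.classify(x)
        # Full miss: remember the label under the key seen at every layer.
        for i, key in enumerate(keys):
            self.caches[i][key] = label
        return label, len(self.layers)


# Toy two-layer "network" (hypothetical).
layers = [lambda x: [2 * a for a in x], lambda x: [a + 1 for a in x]]
runner = FreezingRunner(layers, classify=lambda x: int(sum(x) > 0))
first = runner.predict([1.0, 2.0])   # cold caches: runs both layers
second = runner.predict([1.1, 2.1])  # near-duplicate input: hits after layer 1
```

A coarser `step` raises the hit rate at the risk of returning a label that the full network would not have produced, which is the accuracy versus latency trade-off such a cache has to manage.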