Federated Learning with Bayesian Differential Privacy
Triastcyn, Aleksei, Faltings, Boi
We consider the problem of reinforcing federated learning with formal privacy guarantees. We propose to employ Bayesian differential privacy, a relaxation of differential privacy for similarly distributed data, to provide sharper privacy loss bounds. We adapt the Bayesian privacy accounting method to the federated setting and suggest multiple improvements for more efficient privacy budgeting at different levels. Our experiments show a significant advantage over the state-of-the-art differential privacy bounds for federated learning on image classification tasks, including a medical application, bringing the privacy budget below ε = 1 at the client level and below ε = 0.1 at the instance level. Lower amounts of noise also benefit model accuracy and reduce the number of communication rounds. INTRODUCTION The rise of data analytics and machine learning (ML) presents countless opportunities for companies, governments and individuals to benefit from the accumulated data. At the same time, the ability of ML models to capture fine levels of detail potentially compromises the privacy of data providers. Recent research [1], [2] suggests that even in a black-box setting it is possible to infer the presence of individual records in the training set or to recover certain features of these records. To tackle this problem, a number of solutions have been proposed. They vary in how privacy is achieved and to what extent data is protected. One approach that places privacy at its core is federated learning (FL) [3]. In the FL setting, a central entity (server) trains a model on user data without actually copying the data from user devices. Instead, users (clients) update models locally, and the server aggregates these updates. In spite of all its advantages, federated learning does not provide theoretical privacy guarantees of the kind offered by differential privacy (DP) [4], which many researchers view as the privacy gold standard.
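The division of labour described above (clients update locally, the server only aggregates updates) can be sketched as one communication round. This is a minimal sketch, not the paper's method: it assumes FedAvg-style unweighted averaging of flat update vectors, and the optional per-client L2 clipping plus Gaussian noise follows the common Gaussian-mechanism recipe for differentially private FL. The Bayesian privacy accounting that turns a noise level into an ε is not shown, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a flat parameter vector."""
    return math.sqrt(sum(x * x for x in v))


def clip_update(update: Sequence[float], clip_norm: float) -> Vector:
    """Scale a client update down so that its L2 norm is at most clip_norm."""
    norm = l2_norm(update)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def aggregate(client_updates: Sequence[Sequence[float]],
              clip_norm: float,
              noise_multiplier: float,
              rng: random.Random) -> Vector:
    """Average clipped client updates after adding Gaussian noise to their sum.

    The noise standard deviation is noise_multiplier * clip_norm, i.e. it is
    calibrated to the largest change a single client can make to the sum.
    """
    n = len(client_updates)
    dim = len(client_updates[0])
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    total = [sum(u[i] for u in clipped) for i in range(dim)]
    noisy = [t + rng.gauss(0.0, noise_multiplier * clip_norm) for t in total]
    return [x / n for x in noisy]


def server_round(global_model: Sequence[float],
                 client_updates: Sequence[Sequence[float]],
                 clip_norm: float,
                 noise_multiplier: float,
                 rng: random.Random) -> Vector:
    """One round: the server sees only (noisy, clipped) updates, never raw data."""
    delta = aggregate(client_updates, clip_norm, noise_multiplier, rng)
    return [w + d for w, d in zip(global_model, delta)]


if __name__ == "__main__":
    rng = random.Random(0)
    model = [0.0, 0.0]
    updates = [[0.5, -0.2], [0.4, 0.1], [3.0, 4.0]]  # the last update gets clipped
    print(server_round(model, updates, clip_norm=1.0, noise_multiplier=0.5, rng=rng))
```

Setting `noise_multiplier` to 0 recovers plain averaging. A larger multiplier yields a smaller ε under any accountant but perturbs the averaged update more, which is the accuracy and communication-round trade-off the abstract refers to.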
Low-variance Black-box Gradient Estimates for the Plackett-Luce Distribution
Gadetsky, Artyom, Struminsky, Kirill, Robinson, Christopher, Quadrianto, Novi, Vetrov, Dmitry
Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradient estimates. Modern variance reduction techniques mostly consider categorical distributions and have limited applicability when the number of possible outcomes becomes large. In this work, we consider models with latent permutations and propose control variates for the Plackett-Luce distribution. In particular, the control variates allow us to optimize black-box functions over permutations using stochastic gradient descent. To illustrate the approach, we consider a variety of causal structure learning tasks for continuous and discrete data. We show that our method outperforms competitive relaxation-based optimization methods and is also applicable to non-differentiable score functions.
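The abstract does not spell out the proposed control variates, so the sketch below only sets up the pieces they act on: sampling from the Plackett-Luce distribution with the standard Gumbel perturbation trick, its log-probability and score function, and a plain score-function (REINFORCE) gradient with a leave-one-out baseline as the simplest control variate. It is an illustration under those assumptions, not the paper's estimator; all names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sample_pl(logits: Sequence[float], rng: random.Random) -> List[int]:
    """Sample a permutation from Plackett-Luce(logits) with the Gumbel trick:
    perturb every logit with Gumbel(0, 1) noise, then sort in decreasing order."""
    keys = []
    for theta in logits:
        u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep away from 0 and 1 for the logs
        keys.append(theta - math.log(-math.log(u)))
    return sorted(range(len(logits)), key=lambda i: keys[i], reverse=True)


def log_prob_pl(logits: Sequence[float], perm: Sequence[int]) -> float:
    """log p(perm) = sum_i [theta_{perm[i]} - logsumexp(theta over perm[i:])]."""
    return sum(logits[perm[i]] - logsumexp([logits[j] for j in perm[i:]])
               for i in range(len(perm)))


def grad_log_prob_pl(logits: Sequence[float], perm: Sequence[int]) -> List[float]:
    """Gradient of log_prob_pl with respect to the logits (the score function)."""
    grad = [0.0] * len(logits)
    for i in range(len(perm)):
        remaining = perm[i:]
        lse = logsumexp([logits[j] for j in remaining])
        grad[perm[i]] += 1.0
        for j in remaining:
            grad[j] -= math.exp(logits[j] - lse)  # softmax over items not yet placed
    return grad


def score_function_grad(f: Callable[[List[int]], float],
                        logits: Sequence[float],
                        num_samples: int,
                        rng: random.Random) -> List[float]:
    """Estimate d/dlogits E[f(perm)] by REINFORCE with a leave-one-out baseline.

    The baseline for sample s is the mean of f over the other samples, so it
    does not depend on sample s and the estimator stays unbiased. f is treated
    as a black box: it is only evaluated, never differentiated.
    """
    if num_samples < 2:
        raise ValueError("need at least two samples for a leave-one-out baseline")
    perms = [sample_pl(logits, rng) for _ in range(num_samples)]
    values = [f(p) for p in perms]
    total = sum(values)
    grad = [0.0] * len(logits)
    for perm, value in zip(perms, values):
        baseline = (total - value) / (num_samples - 1)
        score = grad_log_prob_pl(logits, perm)
        for k in range(len(logits)):
            grad[k] += (value - baseline) * score[k] / num_samples
    return grad
```

Because `f` is only evaluated, the same estimator applies to non-differentiable objectives such as discrete structure scores; stronger control variates aim to reduce the variance of exactly this kind of estimate.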
Direct Classification of Type 2 Diabetes From Retinal Fundus Images in a Population-based Sample From The Maastricht Study
Heslinga, Friso G., Pluim, Josien P. W., Houben, A. J. H. M., Schram, Miranda T., Henry, Ronald M. A., Stehouwer, Coen D. A., van Greevenbroek, Marleen J., Berendschot, Tos T. J. M., Veta, Mitko
Type 2 Diabetes (T2D) is a chronic metabolic disorder that can lead to blindness and cardiovascular disease. Information about early-stage T2D might be present in retinal fundus images, but to what extent these images can be used in a screening setting is still unknown. In this study, deep neural networks were employed to differentiate between fundus images from individuals with and without T2D. We investigated three methods to achieve high classification performance, measured by the area under the receiver operating characteristic curve (ROC-AUC). A multi-target learning approach that simultaneously outputs retinal biomarkers as well as T2D works best (AUC = 0.746 [±0.001]). Furthermore, the classification performance can be improved when images with high prediction uncertainty are referred to a specialist. We also show that combining the images of the left and right eye per individual with a simple averaging approach can further improve the classification performance (AUC = 0.758 [±0.003]). The results are promising, suggesting the feasibility of screening for T2D from retinal fundus images.
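The two evaluation steps above can be made concrete: ROC-AUC computed as the probability that a randomly chosen T2D case outranks a randomly chosen non-T2D case, and the left/right-eye combination as a plain mean of per-image probabilities per individual. This is a sketch that assumes "simple averaging" means averaging predicted probabilities; the record layout and function names are hypothetical, and the quadratic AUC loop favours clarity over speed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive scores higher than a
    random negative (Mann-Whitney statistic); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_per_individual(
    records: Iterable[Tuple[str, str, int, float]],
) -> Tuple[List[int], List[float]]:
    """Average per-image T2D probabilities over the images of each individual.

    records: (individual_id, eye, label, predicted_probability). Returns the
    labels and averaged scores, one entry per individual, sorted by id.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    labels: Dict[str, int] = {}
    for individual, _eye, label, prob in records:
        sums[individual] += prob
        counts[individual] += 1
        labels[individual] = label
    ids = sorted(sums)
    return [labels[i] for i in ids], [sums[i] / counts[i] for i in ids]
```

Averaging per individual changes the unit of evaluation from images to people, so the per-individual AUC is computed over fewer, but less noisy, scores.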
Maximum Entropy Models from Phase Harmonic Covariances
Zhang, Sixin, Mallat, Stéphane
Sixin Zhang (1, 4), Stéphane Mallat (1, 2, 3). (1) ENS, PSL University, Paris, France; (2) Collège de France, Paris, France; (3) Flatiron Institute, New York, USA; (4) Center for Data Science, Peking University, Beijing, China. November 25, 2019. Abstract We define maximum entropy models of non-Gaussian stationary random vectors from covariances of nonlinear representations. These representations are calculated by multiplying the phase of Fourier or wavelet coefficients by harmonic integers, which amounts to computing a windowed Fourier transform along their phase. Rectifiers in neural networks compute such phase windowing. The covariance of these harmonic coefficients captures dependencies of Fourier and wavelet coefficients across frequencies by canceling their random phase. We introduce maximum entropy models conditioned by such covariances over a graph of local interactions. These models are approximated by transporting an initial maximum ...
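The phase-cancellation argument can be checked numerically on one term of such a covariance. A circular translation by τ multiplies the Fourier coefficient X(ω) by exp(−2πiωτ/N); multiplying the phase of X(ω) by the harmonic k turns that factor into exp(−2πikωτ/N), which is exactly the phase picked up by X(kω), so the product [X(ω)]^k · conj(X(kω)) does not depend on τ. The sketch below (a naive DFT on a 1-D periodic signal, with hypothetical function names) illustrates only this Fourier case, not the wavelet representation or the maximum entropy model itself.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform X(w) = sum_t x[t] exp(-2 pi i w t / N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * w * t / n) for t in range(n))
            for w in range(n)]


def phase_harmonic(z: complex, k: int) -> complex:
    """[z]^k = |z| exp(i k arg z): keep the modulus, multiply the phase by k."""
    return abs(z) * cmath.exp(1j * k * cmath.phase(z))


def harmonic_product(x: Sequence[float], w: int, k: int) -> complex:
    """One term of a phase harmonic covariance: [X(w)]^k * conj(X(k w))."""
    X = dft(x)
    return phase_harmonic(X[w], k) * X[(k * w) % len(x)].conjugate()


def circular_shift(x: Sequence[float], tau: int) -> List[float]:
    """Translate the signal by tau samples with periodic boundary conditions."""
    n = len(x)
    return [x[(t - tau) % n] for t in range(n)]
```

Because k is an integer, the 2π ambiguity of `cmath.phase` disappears once the phase is multiplied by k, so the invariance holds exactly up to floating-point error.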
Attack Agnostic Statistical Method for Adversarial Detection
Saha, Sambuddha, Kumar, Aashish, Sahay, Pratyush, Jose, George, Kruthiventi, Srinivas, Muralidhara, Harikrishna
Deep-learning-based AI systems have shown great promise in various domains such as vision, audio, and autonomous systems (vehicles, drones). Recent research on neural networks has shown the susceptibility of deep networks to adversarial attacks, a technique of adding small perturbations to the inputs that can fool a deep network into misclassifying them. Developing defenses against such adversarial attacks is an active research area: some approaches propose robust models that are immune to such adversaries, while other techniques attempt to detect adversarial inputs. In this paper, we present a novel statistical approach for adversarial detection in image classification. Our approach is based on constructing a per-class feature distribution and detecting adversaries by comparing the features of a test image with the feature distribution of its class. For this purpose, we use statistical distances such as Energy Distance (ED) and Maximum Mean Discrepancy (MMD) for adversarial detection, and analyze the performance of each metric. We experimentally show that our approach achieves good adversarial detection performance on the MNIST and CIFAR-10 datasets irrespective of the attack method, sample size and degree of adversarial perturbation.
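The per-class comparison can be sketched with the two named distances, computed between a set of test features and the stored features of the class in question. This is a minimal sketch under stated assumptions: features are plain vectors, energy distance uses the squared V-statistic form 2E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖, MMD uses a Gaussian kernel with a hand-picked bandwidth, and the detection threshold is a free parameter that would in practice be calibrated on clean data. Function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Vec = Sequence[float]


def euclidean(a: Vec, b: Vec) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def gaussian_kernel(a: Vec, b: Vec, bandwidth: float = 1.0) -> float:
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * bandwidth ** 2))


def mean_pairwise(xs: Sequence[Vec], ys: Sequence[Vec],
                  fn: Callable[[Vec, Vec], float]) -> float:
    """Average of fn over all pairs (x, y), including x == y when xs is ys."""
    return sum(fn(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs: Sequence[Vec], ys: Sequence[Vec]) -> float:
    """Squared energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic)."""
    return (2 * mean_pairwise(xs, ys, euclidean)
            - mean_pairwise(xs, xs, euclidean)
            - mean_pairwise(ys, ys, euclidean))


def mmd_squared(xs: Sequence[Vec], ys: Sequence[Vec], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def k(a: Vec, b: Vec) -> float:
        return gaussian_kernel(a, b, bandwidth)
    return mean_pairwise(xs, xs, k) + mean_pairwise(ys, ys, k) - 2 * mean_pairwise(xs, ys, k)


def is_adversarial(test_features: Sequence[Vec],
                   class_features: Sequence[Vec],
                   threshold: float,
                   distance: Callable[[Sequence[Vec], Sequence[Vec]], float] = energy_distance) -> bool:
    """Flag the test sample if its features are too far from its class distribution."""
    return distance(test_features, class_features) > threshold
```

Passing several test images at once (a larger `test_features` set) makes both estimates less noisy, which is one way the detection sample size can enter.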
Instance Cross Entropy for Deep Metric Learning
Wang, Xinshao, Kodirov, Elyor, Hua, Yang, Robertson, Neil
Loss functions play a crucial role in deep metric learning; thus, a variety of them have been proposed. Some supervise the learning process by pairwise or triplet-wise similarity constraints, while others take advantage of structured similarity information among multiple data points. In this work, we approach deep metric learning from a novel perspective. We propose instance cross entropy (ICE), which measures the difference between an estimated instance-level matching distribution and its ground-truth one. ICE has three main appealing properties. Firstly, similar to categorical cross entropy (CCE), ICE has a clear probabilistic interpretation and exploits structured semantic similarity information for learning supervision. Secondly, ICE is scalable to infinite training data, as it learns on mini-batches iteratively and is independent of the training set size. Thirdly, motivated by our relative weight analysis, seamless sample reweighting is incorporated: it rescales samples' gradients to control the degree of differentiation over training examples instead of truncating them by sample mining. Beyond its simplicity and intuitiveness, ICE is shown to be superior in extensive experiments on three real-world benchmarks.
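A loss of this kind can be sketched on a single mini-batch. The formulation below is a generic instance-level matching cross entropy built from the description above, not necessarily the paper's exact ICE definition, and it omits the sample reweighting. It assumes cosine similarities multiplied by a hypothetical scale factor `scale`, a softmax over the other in-batch instances as the estimated matching distribution, and a uniform ground truth over same-class instances. Function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (na * nb)


def instance_matching_cross_entropy(embeddings: Sequence[Sequence[float]],
                                    labels: Sequence[int],
                                    scale: float = 16.0) -> float:
    """Mean cross entropy between a ground-truth and an estimated matching distribution.

    For each anchor, the estimated distribution is a softmax over scaled cosine
    similarities to every other instance in the mini-batch; the ground truth is
    uniform over the other instances of the anchor's class. Anchors without a
    positive in the batch are skipped.
    """
    total, count = 0.0, 0
    for a in range(len(embeddings)):
        others = [j for j in range(len(embeddings)) if j != a]
        positives = [j for j in others if labels[j] == labels[a]]
        if not positives:
            continue
        logits = {j: scale * cosine(embeddings[a], embeddings[j]) for j in others}
        m = max(logits.values())
        log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Cross entropy H(target, predicted) with a uniform target on positives.
        total += -sum(logits[j] - log_z for j in positives) / len(positives)
        count += 1
    return total / count if count else 0.0
```

Because the loss only looks at the current mini-batch, its cost per step does not grow with the training set size.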
Supervised and Semi-supervised Deep Learning-based Models for Indoor Location Prediction and Recognition
Qian, Weizhu, Lauri, Fabrice, Gechter, Franck
Bourgogne Franche-Comté, UTBM, F-90010, Belfort, France ABSTRACT Predicting smartphone users' locations with WiFi fingerprints has been a popular research topic recently. In this work, we propose two novel deep-learning-based models: the convolutional mixture density recurrent neural network and the VAE-based semi-supervised learning model. The convolutional mixture density recurrent neural network is designed for path prediction and combines the advantages of convolutional neural networks, recurrent neural networks and mixture density networks. Further, since most real-world datasets are not labeled, we devise the VAE-based model for semi-supervised learning tasks. To test the proposed models, we conduct validation experiments on real-world datasets. The results verify the effectiveness of our approaches and show their superiority over other existing methods. Index Terms-- Mixture density network, variational autoencoder, semi-supervised learning, WiFi fingerprint, indoor positioning 1. INTRODUCTION Location-based services (LBS) are essential for applications such as location-based advertising, outdoor/indoor navigation and social networking. With significant advances in smartphone technology in recent decades, smartphones integrate various built-in sensors, such as GPS modules, WiFi modules and cellular modules. Acquiring data from such sensors enables researchers to study human activities. There are several types of data that can be utilised for such research purposes.
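The mixture density part of the path-prediction model can be illustrated by its training objective. A minimal sketch, assuming the network outputs, for each of K components, a mixture logit, a 2-D mean and per-axis log standard deviations for the next (x, y) position; the convolutional and recurrent layers that would produce these outputs are omitted, and all names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def mdn_nll(mix_logits: Sequence[float],
            means: Sequence[Point],
            log_stds: Sequence[Point],
            target: Point) -> float:
    """Negative log-likelihood of a 2-D location under a mixture of axis-aligned Gaussians.

    mix_logits: K unnormalised mixture weights; means: K (mu_x, mu_y);
    log_stds: K (log sigma_x, log sigma_y). Works in log space for stability.
    """
    log_norm = _logsumexp(mix_logits)
    component_terms = []
    for w, (mx, my), (lsx, lsy) in zip(mix_logits, means, log_stds):
        zx = (target[0] - mx) / math.exp(lsx)
        zy = (target[1] - my) / math.exp(lsy)
        log_density = -0.5 * (zx * zx + zy * zy) - lsx - lsy - math.log(2 * math.pi)
        component_terms.append((w - log_norm) + log_density)
    return -_logsumexp(component_terms)


def mixture_mean(mix_logits: Sequence[float], means: Sequence[Point]) -> Point:
    """Expected location under the mixture (one possible point prediction)."""
    log_norm = _logsumexp(mix_logits)
    weights = [math.exp(w - log_norm) for w in mix_logits]
    return (sum(p * m[0] for p, m in zip(weights, means)),
            sum(p * m[1] for p, m in zip(weights, means)))
```

A mixture rather than a single Gaussian lets the model keep several plausible next positions (for example, two corridors branching from the same spot) instead of averaging them into one location between them.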
Noise Induces Loss Discrepancy Across Groups for Linear Regression
This loss discrepancy across groups is especially problematic in critical applications that impact people's lives (Berk, 2012; Chouldechova, 2017). Despite the vast literature on removing loss discrepancy (Hardt et al., 2016; Khani et al., 2019; Agarwal et al., 2018; Zafar et al., 2017), the direct removal of loss discrepancy might introduce other problems such as intragroup loss discrepancy (Lipton et al., 2018) and adverse long-term impacts (Liu et al., 2018). Therefore, it is important to understand the source of loss discrepancy. Why do such loss discrepancies exist? The literature generally studies sources of loss discrepancy due to an "information deficiency" of one group, that is, one group has, for example, more noise (Corbett-Davies et al., 2017), less ... Preliminary work, under review.
Neural Networks Learning and Memorization with (almost) no Over-Parameterization
Many results in recent years established polynomial-time learnability of various models via neural network algorithms. However, unless the model is linearly separable or the activation is a polynomial, these results require very large networks, much larger than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth-two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with near-optimal network size, sample complexity, and runtime. In particular, we show that SGD on a depth-two network with $\tilde{O}\left(\frac{m}{d}\right)$ hidden neurons (and hence $\tilde{O}(m)$ parameters) can memorize $m$ random labeled points in $\mathbb{S}^{d-1}$.
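The step from neurons to parameters in the last sentence is simple arithmetic. As a sketch, assuming a standard fully connected depth-two network $x \mapsto \sum_{j=1}^{q} u_j\,\sigma(\langle w_j, x\rangle + b_j)$ with $q$ hidden neurons and inputs in $\mathbb{R}^d$ (the paper's exact parameterization may differ):

```latex
\[
\#\text{params} \;=\; \underbrace{q\,d}_{w_j \in \mathbb{R}^d} \;+\; \underbrace{q}_{b_j} \;+\; \underbrace{q}_{u_j}
\;=\; q\,(d+2),
\qquad
q = \tilde{O}\!\left(\tfrac{m}{d}\right)
\;\Longrightarrow\;
\#\text{params} = \tilde{O}\!\left(\tfrac{m}{d}\,(d+2)\right) = \tilde{O}(m).
\]
```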
Data Programming using Continuous and Quality-Guided Labeling Functions
Chatterjee, Oishik, Ramakrishnan, Ganesh, Sarawagi, Sunita
Sunita Sarawagi, Department of CSE, IIT Bombay, India, sunita@iitb.ac.in Abstract Scarcity of labeled data is a bottleneck for supervised learning models. A paradigm that has evolved for dealing with this problem is data programming. An existing data programming paradigm allows human supervision to be provided as a set of discrete labeling functions (LFs) that assign possibly noisy labels to input instances, together with a generative model for consolidating the weak labels. We enhance and generalize this paradigm by supporting functions that output a continuous score (instead of a hard label) that noisily correlates with labels. We show across five applications that continuous LFs are more natural to program and lead to improved recall. We also show that the accuracy of existing generative models is unstable with respect to initialization, training epochs, and learning rates. We let the data programmer guide the training process by providing an intuitive quality guide with each LF. We propose an elegant method of incorporating these guides into the generative model. Our overall method, called CAGE, makes the data programming paradigm more reliable than other tricks based on initialization, sign-penalties, or soft-accuracy constraints. 1 Introduction Modern machine learning systems require large amounts of labelled data. For many applications, such labelled data is created by getting humans to explicitly label each training example. A problem of perpetual interest in machine learning is reducing the tedium of such human supervision via techniques like active learning, crowd-labeling, distant supervision, and semi-supervised learning.
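The difference between discrete and continuous LFs, and the role of a per-LF quality guide, can be sketched with a deliberately simple aggregator. This is not the CAGE generative model: it swaps in a naive weighted vote in which each LF's vote is scaled by its continuous score and by the log-odds of its quality guide, assuming the guide is read as a guessed accuracy in (0.5, 1). The sentiment LFs and all names are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

# A vote is (label, score in [0, 1]) or None when the LF abstains.
Vote = Optional[Tuple[int, float]]
LabelingFunction = Callable[[str], Vote]


def discrete_lf(rule: Callable[[str], Optional[int]]) -> LabelingFunction:
    """Wrap a discrete LF (label or None) so that it votes with score 1.0."""
    def lf(x: str) -> Vote:
        label = rule(x)
        return None if label is None else (label, 1.0)
    return lf


def weighted_vote(x: str,
                  lfs: Sequence[LabelingFunction],
                  quality_guides: Sequence[float],
                  num_classes: int) -> Optional[int]:
    """Combine LF votes, weighting each by score * log-odds of its quality guide.

    Returns None if every LF abstains.
    """
    totals = [0.0] * num_classes
    any_vote = False
    for lf, q in zip(lfs, quality_guides):
        vote = lf(x)
        if vote is None:
            continue
        label, score = vote
        totals[label] += score * math.log(q / (1.0 - q))
        any_vote = True
    if not any_vote:
        return None
    return max(range(num_classes), key=lambda c: totals[c])


# Hypothetical LFs for binary sentiment (0 = negative, 1 = positive).
POSITIVE_WORDS = {"great", "excellent", "love"}

lf_great = discrete_lf(lambda x: 1 if "great" in x.split() else None)
lf_bad = discrete_lf(lambda x: 0 if "bad" in x.split() else None)


def lf_positive_overlap(x: str) -> Vote:
    """Continuous LF: the score grows with the share of positive words."""
    words = x.lower().split()
    share = sum(w in POSITIVE_WORDS for w in words) / max(len(words), 1)
    return (1, min(1.0, 3.0 * share)) if share > 0 else None


LFS = [lf_great, lf_bad, lf_positive_overlap]
GUIDES = [0.8, 0.9, 0.7]
```

A learned generative model, as in the paper, would instead estimate LF parameters from unlabeled data, with the guides steering that training rather than fixing the weights as done here.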