Goto

Collaborating Authors

Results


BASGD: Buffered Asynchronous SGD for Byzantine Learning

arXiv.org Machine Learning

Distributed learning has become a hot research topic, due to its wide application in cluster-based large-scale learning, federated learning, edge computing and so on. Most distributed learning methods assume no error and attack on the workers. However, many unexpected cases, such as communication error and even malicious attack, may happen in real applications. Hence, Byzantine learning (BL), which refers to distributed learning with attack or error, has recently attracted much attention. Most existing BL methods are synchronous, which will result in slow convergence when there exist heterogeneous workers. Furthermore, in some applications like federated learning and edge computing, synchronization cannot even be performed most of the time due to the online workers (clients or edge servers). Hence, asynchronous BL (ABL) is more general and practical than synchronous BL (SBL). To the best of our knowledge, there exist only two ABL methods. One of them cannot resist malicious attack. The other needs to store some training instances on the server, which has the privacy leak problem. In this paper, we propose a novel method, called buffered asynchronous stochastic gradient descent (BASGD), for BL. BASGD is an asynchronous method. Furthermore, BASGD has no need to store any training instances on the server, and hence can preserve privacy in ABL. BASGD is theoretically proved to have the ability of resisting against error and malicious attack. Moreover, BASGD has a similar theoretical convergence rate to that of vanilla asynchronous SGD (ASGD), with an extra constant variance. Empirical results show that BASGD can significantly outperform vanilla ASGD and other ABL baselines, when there exists error or attack on workers.


Estimating Training Data Influence by Tracking Gradient Descent

arXiv.org Machine Learning

We introduce a method called TrackIn that computes the influence of a training example on a prediction made by the model, by tracking how the loss on the test point changes during the training process whenever the training example of interest was utilized. We provide a scalable implementation of TrackIn via a combination of a few key ideas: (a) a first-order approximation to the exact computation, (b) using random projections to speed up the computation of the first-order approximation for large models, (c) using saved checkpoints of standard training procedures, and (d) cherry-picking layers of a deep neural network. An experimental evaluation shows that TrackIn is more effective in identifying mislabelled training examples than other related methods such as influence functions and representer points. We also discuss insights from applying the method on vision, regression and natural language tasks.


Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

Neural Information Processing Systems

Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noises to hidden units during training, e.g., dropout, is known as a successful regularizer, but it is still not clear enough why such training techniques work well in practice and how we can maximize their benefit in the presence of two conflicting objectives---optimizing to true data distribution and preventing overfitting by regularization. This paper addresses the above issues by 1) interpreting that the conventional training methods with regularization by noise injection optimize the lower bound of the true objective and 2) proposing a technique to achieve a tighter lower bound using multiple noise samples per training example in a stochastic gradient descent iteration. We demonstrate the effectiveness of our idea in several computer vision applications. Papers published at the Neural Information Processing Systems Conference.


FIS-GAN: GAN with Flow-based Importance Sampling

arXiv.org Machine Learning

Generative Adversarial Networks (GAN) training process, in most cases, apply uniform and Gaussian sampling methods in latent space, which probably spends most of the computation on examples that can be properly handled and easy to generate. Theoretically, importance sampling speeds up stochastic gradient algorithms for supervised learning by prioritizing training examples. In this paper, we explore the possibility for adapting importance sampling into adversarial learning. We use importance sampling to replace uniform and Gaussian sampling methods in latent space and combine normalizing flow with importance sampling to approximate latent space posterior distribution by density estimation. Empirically, results on MNIST and Fashion-MNIST demonstrate that our method significantly accelerates the convergence of generative process while retaining visual fidelity in generated samples.


A Mean-Field Theory for Kernel Alignment with Random Features in Generative Adversarial Networks

arXiv.org Machine Learning

We propose a novel supervised learning method to optimize the kernel in maximum mean discrepancy generative adversarial networks (MMD GANs). Specifically, we characterize a distributionally robust optimization problem to compute a good distribution for the random feature model of Rahimi and Recht to approximate a good kernel function. Due to the fact that the distributional optimization is infinite dimensional, we consider a Monte-Carlo sample average approximation (SAA) to obtain a more tractable finite dimensional optimization problem. We subsequently leverage a particle stochastic gradient descent (SGD) method to solve finite dimensional optimization problems. Based on a mean-field analysis, we then prove that the empirical distribution of the interactive particles system at each iteration of the SGD follows the path of the gradient descent flow on the Wasserstein manifold. We also establish the non-asymptotic consistency of the finite sample estimator. Our empirical evaluation on synthetic data-set as well as MNIST and CIFAR-10 benchmark data-sets indicates that our proposed MMD GAN model with kernel learning indeed attains higher inception scores well as Fr\`{e}chet inception distances and generates better images compared to the generative moment matching network (GMMN) and MMD GAN with untrained kernels.


SaaS: Speed as a Supervisor for Semi-supervised Learning

arXiv.org Machine Learning

We introduce the SaaS Algorithm for semi-supervised learning, which uses learning speed during stochastic gradient descent in a deep neural network to measure the quality of an iterative estimate of the posterior probability of unknown labels. Training speed in supervised learning correlates strongly with the percentage of correct labels, so we use it as an inference criterion for the unknown labels, without attempting to infer the model parameters at first. Despite its simplicity, SaaS achieves state-of-the-art results in semi-supervised learning benchmarks.


Deeply Learning the Messages in Message Passing Inference

Neural Information Processing Systems

Deep structured output learning shows great promise in tasks like semantic image segmentation. We proffer a new, efficient deep structured model learning scheme, in which we show how deep Convolutional Neural Networks (CNNs) can be used to directly estimate the messages in message passing inference for structured prediction with Conditional Random Fields CRFs). With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation. This confers significant efficiency for learning, since otherwise when performing structured learning for a CRF with CNN potentials it is necessary to undertake expensive inference for every stochastic gradient iteration. The network output dimension of message estimators is the same as the number of classes, rather than exponentially growing in the order of the potentials. Hence it is more scalable for cases that a large number of classes are involved. We apply our method to semantic image segmentation and achieve impressive performance, which demonstrates the effectiveness and usefulness of our CNN message learning method.


Deeply Learning the Messages in Message Passing Inference

arXiv.org Machine Learning

Deep structured output learning shows great promise in tasks like semantic image segmentation. We proffer a new, efficient deep structured model learning scheme, in which we show how deep Convolutional Neural Networks (CNNs) can be used to estimate the messages in message passing inference for structured prediction with Conditional Random Fields (CRFs). With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation. This confers significant efficiency for learning, since otherwise when performing structured learning for a CRF with CNN potentials it is necessary to undertake expensive inference for every stochastic gradient iteration. The network output dimension for message estimation is the same as the number of classes, in contrast to the network output for general CNN potential functions in CRFs, which is exponential in the order of the potentials. Hence CNN message learning has fewer network parameters and is more scalable for cases that a large number of classes are involved. We apply our method to semantic image segmentation on the PASCAL VOC 2012 dataset. We achieve an intersection-over-union score of 73.4 on its test set, which is the best reported result for methods using the VOC training images alone. This impressive performance demonstrates the effectiveness and usefulness of our CNN message learning method.