Goto

Collaborating Authors

 approach infinity






On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times

Fu, Yao, Zhou, Yipeng, Wu, Di, Yu, Shui, Wen, Yonggang, Li, Chao

arXiv.org Artificial Intelligence

In spite that Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively, recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks. In the meanwhile, Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks. However, the adoption of DP by clients in FL may significantly jeopardize the model accuracy. It is still an open problem to understand the practicality of DP from a theoretic perspective. In this paper, we make the first attempt to understand the practicality of DP in FL through tuning the number of conducted iterations. Based on the FedAvg algorithm, we formally derive the convergence rate with DP noises in FL. Then, we theoretically derive: 1) the conditions for the DP based FedAvg to converge as the number of global iterations (GI) approaches infinity; 2) the method to set the number of local iterations (LI) to minimize the negative influence of DP noises. By further substituting the Laplace and Gaussian mechanisms into the derived convergence rate respectively, we show that: 3) The DP based FedAvg with the Laplace mechanism cannot converge, but the divergence rate can be effectively prohibited by setting the number of LIs with our method; 4) The learning error of the DP based FedAvg with the Gaussian mechanism can converge to a constant number finally if we use a fixed number of LIs per GI. To verify our theoretical findings, we conduct extensive experiments using two real-world datasets. The results not only validate our analysis results, but also provide useful guidelines on how to optimize model accuracy when incorporating DP into FL


A Look at the Effect of Sample Design on Generalization through the Lens of Spectral Analysis

Kailkhura, Bhavya, Thiagarajan, Jayaraman J., Li, Qunwei, Bremer, Peer-Timo

arXiv.org Machine Learning

This paper provides a general framework to study the effect of sampling properties of training data on the generalization error of the learned machine learning (ML) models. Specifically, we propose a new spectral analysis of the generalization error, expressed in terms of the power spectra of the sampling pattern and the function involved. The framework is build in the Euclidean space using Fourier analysis and establishes a connection between some high dimensional geometric objects and optimal spectral form of different state-of-the-art sampling patterns. Subsequently, we estimate the expected error bounds and convergence rate of different state-of-the-art sampling patterns, as the number of samples and dimensions increase. We make several observations about generalization error which are valid irrespective of the approximation scheme (or learning architecture) and training (or optimization) algorithms. Our result also sheds light on ways to formulate design principles for constructing optimal sampling methods for particular problems.


Comparison between the two definitions of AI

Dobrev, Dimiter

arXiv.org Artificial Intelligence

Two different definitions of the Artificial Intelligence concept have been proposed in papers [1] and [2]. The first definition is informal. It says that any program that is cleverer than a human being, is acknowledged as Artificial Intelligence. The second definition is formal because it avoids reference to the concept of human being. The readers of papers [1] and [2] might be left with the impression that both definitions are equivalent and the definition in [2] is simply a formal version of that in [1]. This paper will compare both definitions of Artificial Intelligence and, hopefully, will bring a better understanding of the concept.


The Capacity of a Bump

Flake, Gary William

Neural Information Processing Systems

Recently, several researchers have reported encouraging experimental results when using Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and units and often learn much faster than typical sigmoidal networks. To explain these results we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a rid¥e activation function. If we are interested in partitioningp points in d dimensions into two classes then in the limit as d approaches infinity the capacity of a hyper-ridge and a perceptron is identical.


The Capacity of a Bump

Flake, Gary William

Neural Information Processing Systems

Recently, several researchers have reported encouraging experimental results when using Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and units and often learn much faster than typical sigmoidal networks. To explain these results we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a rid¥e activation function. If we are interested in partitioningp points in d dimensions into two classes then in the limit as d approaches infinity the capacity of a hyper-ridge and a perceptron is identical.


The Capacity of a Bump

Flake, Gary William

Neural Information Processing Systems

Recently, several researchers have reported encouraging experimental results whenusing Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and units and often learn much faster than typical sigmoidal networks. To explain these results we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a rid¥e activation function. If we are interested in partitioningp points in d dimensions into two classes then in the limit as d approaches infinity the capacity of a hyper-ridge and a perceptron is identical.