Not enough data to create a plot.
Try a different view from the menu above.
Yang, Yunfei
Model-Guardian: Protecting against Data-Free Model Stealing Using Gradient Representations and Deceptive Predictions
Yang, Yunfei, Chen, Xiaojun, Xuan, Yuexin, Zhao, Zhendong
Model stealing attack is increasingly threatening the confidentiality of machine learning models deployed in the cloud. Recent studies reveal that adversaries can exploit data synthesis techniques to steal machine learning models even in scenarios devoid of real data, leading to data-free model stealing attacks. Existing defenses against such attacks suffer from limitations, including poor effectiveness, insufficient generalization ability, and low comprehensiveness. In response, this paper introduces a novel defense framework named Model-Guardian. Comprising two components, Data-Free Model Stealing Detector (DFMS-Detector) and Deceptive Predictions (DPreds), Model-Guardian is designed to address the shortcomings of current defenses with the help of the artifact properties of synthetic samples and gradient representations of samples. Extensive experiments on seven prevalent data-free model stealing attacks showcase the effectiveness and superior generalization ability of Model-Guardian, outperforming eleven defense methods and establishing a new state-of-the-art performance. Notably, this work pioneers the utilization of various GANs and diffusion models for generating highly realistic query samples in attacks, with Model-Guardian demonstrating accurate detection capabilities.
On the rates of convergence for learning with convolutional neural networks
Yang, Yunfei, Feng, Han, Zhou, Ding-Xuan
We study approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels. Our first result proves a new approximation bound for CNNs with certain constraint on the weights. Our second result gives new analysis on the covering number of feed-forward neural networks with CNNs as special cases. The analysis carefully takes into account the size of the weights and hence gives better bounds than the existing literature in some situations. Using these two results, we are able to derive rates of convergence for estimators based on CNNs in many learning problems. In particular, we establish minimax optimal convergence rates of the least squares based on CNNs for learning smooth functions in the nonparametric regression setting. For binary classification, we derive convergence rates for CNN classifiers with hinge loss and logistic loss. It is also shown that the obtained rates for classification are minimax optimal in some common settings.
Optimal rates of approximation by shallow ReLU$^k$ neural networks and applications to nonparametric regression
Yang, Yunfei, Zhou, Ding-Xuan
We study the approximation capacity of some variation spaces corresponding to shallow ReLU$^k$ neural networks. It is shown that sufficiently smooth functions are contained in these spaces with finite variation norms. For functions with less smoothness, the approximation rates in terms of the variation norm are established. Using these results, we are able to prove the optimal approximation rates in terms of the number of neurons for shallow ReLU$^k$ neural networks. It is also shown how these results can be used to derive approximation bounds for deep neural networks and convolutional neural networks (CNNs). As applications, we study convergence rates for nonparametric regression using three ReLU neural network models: shallow neural network, over-parameterized neural network, and CNN. In particular, we show that shallow neural networks can achieve the minimax optimal rates for learning H\"older functions, which complements recent results for deep neural networks. It is also proven that over-parameterized (deep or shallow) neural networks can achieve nearly optimal rates for nonparametric regression.
Universality and approximation bounds for echo state networks with random weights
Li, Zhen, Yang, Yunfei
We study the uniform approximation of echo state networks with randomly generated internal weights. These models, in which only the readout weights are optimized during training, have made empirical success in learning dynamical systems. Recent results showed that echo state networks with ReLU activation are universal. In this paper, we give an alternative construction and prove that the universality holds for general activation functions. Specifically, our main result shows that, under certain condition on the activation function, there exists a sampling procedure for the internal weights so that the echo state network can approximate any continuous casual time-invariant operators with high probability. In particular, for ReLU activation, we give explicit construction for these sampling procedures. We also quantify the approximation error of the constructed ReLU echo state networks for sufficiently regular operators.
Nonparametric regression using over-parameterized shallow ReLU neural networks
Yang, Yunfei, Zhou, Ding-Xuan
It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the H\"older space with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.
Convergence Analysis of the Deep Galerkin Method for Weak Solutions
Jiao, Yuling, Lai, Yanming, Wang, Yang, Yang, Haizhao, Yang, Yunfei
This paper analyzes the convergence rate of a deep Galerkin method for the weak solution (DGMW) of second-order elliptic partial differential equations on $\mathbb{R}^d$ with Dirichlet, Neumann, and Robin boundary conditions, respectively. In DGMW, a deep neural network is applied to parametrize the PDE solution, and a second neural network is adopted to parametrize the test function in the traditional Galerkin formulation. By properly choosing the depth and width of these two networks in terms of the number of training samples $n$, it is shown that the convergence rate of DGMW is $\mathcal{O}(n^{-1/d})$, which is the first convergence result for weak solutions. The main idea of the proof is to divide the error of the DGMW into an approximation error and a statistical error. We derive an upper bound on the approximation error in the $H^{1}$ norm and bound the statistical error via Rademacher complexity.
Approximation bounds for norm constrained neural networks with applications to regression and GANs
Jiao, Yuling, Wang, Yang, Yang, Yunfei
This paper studies the approximation capacity of ReLU neural networks with norm constraint on the weights. We prove upper and lower bounds on the approximation error of these networks for smooth function classes. The lower bound is derived through the Rademacher complexity of neural networks, which may be of independent interest. We apply these approximation bounds to analyze the convergence of regression using norm constrained neural networks and distribution estimation by GANs. In particular, we obtain convergence rates for over-parameterized neural networks. It is also shown that GANs can achieve optimal rate of learning probability distributions, when the discriminator is a properly chosen norm constrained neural network.
Non-Asymptotic Error Bounds for Bidirectional GANs
Liu, Shiao, Yang, Yunfei, Huang, Jian, Jiao, Yuling, Wang, Yang
We derive nearly sharp bounds for the bidirectional GAN (BiGAN) estimation error under the Dudley distance between the latent joint distribution and the data joint distribution with appropriately specified architecture of the neural networks used in the model. To the best of our knowledge, this is the first theoretical guarantee for the bidirectional GAN learning approach. An appealing feature of our results is that they do not assume the reference and the data distributions to have the same dimensions or these distributions to have bounded support. These assumptions are commonly assumed in the existing convergence analysis of the unidirectional GANs but may not be satisfied in practice. Our results are also applicable to the Wasserstein bidirectional GAN if the target distribution is assumed to have a bounded support. To prove these results, we construct neural network functions that push forward an empirical distribution to another arbitrary empirical distribution on a possibly different-dimensional space. We also develop a novel decomposition of the integral probability metric for the error analysis of bidirectional GANs. These basic theoretical results are of independent interest and can be applied to other related learning problems.
An error analysis of generative adversarial networks for learning distributions
Huang, Jian, Jiao, Yuling, Li, Zhen, Liu, Shiao, Wang, Yang, Yang, Yunfei
This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples. Our main results establish the convergence rates of GANs under a collection of integral probability metrics defined through H\"older classes, including the Wasserstein distance as a special case. We also show that GANs are able to adaptively learn data distributions with low-dimensional structures or have H\"older densities, when the network architectures are chosen properly. In particular, for distributions concentrated around a low-dimensional set, we show that the learning rates of GANs do not depend on the high ambient dimension, but on the lower intrinsic dimension. Our analysis is based on a new oracle inequality decomposing the estimation error into the generator and discriminator approximation error and the statistical error, which may be of independent interest.
On the capacity of deep generative networks for approximating distributions
Yang, Yunfei, Li, Zhen, Wang, Yang
We study the efficacy and efficiency of deep generative networks for approximating probability distributions. We prove that neural networks can transform a one-dimensional source distribution to a distribution that is arbitrarily close to a high-dimensional target distribution in Wasserstein distances. Upper bounds of the approximation error are obtained in terms of neural networks' width and depth. It is shown that the approximation error grows at most linearly on the ambient dimension and that the approximation order only depends on the intrinsic dimension of the target distribution. On the contrary, when $f$-divergences are used as metrics of distributions, the approximation property is different. We prove that in order to approximate the target distribution in $f$-divergences, the dimension of the source distribution cannot be smaller than the intrinsic dimension of the target distribution. Therefore, $f$-divergences are less adequate than Waserstein distances as metrics of distributions for generating samples.