Learning Distributions Generated by Single-Layer ReLU Networks in the Presence of Arbitrary Outliers
We consider a set of data samples in which a fraction are arbitrary outliers and the rest are output samples of a single-layer neural network with rectified linear unit (ReLU) activation. Our goal is to estimate the parameters (weight matrix and bias vector) of the neural network, assuming the bias vector to be non-negative. We estimate the network parameters using gradient descent combined with either median- or trimmed-mean-based filters to mitigate the effect of the arbitrary outliers, as sketched below.
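To illustrate the robust-aggregation idea in the abstract, here is a minimal NumPy sketch: instead of averaging per-sample gradients, take a coordinate-wise median or trimmed mean, so that a constant fraction of arbitrary outliers cannot drag the update. Function names and the trimming fraction are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def trimmed_mean(values, trim_frac):
    """Coordinate-wise trimmed mean: drop the trim_frac smallest and
    largest entries in each coordinate, then average the rest."""
    values = np.sort(values, axis=0)
    k = int(trim_frac * values.shape[0])
    return values[k:values.shape[0] - k].mean(axis=0)

def robust_gradient(per_sample_grads, trim_frac=0.1, use_median=False):
    """Aggregate per-sample gradients robustly: the coordinate-wise
    median or trimmed mean caps the influence of arbitrary outliers."""
    if use_median:
        return np.median(per_sample_grads, axis=0)
    return trimmed_mean(per_sample_grads, trim_frac)

# Example: 1000 per-sample gradients in R^5, 5% replaced by corruptions.
rng = np.random.default_rng(0)
grads = rng.normal(size=(1000, 5))
grads[:50] = 1e6                            # arbitrary outliers
g = robust_gradient(grads, trim_frac=0.1)   # stays near the true mean (0)
```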
Reviews: Learning Distributions Generated by One-Layer ReLU Networks
A popular generative model these days is the following: pass standard Gaussian noise through a neural network. A major open question, however, is what the structure of the resulting distribution is: given samples from such a distribution, can we learn its parameters? This question is the topic of this paper. Specifically, consider a one-layer ReLU neural network, which is specified by a weight matrix W and a bias vector b; the sketch below makes the model concrete.
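The generative model under study draws x = ReLU(Wz + b) with z a standard Gaussian. A minimal NumPy sketch (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def sample_rectified_gaussian(W, b, n_samples, rng=None):
    """Draw samples x = ReLU(W z + b) with z ~ N(0, I).

    W: (d, k) weight matrix; b: (d,) bias vector.
    Returns an (n_samples, d) array of generated samples.
    """
    rng = np.random.default_rng(rng)
    k = W.shape[1]
    z = rng.standard_normal((n_samples, k))  # latent Gaussian noise
    return np.maximum(z @ W.T + b, 0.0)      # coordinate-wise ReLU

# Example: a 3-dimensional rectified Gaussian with a 2-d latent space.
W = np.array([[1.0, 0.5], [0.0, 2.0], [-1.0, 1.0]])
b = np.array([0.1, 0.0, 0.3])                # non-negative bias
X = sample_rectified_gaussian(W, b, n_samples=10_000, rng=0)
```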
Learning Distributions Generated by One-Layer ReLU Networks
Wu, Shanshan; Dimakis, Alexandros G.; Sanghavi, Sujay
We consider the problem of estimating the parameters of a $d$-dimensional rectified Gaussian distribution from i.i.d. samples. A rectified Gaussian distribution is defined by passing a standard Gaussian distribution through a one-layer ReLU neural network. We give a simple algorithm to estimate the parameters (i.e., the weight matrix and bias vector of the ReLU neural network) up to an error $\eps\|W\|_F$ using $\widetilde{O}(1/\eps^2)$ samples and $\widetilde{O}(d^2/\eps^2)$ time (log factors are ignored for simplicity). This implies that we can estimate the distribution up to $\eps$ in total variation distance using $\widetilde{O}(\kappa^2 d^2/\eps^2)$ samples, where $\kappa$ is the condition number of the covariance matrix. Our only assumption is that the bias vector is non-negative.
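The abstract does not spell out the estimator, but one can see why the parameters are recoverable coordinate by coordinate: for a single output $y_i = \mathrm{ReLU}(w_i^\top z + b_i)$, the fraction of exact zeros pins down the ratio $b_i/\|w_i\|$, and the mean then pins down the scale $\|w_i\|$. Below is a hedged moment-matching sketch of this observation; it is an illustration, not the paper's algorithm.

```python
import numpy as np
from scipy.stats import norm

def estimate_bias_and_scale(y, eps=1e-6):
    """Moment sketch for one coordinate y = ReLU(s*g + b), g ~ N(0,1).

    Uses two facts: P(y = 0) = Phi(-b/s), and
    E[y] = s * (phi(r) + r * Phi(r)) with r = b/s >= 0.
    Returns estimates (b_hat, s_hat); s plays the role of ||w_i||.
    """
    p0 = np.clip(np.mean(y == 0), eps, 1 - eps)   # empirical zero mass
    r = -norm.ppf(p0)                             # r = b/s from the zero mass
    s = y.mean() / (norm.pdf(r) + r * norm.cdf(r))
    return r * s, s

# Example: recover b_i = 0.5 and ||w_i|| = 2 from 1-d samples.
rng = np.random.default_rng(0)
y = np.maximum(2.0 * rng.standard_normal(100_000) + 0.5, 0.0)
b_hat, s_hat = estimate_bias_and_scale(y)
```

The non-negativity assumption on the bias matters here: it guarantees $P(y = 0) \le 1/2$, so the zero mass identifies $r \ge 0$ unambiguously.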