
Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Goel, Surbhi, Karmalkar, Sushrut, Klivans, Adam

Neural Information Processing Systems

We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\opt < 1$ be the population loss of the best-fitting ReLU. We prove: (i) finding a ReLU with square-loss $\opt + \epsilon$ is as hard as learning sparse parities with noise, a problem widely thought to be computationally intractable; this is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, {\em unconditionally}, that gradient descent cannot converge to the global minimum in polynomial time; (ii) there is an approximation algorithm that outputs a ReLU with square-loss $\opt^{2/3} + \epsilon$ in time $\mathrm{poly}(d, 1/\epsilon)$, via a novel reduction to noisy halfspace learning with respect to $0/1$ loss.
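As a concrete illustration of the setup (not the paper's algorithm), the following minimal numpy sketch draws examples from a spherical Gaussian, corrupts a fraction of the labels so the problem is genuinely agnostic, and evaluates the empirical square loss of a candidate ReLU hypothesis. The target direction `w_star` and the 10% corruption rate are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20000

# Hypothetical target direction; a fraction of labels is then corrupted,
# so the best-fitting ReLU has nonzero population loss (the agnostic setting).
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

X = rng.normal(size=(n, d))               # spherical Gaussian marginals
y = np.maximum(X @ w_star, 0.0)           # clean ReLU labels
y[: n // 10] = rng.normal(size=n // 10)   # corrupt 10% of labels arbitrarily

def square_loss(w):
    """Empirical square loss of the ReLU hypothesis x -> max(0, w . x)."""
    return float(np.mean((np.maximum(X @ w, 0.0) - y) ** 2))

# The target direction is near-optimal: only the corrupted labels contribute
# to its loss, while a wrong direction pays on the clean examples as well.
print(square_loss(w_star))
print(square_loss(-w_star))
```

The hardness results above concern closing the remaining gap: finding a hypothesis whose loss is within $\epsilon$ of the best such `w`, not merely evaluating candidates.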


Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals

Neural Information Processing Systems

We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. In the former problem, given labeled examples $(\bx, y)$ from an unknown distribution on $\R^d \times \{ \pm 1\}$, whose marginal distribution on $\bx$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\opt+\eps$, where $\opt$ is the 0-1 loss of the best-fitting halfspace. In the latter problem, given labeled examples $(\bx, y)$ from an unknown distribution on $\R^d \times \R$, whose marginal distribution on $\bx$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with square loss $\opt+\eps$, where $\opt$ is the square loss of the best-fitting ReLU. We prove Statistical Query (SQ) lower bounds of $d^{\poly(1/\eps)}$ for both of these problems. Our SQ lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.


Proximal Approximate Inference in State-Space Models

Abdulsamad, Hany, García-Fernández, Ángel F., Särkkä, Simo

arXiv.org Artificial Intelligence

We present a class of algorithms for state estimation in nonlinear, non-Gaussian state-space models. Our approach is based on a variational Lagrangian formulation that casts Bayesian inference as a sequence of entropic trust-region updates subject to dynamic constraints. This framework gives rise to a family of forward-backward algorithms, whose structure is determined by the chosen factorization of the variational posterior. By focusing on Gauss--Markov approximations, we derive recursive schemes with favorable computational complexity. For general nonlinear, non-Gaussian models we close the recursions using generalized statistical linear regression and Fourier--Hermite moment matching.
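The statistical-linear-regression step mentioned above can be illustrated in miniature: replace a nonlinear function $h$ by the affine map $Ax + b$ that minimizes the mean squared error under the current Gaussian approximation. Below is a minimal scalar sketch using plain Monte Carlo for the expectations; this is an illustrative stand-in, not the paper's scheme, which uses constructions such as Fourier--Hermite moment matching.

```python
import numpy as np

def statistical_linearization(h, m, P, n=200_000, seed=0):
    """Best affine fit h(x) ~ A*x + b in mean squared error for x ~ N(m, P).

    The minimizer is the linear-regression solution
        A = Cov(x, h(x)) / Var(x),    b = E[h(x)] - A * m,
    with expectations estimated here by plain Monte Carlo.
    """
    rng = np.random.default_rng(seed)
    x = m + np.sqrt(P) * rng.standard_normal(n)
    hx = h(x)
    A = np.mean((x - m) * (hx - hx.mean())) / P
    b = hx.mean() - A * m
    return float(A), float(b)

# Linearize a nonlinear measurement function around the Gaussian N(0.3, 0.04).
A, b = statistical_linearization(np.sin, m=0.3, P=0.04)
```

For $h = \sin$ the expectations are known in closed form ($\mathbb{E}[\sin x] = \sin(m)e^{-P/2}$, $\mathrm{Cov}(x, \sin x) = P\cos(m)e^{-P/2}$), which makes the sketch easy to sanity-check.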



Reliable Learning of Halfspaces under Gaussian Marginals

Neural Information Processing Systems

We study the problem of PAC learning halfspaces in the reliable agnostic model of Kalai et al. (2012). The reliable PAC model captures learning scenarios where one type of error is costlier than the others. We complement our upper bound with a Statistical Query lower bound suggesting that the $d^{\Omega(\log(1/\alpha))}$ dependence is best possible. Conceptually, our results imply a strong computational separation between reliable agnostic learning and standard agnostic learning of halfspaces in the Gaussian setting.


Reviews: Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Neural Information Processing Systems

This paper studies the computational complexity of learning a single ReLU with respect to Gaussian examples. Since ReLUs are now the standard choice of nonlinearity in deep neural networks, the computational complexity of learning them is clearly of interest. Of course, the computational complexity of learning a ReLU may depend substantially on the specific setting assumed; it is interesting to understand the range of such assumptions and their implications for complexity. This paper studies the following setting: given independent samples $(x_1, y_1), \ldots, (x_n, y_n)$ where $x$ is spherical Gaussian in $d$ dimensions and $y \in \R$ is arbitrary, find a ReLU function $f_w(x) = \max(0, w \cdot x)$ for some vector $w$ with minimal mean squared error $\sum_i (y_i - f_w(x_i))^2$. (This is agnostic learning since the $y$'s are arbitrary.) The main results are as follows: 1) There is no algorithm to learn a single ReLU with respect to Gaussian examples to additive error $\epsilon$ in time $d^{o(\log 1/\epsilon)}$ unless $k$-sparse parities with noise can be learned in time $d^{o(k)}$. 2) If $\opt = \min_w$ (mean squared error of $f_w$), then (with normalization such that $\opt \in [0,1]$) there is an algorithm which agnostically learns a ReLU to error $\opt^{2/3} + \epsilon$ in time $\mathrm{poly}(d, 1/\epsilon)$. The proof of (1) goes via Hermite analysis (i.e., expansion of functions in the orthogonal Hermite polynomial basis for the Gaussian measure).
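The gradient-descent obstruction in result (1) concerns the agnostic case; in the realizable case (labels exactly a ReLU of a Gaussian example), plain gradient descent on the empirical square loss does drive the error down. A minimal numpy sketch of the setting the review describes, with hypothetical data and an initialization flipped to have positive correlation with the target so the iterates stay in the benign region:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 5000

w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)

X = rng.normal(size=(n, d))          # spherical Gaussian examples
y = np.maximum(X @ w_true, 0.0)      # realizable labels: y = max(0, w_true . x)

w = 0.1 * rng.normal(size=d)
if w @ w_true < 0:                   # start positively correlated with the target
    w = -w

for _ in range(2000):
    pred = np.maximum(X @ w, 0.0)
    # (sub)gradient of the empirical mean squared error with respect to w
    grad = (2.0 / n) * X.T @ ((pred - y) * (X @ w > 0))
    w -= 0.2 * grad

mse = float(np.mean((np.maximum(X @ w, 0.0) - y) ** 2))
```

With arbitrary labels the picture changes entirely: the paper's hardness result implies no such descent procedure can reach square-loss $\opt + \epsilon$ in polynomial time.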




Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals

Diakonikolas, Ilias, Kane, Daniel M., Ren, Lisheng

arXiv.org Artificial Intelligence

We study the task of agnostically learning halfspaces under the Gaussian distribution. Specifically, given labeled examples $(\mathbf{x},y)$ from an unknown distribution on $\mathbb{R}^n \times \{ \pm 1\}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. We prove a near-optimal computational hardness result for this task, under the widely believed sub-exponential time hardness of the Learning with Errors (LWE) problem. Prior hardness results are either qualitatively suboptimal or apply to restricted families of algorithms. Our techniques extend to yield near-optimal lower bounds for related problems, including ReLU regression.

