Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals
Goel, Surbhi, Karmalkar, Sushrut, Klivans, Adam
We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\opt < 1$ be the population loss of the best-fitting ReLU. We prove: \begin{itemize} \item Finding a ReLU with square-loss $\opt + \epsilon$ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply --{\em unconditionally}-- that gradient descent cannot converge to the global minimum in polynomial time. \item There is an algorithm that agnostically learns a ReLU to error $\opt^{2/3} + \epsilon$ in time $\poly(d, 1/\epsilon)$. The algorithm uses a novel reduction to noisy halfspace learning with respect to $0/1$ loss. \end{itemize}
Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. In the former problem, given labeled examples $(\bx, y)$ from an unknown distribution on $\R^d \times \{ \pm 1\}$, whose marginal distribution on $\bx$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\opt+\eps$, where $\opt$ is the 0-1 loss of the best-fitting halfspace. In the latter problem, given labeled examples $(\bx, y)$ from an unknown distribution on $\R^d \times \R$, whose marginal distribution on $\bx$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with square loss $\opt+\eps$, where $\opt$ is the square loss of the best-fitting ReLU. We prove Statistical Query (SQ) lower bounds of $d^{\poly(1/\eps)}$ for both of these problems. Our SQ lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
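The two loss notions in these definitions can be made concrete with a short numerical sketch. This is our own illustration (hypothetical parameter vectors, NumPy assumed), not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20000

# Hypothetical target parameters, for illustration only.
w_half = rng.standard_normal(d)
w_relu = rng.standard_normal(d)

# Standard Gaussian marginal on x; here the labels happen to come from
# the targets themselves, so the best-fitting loss is zero.
X = rng.standard_normal((n, d))
y_sign = np.sign(X @ w_half)              # halfspace labels in {-1, +1}
y_relu = np.maximum(X @ w_relu, 0.0)      # ReLU labels in R

def zero_one_loss(w, X, y):
    """Fraction of points where sign(w . x) disagrees with y."""
    return np.mean(np.sign(X @ w) != y)

def square_loss(w, X, y):
    """Mean squared error of the hypothesis x -> max(0, w . x)."""
    return np.mean((np.maximum(X @ w, 0.0) - y) ** 2)

print(zero_one_loss(w_half, X, y_sign))   # 0.0 on this realizable data
print(square_loss(w_relu, X, y_relu))     # 0.0 on this realizable data
```

In the agnostic setting studied above, the labels are arbitrary, $\opt$ is the corresponding loss of the best halfspace (resp. ReLU), and the learner must get within $\eps$ of it.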
Proximal Approximate Inference in State-Space Models
Abdulsamad, Hany, García-Fernández, Ángel F., Särkkä, Simo
We present a class of algorithms for state estimation in nonlinear, non-Gaussian state-space models. Our approach is based on a variational Lagrangian formulation that casts Bayesian inference as a sequence of entropic trust-region updates subject to dynamic constraints. This framework gives rise to a family of forward-backward algorithms, whose structure is determined by the chosen factorization of the variational posterior. By focusing on Gauss--Markov approximations, we derive recursive schemes with favorable computational complexity. For general nonlinear, non-Gaussian models we close the recursions using generalized statistical linear regression and Fourier--Hermite moment matching.
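To illustrate the linearization step, here is a minimal Monte Carlo version of statistical linear regression: fit an affine map $A x + b$ to a nonlinear $g$ under a Gaussian, together with the residual covariance of the linearization error. The paper uses generalized statistical linear regression and Fourier--Hermite moment matching; this sketch (all names our own) shows only the basic affine fit:

```python
import numpy as np

def statistical_linear_regression(g, m, P, n_samples=100_000, seed=0):
    """Monte Carlo statistical linear regression of g under N(m, P):
    find A, b minimizing E||g(x) - A x - b||^2, plus the covariance
    Omega of the residual linearization error."""
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(m, P, size=n_samples)
    G = np.apply_along_axis(g, 1, X)
    g_mean = G.mean(axis=0)
    C = (X - m).T @ (G - g_mean) / n_samples   # Cov(x, g(x))
    A = np.linalg.solve(P, C).T                # A = C^T P^{-1}
    b = g_mean - A @ m
    R = G - X @ A.T - b
    Omega = R.T @ R / n_samples                # residual covariance
    return A, b, Omega
```

For an affine $g(x) = M x + c$ the fit recovers $M$ and $c$ (up to Monte Carlo error) and the residual covariance is negligible; for a nonlinear $g$, Omega quantifies how much the Gauss--Markov approximation loses.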
Reliable Learning of Halfspaces under Gaussian Marginals
We study the problem of PAC learning halfspaces in the reliable agnostic model of Kalai et al. (2012). The reliable PAC model captures learning scenarios where one type of error is costlier than the other. We complement our upper bound with a Statistical Query lower bound suggesting that the $d^{\Omega(\log(1/\alpha))}$ dependence is best possible. Conceptually, our results imply a strong computational separation between reliable agnostic learning and standard agnostic learning of halfspaces in the Gaussian setting.
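The asymmetry in the reliable model can be made concrete: a halfspace hypothesis makes two kinds of errors on binary labels, and the reliable model (roughly) requires one of them to be kept near zero while the other is minimized. A minimal sketch on our own synthetic Gaussian data (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 10000

# Synthetic realizable data: labels from a true halfspace.
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_true)

def error_rates(w):
    """False-positive and false-negative rates of sign(w . x).
    In the reliable model the two are treated asymmetrically: e.g. a
    hypothesis must keep false positives near zero and then minimize
    false negatives."""
    pred = np.sign(X @ w)
    fp = np.mean((pred == 1) & (y == -1))
    fn = np.mean((pred == -1) & (y == 1))
    return fp, fn

print(error_rates(w_true))   # the true halfspace makes neither error
```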
Reviews: Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals
This paper studies the computational complexity of learning a single ReLU with respect to Gaussian examples. Since ReLUs are now the standard choice of nonlinearity in deep neural networks, the computational complexity of learning them is clearly of interest. Of course, the computational complexity of learning a ReLU may depend substantially on the specific setting assumed; it is interesting to understand the range of such assumptions and their implications for complexity. This paper studies the following setting: given independent samples (x_1, y_1), ..., (x_n, y_n), where x is spherical Gaussian in d dimensions and y \in R is arbitrary, find a ReLU function f_w(x) = max(0, w \cdot x) for some vector w with minimal mean squared error \sum_i (y_i - f_w(x_i))^2. (This is agnostic learning since the y's are arbitrary.) The main results are as follows: 1) There is no algorithm to learn a single ReLU with respect to Gaussian examples to additive error \epsilon in time d^{o(\log 1/\epsilon)} unless k-sparse parities with noise can be learned in time d^{o(k)}. 2) If opt = min_w (mean squared error of f_w), then (with normalization such that opt \in [0,1]) there is an algorithm which agnostically learns a ReLU to error opt^{2/3} + \epsilon in time poly(d, 1/\epsilon). The proof of (1) goes via Hermite analysis (i.e.
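To make the setting concrete, here is a gradient-descent sketch on the square loss of a single ReLU with Gaussian inputs. This is our own realizable toy instance; the review's point is about the agnostic case, where result (1) implies gradient descent cannot reach the global minimum in polynomial time:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 5000

# Realizable toy instance: labels come from a true ReLU. The hardness
# result concerns the agnostic case, where the y's can be arbitrary.
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0)

def mse(w):
    """Mean squared error of the hypothesis x -> max(0, w . x)."""
    return np.mean((np.maximum(X @ w, 0.0) - y) ** 2)

w = 0.1 * rng.standard_normal(d)
loss_init = mse(w)
lr = 0.1
for _ in range(1000):
    z = X @ w
    # Gradient of the empirical square loss; (z > 0) is the ReLU derivative.
    grad = (2.0 / n) * (X.T @ ((np.maximum(z, 0.0) - y) * (z > 0)))
    w -= lr * grad

print(loss_init, mse(w))
```

On this realizable instance the loss drops sharply; the paper's contribution is showing that no such guarantee is possible for arbitrary labels without super-polynomial time.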
Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals
Diakonikolas, Ilias, Kane, Daniel M., Ren, Lisheng
We study the task of agnostically learning halfspaces under the Gaussian distribution. Specifically, given labeled examples $(\mathbf{x},y)$ from an unknown distribution on $\mathbb{R}^n \times \{ \pm 1\}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. We prove a near-optimal computational hardness result for this task, under the widely believed sub-exponential time hardness of the Learning with Errors (LWE) problem. Prior hardness results are either qualitatively suboptimal or apply to restricted families of algorithms. Our techniques extend to yield near-optimal lower bounds for related problems, including ReLU regression.