statistical-computational tradeoff
Statistical-Computational Tradeoff in Single Index Models
Wang, Lingxiao, Yang, Zhuoran, Wang, Zhaoran
We study the statistical-computational tradeoffs in a high-dimensional single index model $Y=f(X^\top\beta^*) +\epsilon$, where $f$ is unknown, $X$ is a Gaussian vector, and $\beta^*$ is $s$-sparse with unit norm. When $\cov(Y,X^\top\beta^*)\neq 0$, \cite{plan2016generalized} shows that the direction and support of $\beta^*$ can be recovered using a generalized version of the Lasso. In this paper, we investigate the case in which this critical assumption fails to hold, where the problem becomes considerably harder. Using the statistical query model to characterize the computational cost of an algorithm, we show that when $\cov(Y,X^\top\beta^*)=0$ and $\cov(Y,(X^\top\beta^*)^2)> 0$, no computationally tractable algorithm can achieve the information-theoretic limit of the minimax risk. This implies that one must pay an extra computational cost for the nonlinearity involved in the model.
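To make the dichotomy in the abstract concrete, the following is a minimal illustrative simulation (our own sketch, not the paper's algorithm): for a link with an odd component, such as $f(t)=t^3$, the first-order moment $E[YX]$ is proportional to $\beta^*$, while for the purely even link $f(t)=t^2$ it vanishes and the direction must instead be read off the leading eigenvector of the second-order moment $E[Y(XX^\top - I_d)]$.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, s = 5000, 50, 3
    beta = np.zeros(d)
    beta[:s] = 1.0 / np.sqrt(s)              # s-sparse with unit norm

    X = rng.standard_normal((n, d))
    z = X @ beta

    for f in (lambda t: t**3, lambda t: t**2):
        y = f(z) + 0.1 * rng.standard_normal(n)

        # First-order estimator: E[Y X] is proportional to beta* whenever
        # cov(Y, X^T beta*) != 0; it carries no signal for the even link.
        b1 = X.T @ y / n
        b1 /= np.linalg.norm(b1)

        # Second-order estimator: leading eigenvector of the symmetric
        # matrix (1/n) sum_i y_i (x_i x_i^T - I), which is proportional
        # to beta* beta*^T when cov(Y, (X^T beta*)^2) > 0.
        M = (X.T * y) @ X / n - y.mean() * np.eye(d)
        vals, vecs = np.linalg.eigh(M)
        b2 = vecs[:, np.argmax(np.abs(vals))]

        print(abs(b1 @ beta), abs(b2 @ beta))  # alignment with beta*

In the first case both estimates align with $\beta^*$; in the second, the first-order estimate is essentially orthogonal to it, which is precisely the regime where the paper locates the extra computational cost.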
More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
Yi, Xinyang, Wang, Zhaoran, Yang, Zhuoran, Caramanis, Constantine, Liu, Han
We consider the weakly supervised binary classification problem where the labels are randomly flipped with probability $1-\alpha$. Although there exist numerous algorithms for this problem, it remains theoretically unexplored how the statistical accuracy and computational efficiency of these algorithms depend on the degree of supervision, which is quantified by $\alpha$. In this paper, we characterize the effect of $\alpha$ by establishing the information-theoretic and computational boundaries, namely, the minimax-optimal statistical accuracy that can be achieved by all algorithms, and by polynomial-time algorithms under an oracle computational model. For small $\alpha$, our result shows a gap between these two boundaries, which represents the computational price of achieving the information-theoretic boundary due to the lack of supervision. Interestingly, we also show that this gap narrows as $\alpha$ increases. In other words, having more supervision, i.e., more correct labels, not only improves the optimal statistical accuracy as expected, but also enhances the computational efficiency for achieving such accuracy.
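As a toy illustration of the role of $\alpha$ (a sketch under an assumed two-component Gaussian model $X = Y_{\mathrm{true}}\,\beta^* + \mathrm{noise}$, which need not match the paper's setting): if each label is flipped with probability $1-\alpha$, then $E[Y_{\mathrm{obs}}X] = (2\alpha-1)\beta^*$, so a plug-in moment estimator can be de-biased by the factor $1/(2\alpha-1)$, and its error shrinks as $\alpha$ grows.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 4000, 20
    beta = np.ones(d) / np.sqrt(d)            # unit-norm mean direction

    y_true = rng.choice([-1.0, 1.0], size=n)
    X = y_true[:, None] * beta + rng.standard_normal((n, d))

    for alpha in (0.6, 0.8, 1.0):             # alpha = 1: no label noise
        flip = rng.random(n) < (1.0 - alpha)  # flip with prob. 1 - alpha
        y_obs = np.where(flip, -y_true, y_true)

        # E[y_obs x] = (2*alpha - 1) * beta, so rescale to de-bias.
        b_hat = (X.T @ y_obs / n) / (2.0 * alpha - 1.0)
        print(alpha, np.linalg.norm(b_hat - beta))

The error decreases as $\alpha$ increases, which illustrates the statistical side of the abstract's claim; the computational gap concerns which accuracies are attainable in polynomial time and is not visible in a toy experiment of this kind.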
Reviews: Statistical-Computational Tradeoff in Single Index Models
The paper first introduces the first- and second-order Stein's identities and then defines two function sets, $C_1$ and $C_2$, characterized by the covariance between $f(X^\top\beta^*)$ and $X^\top\beta^*$. Further, the authors define a common function set $C(\psi)$, which includes all link functions such that the second-order Stein's identity does not vanish under the transformation $\psi$. The authors then propose a mixed model in (2.6) using two link functions $f_1\in C_1\cap C(\psi)$ and $f_2\in C_2\cap C(\psi)$. This model is finally used to derive the lower bound. This is reasonable, since the true $\beta^*$ with link function $f_1$ is easy to estimate (using the first-order Stein's identity), while the true $\beta^*$ with $f_2$ is indistinguishable. The minimax rate is established in Proposition 3.1.
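For reference, the two identities the review invokes are standard Stein-type facts for $X \sim N(0, I_d)$ and a sufficiently smooth link $f$:
$$E[YX] = E[f'(X^\top\beta^*)]\,\beta^*, \qquad E\big[Y(XX^\top - I_d)\big] = E[f''(X^\top\beta^*)]\,\beta^*{\beta^*}^\top.$$
This matches the abstract's split: $C_1$ collects links with $\cov(Y, X^\top\beta^*)\neq 0$, where the first-order moment already recovers the direction, and $C_2$ collects links where it vanishes and only the second-order moment carries signal.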
Reviews: More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
This paper is interesting and presents a new kind of result, introducing computational aspects into standard minimax theory. The phenomenon illustrated is new to me, and it exposes a limitation of computationally tractable algorithms relative to the "theoretical" ones that could be considered in classical minimax theory. However, given the relative novelty of the framework, it would be important that the basic definitions and properties be better presented. In what follows, only one model is investigated.