Information-theoretic Limits of Online Classification with Noisy Labels

Neural Information Processing Systems

We study online classification with general hypothesis classes where the true labels are determined by some function within the class, but are corrupted by unknown stochastic noise, and the features are generated adversarially. Predictions are made using observed noisy labels and noiseless features, while the performance is measured via minimax risk when comparing against true labels. The noisy mechanism is modeled via a general noisy kernel that specifies, for any individual data point, a set of distributions from which the actual noisy label distribution is chosen. We show that minimax risk is tightly characterized (up to a logarithmic factor of the hypothesis class size) by the Hellinger gap of the noisy label distributions induced by the kernel, independent of other properties such as the means and variances of the noise. Our main technique is based on a novel reduction to an online comparison scheme of two hypotheses, along with a new conditional version of Le Cam-Birgé testing suitable for online settings.
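As a small illustration of the quantity driving the bound, the squared Hellinger distance between two discrete noisy-label distributions can be computed directly. This is a minimal sketch; the flip rates 0.1 and 0.3 are made-up values, not taken from the paper:

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance between two discrete distributions:
    H^2(P, Q) = (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

# Two Bernoulli noisy-label distributions with (hypothetical) flip rates 0.1 and 0.3
p = [0.9, 0.1]
q = [0.7, 0.3]
gap = hellinger_sq(p, q)
```

A larger Hellinger gap between the induced label distributions makes the two hypotheses easier to distinguish online, which is the intuition behind the risk characterization.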


Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

Neural Information Processing Systems

Prior works showed that gradient-based training of neural networks can learn this target with $n \gtrsim d^{\Theta(p)}$ samples, and such complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm (on the squared loss) learns $f_*$ with a complexity that is not governed by the information exponent. Specifically, for arbitrary polynomial single-index models, we establish a sample and runtime complexity of $n \simeq T \simeq \Theta(d\cdot\mathrm{polylog}\, d)$, where $\Theta(\cdot)$ hides a constant depending only on the degree of $\sigma_*$; this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. More generally, we show that $n \gtrsim d^{(p_*-1)\vee 1}$ samples are sufficient to achieve low generalization error, where $p_* \le p$ is the \textit{generative exponent} of the link function. Core to our analysis is the reuse of minibatches in the gradient computation, which gives rise to higher-order information beyond correlational queries.
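The minibatch-reuse idea can be illustrated with a toy experiment: a small two-layer ReLU network trained by plain SGD on a degree-2 single-index target, applying each sampled minibatch twice per step. This is an illustrative sketch, not the paper's algorithm; the dimensions, learning rate, and planted direction are all made-up choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, width = 8, 2048, 64
theta = np.zeros(d)
theta[0] = 1.0                                   # hypothetical planted direction
X = rng.standard_normal((n, d))
y = (X @ theta) ** 2 - 1                         # degree-2 single-index target (Hermite He2)

W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)
b = np.zeros(width)

def forward(Z):
    H = np.maximum(Z @ W.T + b, 0.0)             # ReLU features
    return H @ a, H

def mse(Z, t):
    pred, _ = forward(Z)
    return float(np.mean((pred - t) ** 2))

lr, batch = 0.02, 256
loss0 = mse(X, y)
for step in range(300):
    idx = rng.integers(0, n, batch)
    Xb, yb = X[idx], y[idx]
    for _ in range(2):                           # reuse the same minibatch twice
        pred, H = forward(Xb)
        g = (pred - yb) / batch                  # gradient of the (scaled) squared loss
        grad_a = H.T @ g
        mask = (H > 0).astype(float)
        dH = g[:, None] * a[None, :] * mask
        W -= lr * (dH.T @ Xb)
        b -= lr * dH.sum(axis=0)
        a -= lr * grad_a
loss1 = mse(X, y)
```

The inner loop is the toy analogue of minibatch reuse: the second pass over the same batch sees features already correlated with that batch, giving access to higher-order statistics than a single correlational pass.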


Reviews: Information-theoretic Limits for Community Detection in Network Models

Neural Information Processing Systems

This article proves information-theoretic lower bounds for the community detection problem in a range of network models. For the SBM, this has been previously achieved in works of Mossel-Neeman-Sly, Coja-Oghlan, Abbe-Sandon, etc. The authors of the current paper plant community structure (two randomly assigned communities) in various other models, such as latent space models, preferential attachment models, small-world models, etc., and prove similar information-theoretic lower bounds. The proofs employ (not surprisingly) Fano's inequality in carefully restricted submodels. However, the authors only prove lower bounds; it is not clear whether these are tight (they are for the SBM) and, if so, what (preferably polynomial-time) algorithms can achieve them. There are a host of other interesting questions to ask.
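The Fano-based argument mentioned in the review can be stated compactly; the following is a minimal sketch of the inequality as it is typically used for non-recoverability bounds (the notation is assumed, not taken from the paper):

```latex
% Let V be uniform over M candidate label assignments and let Y be the
% observed network. Fano's inequality lower-bounds any estimator's error:
\Pr[\hat{V} \neq V] \;\ge\; 1 - \frac{I(V; Y) + \log 2}{\log M}.
% Non-recoverability follows whenever I(V;Y) = o(\log M), since the
% error probability then stays bounded away from zero; restricting to a
% carefully chosen submodel shrinks I(V;Y) while keeping log M large.
```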


Output-Constrained Lossy Source Coding With Application to Rate-Distortion-Perception Theory

Xie, Li, Li, Liangyan, Chen, Jun, Zhang, Zhongshan

arXiv.org Artificial Intelligence

The distortion-rate function of output-constrained lossy source coding with limited common randomness is analyzed for the special case of squared error distortion measure. An explicit expression is obtained when both source and reconstruction distributions are Gaussian. This further leads to a partial characterization of the information-theoretic limit of quadratic Gaussian rate-distortion-perception coding with the perception measure given by Kullback-Leibler divergence or squared quadratic Wasserstein distance.
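For reference, the classical (perception-unconstrained) quadratic Gaussian distortion-rate function, which the constrained characterization in the paper extends, is easy to evaluate. A minimal sketch of this textbook baseline, not the paper's constrained expression:

```python
def gaussian_distortion_rate(rate_bits, sigma2=1.0):
    """Classical quadratic Gaussian distortion-rate function:
    D(R) = sigma^2 * 2^(-2R), with R in bits per sample.
    Perception constraints (KL divergence or squared Wasserstein-2
    between source and reconstruction laws) can only increase distortion."""
    return sigma2 * 2.0 ** (-2.0 * rate_bits)
```

At rate 0 the best one can do is the source variance; each additional bit cuts the achievable distortion by a factor of 4.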


Non-Convex Exact Community Recovery in Stochastic Block Model

Wang, Peng, Zhou, Zirui, So, Anthony Man-Cho

arXiv.org Machine Learning

Learning community structures in graphs that are randomly generated by stochastic block models (SBMs) has received much attention lately. In this paper, we focus on the problem of exactly recovering the communities in a binary symmetric SBM, where a graph of $n$ vertices is partitioned into two equal-sized communities and the vertices are connected with probability $p = \alpha\log(n)/n$ within communities and $q = \beta\log(n)/n$ across communities for some $\alpha>\beta>0$. We propose a two-stage iterative algorithm for solving this problem, which employs the power method with a random starting point in the first stage and turns to a generalized power method that can identify the communities in a finite number of iterations in the second stage. It is shown that for any fixed $\alpha$ and $\beta$ such that $\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}$, which is known to be the information-theoretic limit for exact recovery, the proposed algorithm exactly identifies the underlying communities in $\tilde{O}(n)$ running time with probability tending to one as $n\rightarrow\infty$. We also present numerical results of the proposed algorithm to support and complement our theoretical development.
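A toy version of the spectral first stage can be simulated directly: sample a binary symmetric SBM above the $\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}$ threshold and run the power method on the centered adjacency matrix. This is an illustrative sketch, not the authors' two-stage algorithm; the parameters $n$, $\alpha$, $\beta$ are made-up choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta = 400, 16.0, 1.0                  # sqrt(16) - sqrt(1) = 3 > sqrt(2)
p = alpha * np.log(n) / n
q = beta * np.log(n) / n
labels = np.repeat([1, -1], n // 2)              # two equal-sized communities

# Sample the binary symmetric SBM adjacency matrix
P = np.where(np.equal.outer(labels, labels), p, q)
U = rng.random((n, n))
A = np.triu((U < P).astype(float), k=1)
A = A + A.T

# Power method on the centered adjacency matrix: subtracting the average
# edge density suppresses the all-ones component, so the leading direction
# aligns with the community vector
B = A - A.mean()
x = rng.standard_normal(n)
for _ in range(100):
    x = B @ x
    x /= np.linalg.norm(x)
est = np.sign(x)
acc = max(np.mean(est == labels), np.mean(est == -labels))
```

Well above the threshold, the signal eigenvalue dominates the noise spectral norm, so the sign pattern of the iterate recovers the partition (up to a global sign flip, handled by the `max` over both labelings).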


Reducibility and Statistical-Computational Gaps from Secret Leakage

Brennan, Matthew, Bresler, Guy

arXiv.org Machine Learning

Inference problems with conjectured statistical-computational gaps are ubiquitous throughout modern statistics, computer science and statistical physics. While there has been success evidencing these gaps from the failure of restricted classes of algorithms, progress towards a more traditional reduction-based approach to computational complexity in statistical inference has been limited. Existing reductions have largely been limited to inference problems with similar structure -- primarily mapping among problems representable as a sparse submatrix signal plus a noise matrix, which are similar to the common hardness assumption of planted clique. The insight in this work is that a slight generalization of the planted clique conjecture -- secret leakage planted clique -- gives rise to a variety of new average-case reduction techniques, yielding a web of reductions among problems with very different structure. Using variants of the planted clique conjecture for specific forms of secret leakage planted clique, we deduce tight statistical-computational tradeoffs for a diverse range of problems including robust sparse mean estimation, mixtures of sparse linear regressions, robust sparse linear regression, tensor PCA, variants of dense $k$-block stochastic block models, negatively correlated sparse PCA, semirandom planted dense subgraph, detection in hidden partition models and a universality principle for learning sparse mixtures. In particular, a $k$-partite hypergraph variant of the planted clique conjecture is sufficient to establish all of our computational lower bounds. Our techniques also reveal novel connections to combinatorial designs and to random matrix theory. This work gives the first evidence that an expanded set of hardness assumptions, such as for secret leakage planted clique, may be a key first step towards a more complete theory of reductions among statistical problems.


Information-theoretic Limits for Community Detection in Network Models

Ke, Chuyang, Honorio, Jean

Neural Information Processing Systems

We analyze the information-theoretic limits for the recovery of node labels in several network models. This includes the Stochastic Block Model, the Exponential Random Graph Model, the Latent Space Model, the Directed Preferential Attachment Model, and the Directed Small-world Model. For the Stochastic Block Model, the non-recoverability condition depends on the probabilities of having edges inside a community, and between different communities. For the Latent Space Model, the non-recoverability condition depends on the dimension of the latent space, and on how far apart and how spread out the communities are in the latent space. For the Directed Preferential Attachment Model and the Directed Small-world Model, the non-recoverability condition depends on the ratio between homophily and neighborhood size.


Crowdsourced Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm

Kim, Daesung, Chung, Hye Won

arXiv.org Machine Learning

Crowdsourcing systems have emerged as an effective platform for labeling data and classifying objects at relatively low cost by exploiting non-expert workers. To ensure reliable recovery of unknown labels with as few queries as possible, we consider an effective query type that asks for a "group attribute" of a chosen subset of objects. In particular, we consider the problem of classifying $m$ binary labels with XOR queries that ask whether the number of objects having a given attribute in the chosen subset of size $d$ is even or odd. The subset size $d$, which we call the query degree, can vary across queries. Since a worker needs to make more effort to answer a query of higher degree, we consider a noise model in which the accuracy of a worker's answer depends both on the worker's reliability and on the query degree $d$. For this general model, we characterize the information-theoretic limit on the optimal number of queries to reliably recover $m$ labels in terms of a given combination of degree-$d$ queries and noise parameters. Further, we propose an efficient inference algorithm that achieves this limit even when the noise parameters are unknown.
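A degree-$d$ XOR query and a degree-dependent noise model can be sketched as follows; the parametrization of the flip probability is a hypothetical choice for illustration, not the paper's noise model:

```python
import random

random.seed(0)

def xor_query(labels, subset):
    """Noiseless degree-d XOR query: the parity of the chosen binary labels."""
    return sum(labels[i] for i in subset) % 2

def degree_flip_prob(reliability, d):
    """Hypothetical parametrization: a worker with reliability r in (0, 1]
    flips the answer with probability (1 - r^d)/2, so higher-degree
    queries are noisier."""
    return 0.5 * (1.0 - reliability ** d)

def noisy_answer(labels, subset, flip_prob):
    """Worker's answer: the true parity, flipped with probability flip_prob."""
    ans = xor_query(labels, subset)
    return ans ^ (random.random() < flip_prob)
```

For example, with labels `[1, 0, 1, 1]`, the degree-2 query on indices `{1, 2}` has parity 1, and a perfectly reliable worker (`flip_prob = 0`) always reports it correctly.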


Mutual Information in Community Detection with Covariate Information and Correlated Networks

Mayya, Vaishakhi, Reeves, Galen

arXiv.org Machine Learning

We study the problem of community detection when there is covariate information about the node labels and one observes multiple correlated networks. We provide an asymptotic upper bound on the per-node mutual information as well as a heuristic analysis of a multivariate performance measure called the MMSE matrix. These results show that the combined effects of seemingly very different types of information can be characterized explicitly in terms of formulas involving low-dimensional estimation problems in additive Gaussian noise. Our analysis is supported by numerical simulations.
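The flavor of such formulas can be illustrated with the scalar Gaussian channel, where both the mutual information and the MMSE are explicit and tied together by the I-MMSE relation. A minimal sketch of that scalar baseline, not the paper's multivariate MMSE matrix:

```python
import math

def mutual_info(snr):
    """Mutual information (in nats) for Y = sqrt(snr)*X + N with
    X ~ N(0,1) and N ~ N(0,1): I(snr) = (1/2) log(1 + snr)."""
    return 0.5 * math.log1p(snr)

def mmse(snr):
    """Minimum mean-square error of estimating X from Y: 1 / (1 + snr)."""
    return 1.0 / (1.0 + snr)

# I-MMSE relation: dI/dsnr = mmse(snr) / 2, checked by central differences
snr, h = 2.0, 1e-6
deriv = (mutual_info(snr + h) - mutual_info(snr - h)) / (2 * h)
```

Results of the kind described above express the per-node mutual information of the combined observations through low-dimensional problems of exactly this additive-Gaussian form.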


Information-theoretic Limits for Community Detection in Network Models

Ke, Chuyang, Honorio, Jean

arXiv.org Machine Learning

We analyze the information-theoretic limits for the recovery of node labels in several network models, including the stochastic block model, as well as the latent space model. For the stochastic block model, the non-recoverability condition depends on the probabilities of having edges inside a community, and between different communities. For the latent space model, the non-recoverability condition depends on the dimension of the latent space, and on how far apart and how spread out the communities are in the latent space. We also extend our analysis to dynamic models in which edges depend not only on their endpoints, but also on previously generated edges.