Goto

Collaborating Authors

 definition 4


High-dimensional estimation with missing data: Statistical and computational limits

Verchand, Kabir Aladin, Pensia, Ankit, Haque, Saminul, Kuditipudi, Rohith

arXiv.org Machine Learning

We consider computationally-efficient estimation of population parameters when observations are subject to missing data. In particular, we consider estimation under the realizable contamination model of missing data in which an $ε$ fraction of the observations are subject to an arbitrary (and unknown) missing not at random (MNAR) mechanism. When the true data is Gaussian, we provide evidence towards statistical-computational gaps in several problems. For mean estimation in $\ell_2$ norm, we show that in order to obtain error at most $ρ$, for any constant contamination $ε\in (0, 1)$, (roughly) $n \gtrsim d e^{1/ρ^2}$ samples are necessary and that there is a computationally-inefficient algorithm which achieves this error. On the other hand, we show that any computationally-efficient method within certain popular families of algorithms requires a much larger sample complexity of (roughly) $n \gtrsim d^{1/ρ^2}$ and that there exists a polynomial time algorithm based on sum-of-squares which (nearly) achieves this lower bound. For covariance estimation in relative operator norm, we show that a parallel development holds. Finally, we turn to linear regression with missing observations and show that such a gap does not persist. Indeed, in this setting we show that minimizing a simple, strongly convex empirical risk nearly achieves the information-theoretic lower bound in polynomial time.


Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions

Tran-Dinh, Quoc, Nguyen-Trung, Nghia

arXiv.org Machine Learning

This paper develops new variance-reduction techniques for the forward-reflected-backward splitting (FRBS) method to solve a class of possibly nonmonotone stochastic composite inclusions. Unlike unbiased estimators such as mini-batching, developing stochastic biased variants faces a fundamental technical challenge and has not been utilized before for inclusions and fixed-point problems. We fill this gap by designing a new framework that can handle both unbiased and biased estimators. Our main idea is to construct stochastic variance-reduced estimators for the forward-reflected direction and use them to perform iterate updates. First, we propose a class of unbiased variance-reduced estimators and show that increasing mini-batch SGD, loopless-SVRG, and SAGA estimators fall within this class. For these unbiased estimators, we establish a $\mathcal{O}(1/k)$ best-iterate convergence rate for the expected squared residual norm, together with almost-sure convergence of the iterate sequence to a solution. Consequently, we prove that the best oracle complexities for the $n$-finite-sum and expectation settings are $\mathcal{O}(n^{2/3}ε^{-2})$ and $\mathcal{O}(ε^{-10/3})$, respectively, when employing loopless-SVRG or SAGA, where $ε$ is a desired accuracy. Second, we introduce a new class of biased variance-reduced estimators for the forward-reflected direction, which includes SARAH, Hybrid SGD, and Hybrid SVRG as special instances. While the convergence rates remain valid for these biased estimators, the resulting oracle complexities are $\mathcal{O}(n^{3/4}ε^{-2})$ and $\mathcal{O}(ε^{-5})$ for the $n$-finite-sum and expectation settings, respectively. Finally, we conduct two numerical experiments on AUC optimization for imbalanced classification and policy evaluation in reinforcement learning.



Geometry-Aware Adaptation for Pretrained Models

Neural Information Processing Systems

Machine learning models--including prominent zero-shot models--are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them.






Sample Complexity of Forecast Aggregation

Neural Information Processing Systems

We consider a Bayesian forecast aggregation model where n experts, after observing private signals about an unknown binary event, report th eir posterior beliefs about the event to a principal, who then aggregates the repor ts into a single prediction for the event. The signals of the experts and the outcome of the event follow a joint distribution that is unknown to the principal, but th e principal has access to i.i.d. "samples" from the distribution, where each sampl e is a tuple of the experts' reports (not signals) and the realization of the even t. Using these samples, the principal aims to find an ε -approximately optimal aggregator, where optimal-ity is measured in terms of the expected squared distance bet ween the aggregated prediction and the realization of the event.