Cherapanamjeri, Yeshwanth
Optimal Mean Estimation without a Variance
Cherapanamjeri, Yeshwanth, Tripuraneni, Nilesh, Bartlett, Peter L., Jordan, Michael I.
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist. Concretely, we are given a sample $\mathbf{X} = \{X_i\}_{i = 1}^n$ from a distribution $\mathcal{D}$ over $\mathbb{R}^d$ with mean $\mu$ that satisfies the following \emph{weak-moment} assumption for some $\alpha \in [0, 1]$: \begin{equation*} \forall \|v\| = 1: \mathbb{E}_{X \thicksim \mathcal{D}}[\lvert \langle X - \mu, v\rangle \rvert^{1 + \alpha}] \leq 1, \end{equation*} together with a target failure probability $\delta$. Our goal is to design an estimator that attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$. For the specific case of $\alpha = 1$, the foundational work of Lugosi and Mendelson exhibits an estimator achieving subgaussian confidence intervals, and subsequent work has led to computationally efficient versions of this estimator. Here, we study the case of general $\alpha$ and establish the following information-theoretic lower bound on the optimal attainable confidence interval: \begin{equation*} \Omega \left(\sqrt{\frac{d}{n}} + \left(\frac{d}{n}\right)^{\frac{\alpha}{1 + \alpha}} + \left(\frac{\log 1 / \delta}{n}\right)^{\frac{\alpha}{1 + \alpha}}\right). \end{equation*} Moreover, we devise a computationally efficient estimator that achieves this lower bound.
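As a worked check on the displayed lower bound (our own specialization of the formula, not a statement quoted from the paper): at $\alpha = 1$ the weak-moment assumption bounds the variance in every unit direction by $1$, the exponent $\alpha/(1+\alpha)$ equals $1/2$, and the bound collapses to the familiar subgaussian form. For $d \le n$, the first term is dominated by the second, because $x^{a} \ge x^{1/2}$ whenever $x \in [0,1]$ and $a \le 1/2$.

```latex
\begin{align*}
\alpha = 1:&\quad \tfrac{\alpha}{1+\alpha} = \tfrac{1}{2}
  \;\Longrightarrow\;
  \Omega\!\left(\sqrt{\tfrac{d}{n}} + \sqrt{\tfrac{\log 1/\delta}{n}}\right),\\
d \le n,\ \alpha \in [0,1]:&\quad \tfrac{\alpha}{1+\alpha} \le \tfrac{1}{2}
  \;\Longrightarrow\;
  \left(\tfrac{d}{n}\right)^{\frac{\alpha}{1+\alpha}} \ge \sqrt{\tfrac{d}{n}},\\
  &\quad\text{so the bound reduces to }
  \Omega\!\left(\left(\tfrac{d}{n}\right)^{\frac{\alpha}{1+\alpha}}
  + \left(\tfrac{\log 1/\delta}{n}\right)^{\frac{\alpha}{1+\alpha}}\right).
\end{align*}
```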
Optimal Robust Linear Regression in Nearly Linear Time
Cherapanamjeri, Yeshwanth, Aras, Efe, Tripuraneni, Nilesh, Jordan, Michael I., Flammarion, Nicolas, Bartlett, Peter L.
We study the problem of high-dimensional robust linear regression, where a learner is given access to $n$ samples from the generative model $Y = \langle X,w^* \rangle + \epsilon$ (with $X \in \mathbb{R}^d$ and $\epsilon$ independent), in which an $\eta$ fraction of the samples have been adversarially corrupted. We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E} [XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance; and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian. In both settings, our estimators (a) achieve optimal sample complexities and recovery guarantees up to log factors and (b) run in nearly linear time ($\tilde{O}(nd / \eta^6)$). Prior to our work, polynomial-time algorithms achieving near-optimal sample complexities were known only in the setting where $X$ is Gaussian with identity covariance and $\epsilon$ is Gaussian, and no linear-time estimators were known for robust linear regression in any setting. Our estimators and their analysis leverage recent developments in the construction of faster algorithms for robust mean estimation to improve runtimes, and refined concentration of measure arguments alongside Gaussian rounding techniques to improve statistical sample complexities.
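To make the contamination model concrete, here is a minimal NumPy sketch under assumptions of our own choosing: Gaussian $X$, Gaussian $\epsilon$, $w^*$ set to the all-ones vector, and a hypothetical adversary that overwrites the first $\eta n$ responses with a large constant. It contrasts ordinary least squares with a simple iterative trimming heuristic. The names `make_corrupted_data` and `trimmed_least_squares` are illustrative; this is not the authors' estimator and carries none of its guarantees.

```python
import numpy as np

def make_corrupted_data(n, d, eta, rng):
    """Sample Y = <X, w*> + eps, then let a (hypothetical) adversary corrupt an eta fraction.

    Assumed for illustration: Gaussian X, Gaussian noise, w* = all-ones,
    and an adversary that only overwrites responses (X is left clean).
    """
    w_star = np.ones(d)
    X = rng.standard_normal((n, d))
    eps = 0.1 * rng.standard_normal(n)
    y = X @ w_star + eps
    n_bad = int(eta * n)
    y[:n_bad] = 100.0  # corrupted responses
    return X, y, w_star

def ols(X, y):
    """Ordinary least squares via NumPy's lstsq."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def trimmed_least_squares(X, y, eta, n_iters=10):
    """Heuristic: alternately fit OLS and keep the (1 - eta) fraction of
    points with the smallest absolute residuals under the current fit."""
    n = X.shape[0]
    keep = n - int(np.ceil(eta * n))
    idx = np.arange(n)
    for _ in range(n_iters):
        w = ols(X[idx], y[idx])
        residuals = np.abs(y - X @ w)
        idx = np.argsort(residuals)[:keep]
    return ols(X[idx], y[idx])

rng = np.random.default_rng(0)
X, y, w_star = make_corrupted_data(n=1000, d=5, eta=0.1, rng=rng)
err_ols = np.linalg.norm(ols(X, y) - w_star)
err_trim = np.linalg.norm(trimmed_least_squares(X, y, eta=0.1) - w_star)
print(f"OLS error: {err_ols:.3f}, trimmed error: {err_trim:.3f}")
```

Against this mild adversary, the corrupted rows have by far the largest residuals, so trimming separates them almost immediately. An adversary that also corrupts $X$, or that plants outliers with small residuals, can defeat such a heuristic, which is the harder regime the estimators above are designed for.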
Testing Markov Chains without Hitting
Cherapanamjeri, Yeshwanth, Bartlett, Peter L.
We study the problem of identity testing of Markov chains. In this setting, we are given access to a single trajectory from a Markov chain with unknown transition matrix $Q$, and the goal is to determine whether $Q = P$ for some known matrix $P$ or $\text{Dist}(P, Q) \geq \epsilon$, where $\text{Dist}$ is suitably defined. Recent work by Daskalakis, Dikkala and Gravin (2018) showed that it is possible to distinguish between the two cases provided the length of the observed trajectory is at least super-linear in the hitting time of $P$, which may be arbitrarily large. In this paper, we propose an algorithm that avoids this dependence on hitting time, thus enabling efficient testing of Markov chains even in cases where it is infeasible to observe every state in the chain. Our algorithm combines classical ideas from approximation algorithms with techniques for the spectral analysis of Markov chains.
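The dependence on hitting time is easiest to appreciate numerically. Assuming "hitting time" means the largest expected number of steps to first reach one state from another (the cited works may use a different but related definition), the sketch below computes it by solving the standard linear system $h_j = 0$ and $h_i = 1 + \sum_{k \ne j} P_{ik} h_k$ for $i \ne j$. The function names are hypothetical.

```python
import numpy as np

def expected_hitting_times(P, target):
    """Expected number of steps to first reach `target` from each start state."""
    n = P.shape[0]
    others = [i for i in range(n) if i != target]
    # For i != target: h_i - sum_{k != target} P[i, k] * h_k = 1, with h_target = 0.
    A = np.eye(n - 1) - P[np.ix_(others, others)]
    h = np.zeros(n)
    h[others] = np.linalg.solve(A, np.ones(n - 1))
    return h

def max_hitting_time(P):
    """Worst case of the expected hitting time over start and target states."""
    return max(expected_hitting_times(P, j).max() for j in range(P.shape[0]))

# A two-state chain that leaves state 0 with probability only 0.001.
P = np.array([[0.999, 0.001],
              [0.5, 0.5]])
print(max_hitting_time(P))
```

Even this two-state chain has a hitting time of about $1/0.001 = 1000$ steps, and shrinking the exit probability makes it arbitrarily large without changing the number of states.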
Fast Mean Estimation with Sub-Gaussian Rates
Cherapanamjeri, Yeshwanth, Flammarion, Nicolas, Bartlett, Peter L.
We propose an estimator for the mean of a random vector in $\mathbb{R}^d$ that can be computed in time $O(n^4+n^2d)$ for $n$ i.i.d.~samples and that has error bounds matching the sub-Gaussian case. The only assumptions we make about the data distribution are that it has finite mean and covariance; in particular, we make no assumptions about higher-order moments. Like the polynomial-time estimator introduced by Hopkins (2018), which is based on the sum-of-squares hierarchy, our estimator achieves optimal statistical efficiency in this challenging setting, but it has a significantly faster runtime and a simpler analysis.
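For orientation, the sketch below implements the classical geometric median-of-means estimator (Weiszfeld iterations on bucket means) in NumPy. It is a simpler, well-known baseline and not the estimator of this paper: it tolerates heavy tails under the same finite-covariance assumption, but its known guarantee is weaker than the sub-Gaussian rate targeted above. The bucket count `k`, the tolerance, and the Student-t test data are illustrative choices.

```python
import numpy as np

def geometric_median(points, n_iters=500, tol=1e-10):
    """Weiszfeld's algorithm for the point minimizing the sum of Euclidean distances."""
    z = points.mean(axis=0)
    for _ in range(n_iters):
        dist = np.linalg.norm(points - z, axis=1)
        weights = 1.0 / np.maximum(dist, tol)  # guard against z landing on a data point
        z_new = weights @ points / weights.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def median_of_means(X, k, rng):
    """Split the samples into k random buckets, average each bucket,
    and return the geometric median of the bucket means."""
    perm = rng.permutation(X.shape[0])
    bucket_means = np.stack([b.mean(axis=0) for b in np.array_split(X[perm], k)])
    return geometric_median(bucket_means)

rng = np.random.default_rng(1)
# Student-t with 2.5 degrees of freedom: finite covariance, heavy tails, true mean 0.
X = rng.standard_t(2.5, size=(2000, 10))
mu_hat = median_of_means(X, k=20, rng=rng)
print(np.linalg.norm(mu_hat))
```

The bucketing step is what buys robustness: a few extreme samples can spoil only the buckets they land in, and the geometric median ignores a minority of spoiled bucket means.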