good batch
A General Method for Robust Learning from Batches
In many applications, data is collected in batches, some of which are corrupt or even adversarial. Recent work derived optimal robust algorithms for estimating discrete distributions in this setting. We consider a general framework of robust learning from batches, and determine the limits of both classification and distribution estimation over arbitrary, including continuous, domains. Building on these results, we derive the first robust agnostic computationally-efficient learning algorithms for piecewise-interval classification, and for piecewise-polynomial, monotone, log-concave, and gaussian-mixture distribution estimation.
Robust Learning of Discrete Distributions from Batches
Let $d$ be the lowest $L_1$ distance to which a $k$-symbol distribution $p$ can be estimated from $m$ batches of $n$ samples each, when up to $\beta m$ batches may be adversarial. For $\beta<1/2$, Qiao and Valiant (2017) showed that $d=\Omega(\beta/\sqrt{n})$ and requires $m=\Omega(k/\beta^2)$ batches. For $\beta<1/900$, they provided a $d$ and $m$ order-optimal algorithm that runs in time exponential in $k$. For $\beta<0.5$, we propose an algorithm with comparably optimal $d$ and $m$, but run-time polynomial in $k$ and all other parameters.