hanneke
Adversarial Resilience in Sequential Prediction via Abstention Anonymous Author(s) Affiliation Address email
We study the problem of sequential prediction in the stochastic setting with an1 adversary that is allowed to inject clean-label adversarial (or out-of-distribution)2 examples. Algorithms designed to handle purely stochastic data tend to fail in the3 presence of such adversarial examples, often leading to erroneous predictions. This4 is undesirable in many high-stakes applications such as medical recommendations,5 where abstaining from predictions on adversarial examples is preferable to mis-6 classification. On the other hand, assuming fully adversarial data leads to very7 pessimistic bounds that are often vacuous in practice.8 To capture this motivation, we propose a new model of sequential prediction that9 sits between the purely stochastic and fully adversarial settings by allowing the10 learner to abstain from making a prediction at no cost on adversarial examples.11
A Theory of Optimistically Universal Online Learnability for General Concept Classes
We provide a full characterization of the concept classes that are optimistically universally online learnable with {0, 1} labels. The notion of optimistically universal online learning was defined in [Hanneke, 2021] in order to understand learnability under minimal assumptions. In this paper, following the philosophy behind that work, we investigate two questions, namely, for every concept class: (1) What are the minimal assumptions on the data process admitting online learnability?
Universal Rates for Active Learning
In this work we study the problem of actively learning binary classifiers from a given concept class, i.e., learning by utilizing unlabeled data and submitting targeted queries about their labels to a domain expert. We evaluate the quality of our solutions by considering the learning curves they induce, i.e., the rate of
A Theory of Universal Agnostic Learning
We provide a complete theory of optimal universal rates for binary classification in the agnostic setting. This extends the realizable-case theory of Bousquet, Hanneke, Moran, van Handel, and Yehudayoff (2021) by removing the realizability assumption on the distribution. We identify a fundamental tetrachotomy of optimal rates: for every concept class, the optimal universal rate of convergence of the excess error rate is one of $e^{-n}$, $e^{-o(n)}$, $o(n^{-1/2})$, or arbitrarily slow. We further identify simple combinatorial structures which determine which of these categories any given concept class falls into.
Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization
We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time. Interestingly, we find that this requires new algorithmic ideas and approaches to adversarially robust learning. In particular, we show, in a strong negative sense, the suboptimality of the robust learner proposed by Montasser, Hanneke, and Srebro [2019] and a broader family of learners we identify as local learners. Our results are enabled by adopting a global perspective, specifically, through a key technical contribution: the the global one-inclusion graph, which may be of independent interest, that generalizes the classical one-inclusion graph due to Haussler, Littlestone, and Warmuth [1994]. Finally, as a byproduct, we identify a dimension characterizing qualitatively and quantitatively what classes of predictors $\mathcal{H}$ are robustly learnable. This resolves an open problem due to Montasser et al. [2019], and closes a (potentially) infinite gap between the established upper and lower bounds on the sample complexity of adversarially robust learning.
Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes
In this paper we study the problem of multiclass classification with a bounded number of different labels $k$, in the realizable setting. We extend the traditional PAC model to a) distribution-dependent learning rates, and b) learning rates under data-dependent assumptions. First, we consider the universal learning setting (Bousquet, Hanneke, Moran, van Handel and Yehudayoff, STOC'21), for which we provide a complete characterization of the achievable learning rates that holds for every fixed distribution. In particular, we show the following trichotomy: for any concept class, the optimal learning rate is either exponential, linear or arbitrarily slow. Additionally, we provide complexity measures of the underlying hypothesis class that characterize when these rates occur. Second, we consider the problem of multiclass classification with structured data (such as data lying on a low dimensional manifold or satisfying margin conditions), a setting which is captured by partial concept classes (Alon, Hanneke, Holzman and Moran, FOCS'21). Partial concepts are functions that can be undefined in certain parts of the input space. We extend the traditional PAC learnability of total concept classes to partial concept classes in the multiclass setting and investigate differences between partial and total concepts.