Goto

Collaborating Authors

 haussler




Higher-arity PAC learning, VC dimension and packing lemma

Chernikov, Artem, Towsner, Henry

arXiv.org Machine Learning

The aim of this note is to overview some of our work in Chernikov, Towsner'20 (arXiv:2010.00726) developing higher arity VC theory (VC$_n$ dimension), including a generalization of Haussler packing lemma, and an associated tame (slice-wise) hypergraph regularity lemma; and to demonstrate that it characterizes higher arity PAC learning (PAC$_n$ learning) in $n$-fold product spaces with respect to product measures introduced by Kobayashi, Kuriyama and Takeuchi'15. We also point out how some of the recent results in arXiv:2402.14294, arXiv:2505.15688, arXiv:2509.20404 follow from our work in arXiv:2010.00726.


Sample completion, structured correlation, and Netflix problems

Coregliano, Leonardo N., Malliaris, Maryanthe

arXiv.org Machine Learning

We develop a new high-dimensional statistical learning model which can take advantage of structured correlation in data even in the presence of randomness. We completely characterize learnability in this model in terms of VCN${}_{k,k}$-dimension (essentially $k$-dependence from Shelah's classification theory). This model suggests a theoretical explanation for the success of certain algorithms in the 2006~Netflix Prize competition.



A packing lemma for VCN${}_k$-dimension and learning high-dimensional data

Coregliano, Leonardo N., Malliaris, Maryanthe

arXiv.org Artificial Intelligence

Recently, the authors introduced the theory of high-arity PAC learning, which is well-suited for learning graphs, hypergraphs and relational structures. In the same initial work, the authors proved a high-arity analogue of the Fundamental Theorem of Statistical Learning that almost completely characterizes all notions of high-arity PAC learning in terms of a combinatorial dimension, called the Vapnik--Chervonenkis--Natarajan (VCN${}_k$) $k$-dimension, leaving as an open problem only the characterization of non-partite, non-agnostic high-arity PAC learnability. In this work, we complete this characterization by proving that non-partite non-agnostic high-arity PAC learnability implies a high-arity version of the Haussler packing property, which in turn implies finiteness of VCN${}_k$-dimension. This is done by obtaining direct proofs that classic PAC learnability implies classic Haussler packing property, which in turn implies finite Natarajan dimension and noticing that these direct proofs nicely lift to high-arity.


Prediction, Learning, Uniform Convergence, and Scale-sensitive Dimensions

Bartlett, Peter L., Long, Philip M.

arXiv.org Artificial Intelligence

We present a new general-purpose algorithm for learning classes of [0,1]-valued functions in a generalization of the prediction model, and prove a general upper bound on the expected absolute error of this algorithm in terms of a scale-sensitive generalization of the Vapnik dimension proposed by Alon, Ben-David, Cesa-Bianchi and Haussler. We give lower bounds implying that our upper bounds cannot be improved by more than a constant factor in general. We apply this result, together with techniques due to Haussler and to Benedek and Itai, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension. Using a different technique, we obtain new bounds on packing numbers in terms of Kearns and Schapire's fat-shattering function. We show how to apply both packing bounds to obtain improved general bounds on the sample complexity of agnostic learning. For each ǫ > 0, we establish weaker sufficient and stronger necessary conditions for a class of [0,1]-valued functions to be agnostically learnable to within ǫ, and to be an ǫ-uniform Glivenko-Cantelli class. This is a manuscript that was accepted by JCSS, together with a correction. Also affiliated with University of California, Berkeley.


Optimal PAC Bounds Without Uniform Convergence

Aden-Ali, Ishaq, Cherapanamjeri, Yeshwanth, Shetty, Abhishek, Zhivotovskiy, Nikita

arXiv.org Artificial Intelligence

In statistical learning theory, determining the sample complexity of realizable binary classification for VC classes was a long-standing open problem. The results of Simon and Hanneke established sharp upper bounds in this setting. However, the reliance of their argument on the uniform convergence principle limits its applicability to more general learning settings such as multiclass classification. In this paper, we address this issue by providing optimal high probability risk bounds through a framework that surpasses the limitations of uniform convergence arguments. Our framework converts the leave-one-out error of permutation invariant predictors into high probability risk bounds. As an application, by adapting the one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth, we propose an algorithm that achieves an optimal PAC bound for binary classification. Specifically, our result shows that certain aggregations of one-inclusion graph algorithms are optimal, addressing a variant of a classic question posed by Warmuth. We further instantiate our framework in three settings where uniform convergence is provably suboptimal. For multiclass classification, we prove an optimal risk bound that scales with the one-inclusion hypergraph density of the class, addressing the suboptimality of the analysis of Daniely and Shalev-Shwartz. For partial hypothesis classification, we determine the optimal sample complexity bound, resolving a question posed by Alon, Hanneke, Holzman, and Moran. For realizable bounded regression with absolute loss, we derive an optimal risk bound that relies on a modified version of the scale-sensitive dimension, refining the results of Bartlett and Long. Our rates surpass standard uniform convergence-based results due to the smaller complexity measure in our risk bound.


The One-Inclusion Graph Algorithm is not Always Optimal

Aden-Ali, Ishaq, Cherapanamjeri, Yeshwanth, Shetty, Abhishek, Zhivotovskiy, Nikita

arXiv.org Artificial Intelligence

The one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth achieves an optimal in-expectation risk bound in the standard PAC classification setup. In one of the first COLT open problems, Warmuth conjectured that this prediction strategy always implies an optimal high probability bound on the risk, and hence is also an optimal PAC algorithm. We refute this conjecture in the strongest sense: for any practically interesting Vapnik-Chervonenkis class, we provide an in-expectation optimal one-inclusion graph algorithm whose high probability risk bound cannot go beyond that implied by Markov's inequality. Our construction of these poorly performing one-inclusion graph algorithms uses Varshamov-Tenengolts error correcting codes. Our negative result has several implications. First, it shows that the same poor high-probability performance is inherited by several recent prediction strategies based on generalizations of the one-inclusion graph algorithm. Second, our analysis shows yet another statistical problem that enjoys an estimator that is provably optimal in expectation via a leave-one-out argument, but fails in the high-probability regime. This discrepancy occurs despite the boundedness of the binary loss for which arguments based on concentration inequalities often provide sharp high probability risk bounds.


Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization

Montasser, Omar, Hanneke, Steve, Srebro, Nathan

arXiv.org Artificial Intelligence

We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time. Interestingly, we find that this requires new algorithmic ideas and approaches to adversarially robust learning. In particular, we show, in a strong negative sense, the suboptimality of the robust learner proposed by Montasser, Hanneke, and Srebro (2019) and a broader family of learners we identify as local learners. Our results are enabled by adopting a global perspective, specifically, through a key technical contribution: the global one-inclusion graph, which may be of independent interest, that generalizes the classical one-inclusion graph due to Haussler, Littlestone, and Warmuth (1994). Finally, as a byproduct, we identify a dimension characterizing qualitatively and quantitatively what classes of predictors $\mathcal{H}$ are robustly learnable. This resolves an open problem due to Montasser et al. (2019), and closes a (potentially) infinite gap between the established upper and lower bounds on the sample complexity of adversarially robust learning.