Goto

Collaborating Authors

 pac learnable




On statistical learning of graphs

arXiv.org Artificial Intelligence

We study PAC and online learnability of hypothesis classes formed by copies of a countably infinite graph G, where each copy is induced by permuting G's vertices. This corresponds to learning a graph's labeling, knowing its structure and label set. We consider classes where permutations move only finitely many vertices. Our main result shows that PAC learnability of all such finite-support copies implies online learnability of the full isomorphism type of G, and is equivalent to the condition of automorphic triviality. We also characterize graphs where copies induced by swapping two vertices are not learnable, using a relaxation of the extension property of the infinite random graph. Finally, we show that, for all G and k>2, learnability for k-vertex permutations is equivalent to that for 2-vertex permutations, yielding a four-class partition of infinite graphs, whose complexity we also determine using tools coming from both descriptive set theory and computability theory.


Logical perspectives on learning statistical objects

arXiv.org Artificial Intelligence

We consider the relationship between learnability of a ``base class'' of functions on a set X and learnability of a class of statistical functions derived from the base class. For example, we refine results showing that learnability of a family of functions implies learnability of the family of functions mapping a function in the class to its expectation under a distribution. We will look at both Probably Approximately Correct (PAC) learning, where example inputs and outputs are chosen at random, and online learning, where the examples are chosen adversarially. We establish improved bounds on the sample complexity of learning for statistical classes, stated in terms of combinatorial dimensions of the base class. We do this by adapting techniques introduced in model theory for ``randomizing a structure''. We give particular attention to classes derived from logical formulas, and relate learnability of the statistical classes to properties of the formula. Finally, we provide bounds on the complexity of learning the statistical classes built on top of a logic-based hypothesis class.


Learning Shuffle Ideals Under Restricted Distributions

Neural Information Processing Systems

The class of shuffle ideals is a fundamental sub-family of regular languages. The shuffle ideal generated by a string set U is the collection of all strings containing some string u U as a (not necessarily contiguous) subsequence. In spite of its apparent simplicity, the problem of learning a shuffle ideal from given data is known to be computationally intractable. In this paper, we study the PAC learnability of shuffle ideals and present positive results on this learning problem under element-wise independent and identical distributions and Markovian distributions in the statistical query model. A constrained generalization to learning shuffle ideals under product distributions is also provided. In the empirical direction, we propose a heuristic algorithm for learning shuffle ideals from given labeled strings under general unrestricted distributions. Experiments demonstrate the advantage for both efficiency and accuracy of our algorithm.


Review for NeurIPS paper: Reducing Adversarially Robust Learning to Non-Robust PAC Learning

Neural Information Processing Systems

Summary and Contributions: Robust learning has got a lot of attention over the last decade. This paper is about *test-time attacks (called evasion attacks or attacks finding "adversarial examples"). Such attacks are studied widely from experimental point of view, but more recently theoretical results are being proven for understanding how, and when, learning under such adversarial perturbations are possible. Montasser et al (MHS COLT 19) showed that if a problem is PAC learnable, i.e., has finite VC dimension d, then it is also robustly PAC learnable, though the sample complexity (in their proof) could be as large as exponential(d). This paper's main contribution is an alternative proof of the result of MHS, by showing how to reduce the task of robust learning to the task of normal PAC learning.


Review for NeurIPS paper: Reducing Adversarially Robust Learning to Non-Robust PAC Learning

Neural Information Processing Systems

All the reviewers agreed that the paper provides a novel and important theoretical result. Specifically, one of the main contributions of the paper is to show that if a problem is PAC learnable, i.e., has finite VC dimension d, then it i also robustly PAC learnable. The result had already been proven last year, however, this paper provides a novel framework to prove it by showing how to reduce the task of robust learning to the task of normal PAC learning (it should be noted that the reduction may not be efficient (i.e., polynomial time) but still is important from the theoretical and statistical perspective. The reviewers also had some suggestions for the revised version of the paper which are reflected in the'r updated reviews.


Measurability in the Fundamental Theorem of Statistical Learning

arXiv.org Machine Learning

The Fundamental Theorem of Statistical Learning states that a hypothesis space is PAC learnable if and only if its VC dimension is finite. For the agnostic model of PAC learning, the literature so far presents proofs of this theorem that often tacitly impose several measurability assumptions on the involved sets and functions. We scrutinize these proofs from a measure-theoretic perspective in order to extract the assumptions needed for a rigorous argument. This leads to a sound statement as well as a detailed and self-contained proof of the Fundamental Theorem of Statistical Learning in the agnostic setting, showcasing the minimal measurability requirements needed. We then discuss applications in Model Theory, considering NIP and o-minimal structures. Our main theorem presents sufficient conditions for the PAC learnability of hypothesis spaces defined over o-minimal expansions of the reals.


List Sample Compression and Uniform Convergence

arXiv.org Machine Learning

List learning is a variant of supervised classification where the learner outputs multiple plausible labels for each instance rather than just one. We investigate classical principles related to generalization within the context of list learning. Our primary goal is to determine whether classical principles in the PAC setting retain their applicability in the domain of list PAC learning. We focus on uniform convergence (which is the basis of Empirical Risk Minimization) and on sample compression (which is a powerful manifestation of Occam's Razor). In classical PAC learning, both uniform convergence and sample compression satisfy a form of `completeness': whenever a class is learnable, it can also be learned by a learning rule that adheres to these principles. We ask whether the same completeness holds true in the list learning setting. We show that uniform convergence remains equivalent to learnability in the list PAC learning setting. In contrast, our findings reveal surprising results regarding sample compression: we prove that when the label space is $Y=\{0,1,2\}$, then there are 2-list-learnable classes that cannot be compressed. This refutes the list version of the sample compression conjecture by Littlestone and Warmuth (1986). We prove an even stronger impossibility result, showing that there are $2$-list-learnable classes that cannot be compressed even when the reconstructed function can work with lists of arbitrarily large size. We prove a similar result for (1-list) PAC learnable classes when the label space is unbounded. This generalizes a recent result by arXiv:2308.06424.


Learning Shuffle Ideals Under Restricted Distributions

Neural Information Processing Systems

The class of shuffle ideals is a fundamental sub-family of regular languages. The shuffle ideal generated by a string set U is the collection of all strings containing some string u U as a (not necessarily contiguous) subsequence. In spite of its apparent simplicity, the problem of learning a shuffle ideal from given data is known to be computationally intractable. In this paper, we study the PAC learnability of shuffle ideals and present positive results on this learning problem under element-wise independent and identical distributions and Markovian distributions in the statistical query model. A constrained generalization to learning shuffle ideals under product distributions is also provided. In the empirical direction, we propose a heuristic algorithm for learning shuffle ideals from given labeled strings under general unrestricted distributions. Experiments demonstrate the advantage for both efficiency and accuracy of our algorithm.