Computational Learning Theory


Towards a Unified Information-Theoretic Framework for Generalization

Neural Information Processing Systems

In this work, we investigate the expressiveness of the "conditional mutual information" (CMI) framework of Steinke and Zakynthinou [1] and the prospect of using it to provide a unified framework for proving generalization bounds in the realizable setting. We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm that outputs hypotheses from a class of bounded VC dimension. We then explore two directions of strengthening this bound: (i) Can the CMI framework express optimal bounds for VC classes?
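For reference, here is a minimal statement of the CMI quantity and the generalization bound it yields, following the setup of Steinke and Zakynthinou [1]; the supersample \tilde{Z} and selector U are the standard notation of that framework, and the constant is as usually stated for losses bounded in [0, 1]:

```latex
% CMI of algorithm A on distribution D, following Steinke-Zakynthinou [1]:
% \tilde{Z} is a supersample of n pairs of i.i.d. points from D, and
% U ~ Unif({0,1}^n) selects one point from each pair to form the sample S.
\mathrm{CMI}_{\mathcal{D}}(A) \;=\; I\big(A(S);\, U \,\big|\, \tilde{Z}\big)

% For a loss bounded in [0,1], the expected generalization gap obeys
\Big|\, \mathbb{E}\big[ L_{\mathcal{D}}(A(S)) - L_{S}(A(S)) \big] \,\Big|
\;\le\; \sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(A)}{n}} .
```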


Information Theoretic Lower Bounds for Information Theoretic Upper Bounds

Neural Information Processing Systems

In the context of stochastic convex optimization, we examine the relationship between the generalization of an algorithm and the mutual information between its output model and the empirical sample. Despite increasing interest in information-theoretic generalization bounds, it is uncertain if these bounds can provide insight into the exceptional performance of various learning algorithms. Our study of stochastic convex optimization reveals that, for true risk minimization, dimension-dependent mutual information is necessary. This indicates that existing information-theoretic generalization bounds fall short in capturing the generalization capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.
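The mutual information bound at issue is, in its standard form due to Xu and Raginsky (2017), the following; the σ-subgaussian loss assumption is the usual one, and the notation below is generic rather than taken from this paper:

```latex
% Input-output mutual information bound (Xu & Raginsky, 2017):
% W = A(S) is the output model, S = (Z_1, ..., Z_n) the empirical sample,
% and the loss \ell(w, Z) is \sigma-subgaussian for every fixed w.
\Big|\, \mathbb{E}\big[ L_{\mathcal{D}}(W) - L_{S}(W) \big] \,\Big|
\;\le\; \sqrt{\frac{2\sigma^{2}\, I(W; S)}{n}} .
% The abstract's claim is that, in stochastic convex optimization, any
% learner minimizing the true risk must incur dimension-dependent I(W; S),
% so this bound cannot explain dimension-independent sample complexity.
```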


Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization

Neural Information Processing Systems

We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time. Interestingly, we find that this requires new algorithmic ideas and approaches to adversarially robust learning. In particular, we show, in a strong negative sense, the suboptimality of the robust learner proposed by [21] and a broader family of learners we identify as local learners. Our results are enabled by adopting a global perspective, specifically, through a key technical contribution: the global one-inclusion graph, which may be of independent interest, that generalizes the classical one-inclusion graph due to [17]. Finally, as a byproduct, we identify a dimension characterizing qualitatively and quantitatively what classes of predictors H are robustly learnable. This resolves an open problem due to [21], and closes a (potentially) infinite gap between the established upper and lower bounds on the sample complexity of adversarially robust learning.
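As a concrete aid, here is a small Python sketch (my own illustration, not the authors' construction) of the classical one-inclusion graph of [17] for a finite class projected onto a sample: vertices are the distinct behaviors of H on the points, and edges join behaviors that differ on exactly one point. The global one-inclusion graph of this paper generalizes this object; all function and variable names below are mine.

```python
from itertools import combinations

def one_inclusion_graph(hypotheses, points):
    """Build the classical one-inclusion graph of a finite class.

    hypotheses: iterable of callables h(x) -> 0/1
    points: the n+1 sample points
    Returns (vertices, edges): vertices are distinct behavior tuples,
    edges join behaviors that differ in exactly one coordinate.
    """
    vertices = {tuple(h(x) for x in points) for h in hypotheses}
    edges = [
        (u, v)
        for u, v in combinations(sorted(vertices), 2)
        if sum(a != b for a, b in zip(u, v)) == 1
    ]
    return vertices, edges

# Example: threshold classifiers on five points. A prediction strategy
# orients each edge; the classical result is that an orientation with
# maximum out-degree <= d yields expected error <= d / (n + 1).
points = [1, 2, 3, 4, 5]
thresholds = [lambda x, t=t: int(x >= t) for t in range(7)]
V, E = one_inclusion_graph(thresholds, points)
print(len(V), "behaviors,", len(E), "edges")  # thresholds form a path
```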


Table of Contents of Appendix: A Detailed Related Work; B Limitations and Potential Negative Societal Impacts; C Discussions and Details about Experiments in Figure 1; C.1 Summary

Neural Information Processing Systems

Distance-based methods rely on the assumption that OOD data should lie relatively far from the centroids of the ID classes [9], using measures such as Mahalanobis distance [9, 45], cosine similarity [74], and kernel similarity [75]. Early works used the maximum softmax score to express ID-ness [7]. Temperature scaling was then introduced to amplify the separation between ID and OOD data [14]. More recently, researchers have proposed hyperparameter-free energy scores to improve OOD uncertainty estimation [23, 71], and have also used the information contained in gradients to improve OOD detection performance [18].
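To make the logit-based scores in this paragraph concrete, here is a hedged numpy sketch of the three mentioned: maximum softmax probability [7], its temperature-scaled variant [14], and the energy score [23]. Exact conventions (sign, default temperature) vary across papers; this follows the common forms, and the function names are mine.

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability: higher = more ID-like [7]."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize exp
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)

def temp_scaled_msp(logits, T=1000.0):
    """Temperature-scaled softmax score, as in ODIN-style methods [14]."""
    return msp_score(logits / T)

def energy_score(logits, T=1.0):
    """Negative free energy T*logsumexp(logits/T): higher = more ID-like [23]."""
    z = logits / T
    m = z.max(axis=-1)
    return T * (m + np.log(np.exp(z - m[..., None]).sum(axis=-1)))

# Usage: threshold the score to declare a test input OOD.
logits = np.array([[4.0, 0.5, 0.1], [1.1, 1.0, 0.9]])
print(msp_score(logits), energy_score(logits))
```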


Is Out-of-Distribution Detection Learnable? Zhen Fang

Neural Information Processing Systems

Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To relax this assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms. To study the generalization of OOD detection, in this paper we investigate the probably approximately correct (PAC) learning theory of OOD detection, which has been posed by researchers as an open problem. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we also offer theoretical support for several representative OOD detection works based on our OOD theory.


On the Power of Differentiable Learning versus PAC and SQ Learning Emmanuel Abbe

Neural Information Processing Systems

We study the power of learning via mini-batch stochastic gradient descent (SGD) on the population loss, and batch Gradient Descent (GD) on the empirical loss, of a differentiable model or neural network, and ask what learning problems can be learnt using these paradigms. We show that SGD and GD can always simulate learning with statistical queries (SQ), but their ability to go beyond that depends on the precision ρ of the gradient calculations relative to the mini-batch size b (for SGD) and sample size m (for GD). With fine enough precision relative to mini-batch size, namely when bρ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for b = 1. Similarly, with fine enough precision relative to the sample size m, GD can also simulate any sample-based learning algorithm based on m samples. In particular, with polynomially many bits of precision (i.e. when ρ is exponentially small), SGD and GD can both simulate PAC learning regardless of the mini-batch size.
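The precision parameter ρ can be made concrete with a small sketch: mini-batch SGD where each gradient coordinate is rounded to the nearest multiple of ρ before the update, so that the dichotomy is governed by the product bρ. Everything below (names, the toy least-squares loss) is my illustration of the setup, not the authors' construction.

```python
import numpy as np

def quantize(g, rho):
    """Round each gradient coordinate to the nearest multiple of rho."""
    return rho * np.round(g / rho)

def sgd_finite_precision(grad_fn, w0, data, b=8, rho=1e-4, lr=0.1, steps=100):
    """Mini-batch SGD with rho-precision gradients.

    grad_fn(w, batch) -> gradient of the loss on the batch.
    When b * rho is small, individual examples can still influence the
    rounded gradient, which is what lets SGD go beyond SQ learning.
    """
    rng = np.random.default_rng(0)
    w = w0.copy()
    for _ in range(steps):
        batch = data[rng.choice(len(data), size=b, replace=False)]
        w -= lr * quantize(grad_fn(w, batch), rho)
    return w

# Toy usage: least squares on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0])
data = np.hstack([X, y[:, None]])
grad = lambda w, B: 2 * B[:, :4].T @ (B[:, :4] @ w - B[:, 4]) / len(B)
print(sgd_finite_precision(grad, np.zeros(4), data))
```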



Supplementary Material: A Proof of Theorem 3.1 (Realizable Case, Positive Result)

Neural Information Processing Systems

Let H be a hypothesis class with VC dimension d and let γ ∈ (0, 1). Then there exists a learner Lrn whose γ-tolerant adversarial risk meets the guarantee of Theorem 3.1. To prove Theorem 3.1, we will use the SPV meta-learner. First, we prove a more general result on the performance of our SPV meta-learner. We denote the algorithm obtained by executing SPV with a learner Lrn as the input algorithm by SPV(Lrn). Let H be a concept class, D be a distribution over examples, and Lrn be a learning rule. Let also β ∈ (0, 1) be the stability parameter given to SPV and let n ≥ 1/β be the sample size. By applying linearity of expectation we obtain the desired bound on the expected error of SPV(Lrn); see the sketch below. To prove Theorem 3.1, we will need an optimal learner as the input learner for SPV.
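The linearity-of-expectation step referenced above has, in generic notation, roughly the following shape; the extracted display is lost, so this is a reconstruction of the standard step rather than the authors' exact equation:

```latex
% Generic form of the linearity-of-expectation step: the expected
% empirical error of a hypothesis h over a sample S = (z_1, ..., z_n)
% drawn i.i.d. from D equals its population error.
\mathbb{E}_{S \sim \mathcal{D}^n}\!\left[ \frac{1}{n} \sum_{i=1}^{n}
  \mathbf{1}\{ h(x_i) \ne y_i \} \right]
= \frac{1}{n} \sum_{i=1}^{n} \Pr_{(x,y) \sim \mathcal{D}}\!\big[ h(x) \ne y \big]
= \mathrm{err}_{\mathcal{D}}(h).
```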


Simplifying Adversarially Robust PAC Learning with Tolerance

arXiv.org Artificial Intelligence

Adversarially robust PAC learning has proved to be challenging, with the currently best known learners [Montasser et al., 2021a] relying on improper methods based on intricate compression schemes, resulting in sample complexity exponential in the VC-dimension. A series of follow-up works considered a slightly relaxed version of the problem called adversarially robust learning with tolerance [Ashtiani et al., 2023, Bhattacharjee et al., 2023, Raman et al., 2024] and achieved better sample complexity in terms of the VC-dimension. However, those algorithms were either improper and complex, or required additional assumptions on the hypothesis class H. We prove, for the first time, the existence of a simpler learner that achieves a sample complexity linear in the VC-dimension without requiring additional assumptions on H. Even though our learner is improper, it is "almost proper" in the sense that it outputs a hypothesis that is "similar" to a hypothesis in H. We also use the ideas from our algorithm to construct a semi-supervised learner in the tolerant setting. This simple algorithm achieves bounds comparable to the previous (non-tolerant) semi-supervised algorithm of Attias et al. [2022a], but avoids the use of intricate subroutines from previous works, and is "almost proper."
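For completeness, the tolerant objective discussed here can be stated roughly as follows, in the form common to the tolerance line of work [Ashtiani et al., 2023]; the ball-perturbation sets and the symbols r, γ, ε below are my generic notation, not necessarily this paper's:

```latex
% Robust risk with perturbation radius r around each input:
\mathrm{R}_{r}(h; \mathcal{D}) =
  \mathbb{E}_{(x,y) \sim \mathcal{D}}
  \Big[ \sup_{x' : \|x' - x\| \le r} \mathbf{1}\{ h(x') \ne y \} \Big]
% Tolerant guarantee: the learner's radius-r robust risk is compared
% against the best achievable risk at the inflated radius r(1+\gamma):
\mathrm{R}_{r}(\hat{h}; \mathcal{D}) \le
  \min_{h \in H} \mathrm{R}_{r(1+\gamma)}(h; \mathcal{D}) + \epsilon .
```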