AITopics | Computational Learning Theory

Collaborating Authors

Computational Learning Theory

In computer science, computational learning theory (or just learning theory) is a subfield of Artificial Intelligence devoted to studying the design and analysis of machine learning algorithms (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Globally Optimal Learning for Structured Elliptical Losses

Neural Information Processing SystemsOct-9-2024, 15:15:03 GMT

Heavy tailed and contaminated data are common in various applications of machine learning. A standard technique to handle regression tasks that involve such data, is to use robust losses, e.g., the popular Huber's loss. In structured problems, however, where there are multiple labels and structural constraints on the labels are imposed (or learned), robust optimization is challenging, and more often than not the loss used is simply the negative log-likelihood of a Gaussian Markov random field. Heavy tailed and contaminated data are common in various applications of machine learning. A standard technique to handle regression tasks that involve such data, is to use robust losses, e.g., the popular Huber's loss.

globally optimal learning, optimization, structured elliptical loss, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.40)

Add feedback

Towards Problem-dependent Optimal Learning Rates

Neural Information Processing SystemsOct-9-2024, 15:14:12 GMT

We study problem-dependent rates, i.e., generalization errors that scale tightly with the variance or the effective loss at the "best hypothesis." Existing uniform convergence and localization frameworks, the most widely used tools to study this problem, often fail to simultaneously provide parameter localization and optimal dependence on the sample size. As a result, existing problem-dependent rates are often rather weak when the hypothesis class is "rich" and the worst-case bound of the loss is large. In this paper we propose a new framework based on a "uniform localized convergence" principle. We provide the first (moment-penalized) estimator that achieves the optimal variance-dependent rate for general "rich" classes; we also establish improved loss-dependent rate for standard empirical risk minimization.

problem-dependent optimal learning rate, problem-dependent rate

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.40)

Add feedback

Minimum Description Length and Generalization Guarantees for Representation Learning

Neural Information Processing SystemsOct-9-2024, 09:24:15 GMT

A major challenge in designing efficient statistical supervised learning algorithms is finding representations that perform well not only on available training samples but also on unseen data. While the study of representation learning has spurred much interest, most existing such approaches are heuristic; and very little is known about theoretical generalization guarantees. For example, the information bottleneck method seeks a good generalization by finding a minimal description of the input that is maximally informative about the label variable, where minimality and informativeness are both measured by Shannon's mutual information. In this paper, we establish a compressibility framework that allows us to derive upper bounds on the generalization error of a representation learning algorithm in terms of the Minimum Description Length'' (MDL) of the labels or the latent variables (representations). Rather than the mutual information between the encoder's input and the representation, which is often believed to reflect the algorithm's generalization capability in the related literature but in fact, falls short of doing so, our new bounds involve the "multi-letter" relative entropy between the distribution of the representations (or labels) of the training and test sets and a fixed prior.

description length and generalization guarantee, minimum description length, representation, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.59)

Add feedback

Agnostic Multi-Group Active Learning

Neural Information Processing SystemsOct-9-2024, 09:01:34 GMT

Inspired by the problem of improving classification accuracy on rare or hard subsets of a population, there has been recent interest in models of learning where the goal is to generalize to a collection of distributions, each representing a group''. We consider a variant of this problem from the perspective of active learning, where the learner is endowed with the power to decide which examples are labeled from each distribution in the collection, and the goal is to minimize the number of label queries while maintaining PAC-learning guarantees. Our main challenge is that standard active learning techniques such as disagreement-based active learning do not directly apply to the multi-group learning objective. We modify existing algorithms to provide a consistent active learning algorithm for an agnostic formulation of multi-group learning, which given a collection of G distributions and a hypothesis class \mathcal{H} with VC-dimension d, outputs an \epsilon -optimal hypothesis using \tilde{O}\left( ( u 2/\epsilon 2) G d \theta_{\mathcal{G}} 2 \log 2(1/\epsilon) G\log(1/\epsilon)/\epsilon 2 \right) label queries, where \theta_{\mathcal{G}} is the worst-case disagreement coefficient over the collection. Roughly speaking, this guarantee improves upon the label complexity of standard multi-group learning in regimes where disagreement-based active learning algorithms may be expected to succeed, and the number of groups is not too large.

agnostic multi-group active learning, label query, mathcal, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.60)

Add feedback

Online Epsilon Net and Piercing Set for Geometric Concepts

Bhore, Sujoy, Dey, Devdan, Singh, Satyam

arXiv.org Artificial IntelligenceOct-9-2024

VC-dimension and $\varepsilon$-nets are key concepts in Statistical Learning Theory. Intuitively, VC-dimension is a measure of the size of a class of sets. The famous $\varepsilon$-net theorem, a fundamental result in Discrete Geometry, asserts that if the VC-dimension of a set system is bounded, then a small sample exists that intersects all sufficiently large sets. In online learning scenarios where data arrives sequentially, the VC-dimension helps to bound the complexity of the set system, and $\varepsilon$-nets ensure the selection of a small representative set. This sampling framework is crucial in various domains, including spatial data analysis, motion planning in dynamic environments, optimization of sensor networks, and feature extraction in computer vision, among others. Motivated by these applications, we study the online $\varepsilon$-net problem for geometric concepts with bounded VC-dimension. While the offline version of this problem has been extensively studied, surprisingly, there are no known theoretical results for the online version to date. We present the first deterministic online algorithm with an optimal competitive ratio for intervals in $\mathbb{R}$. Next, we give a randomized online algorithm with a near-optimal competitive ratio for axis-aligned boxes in $\mathbb{R}^d$, for $d\le 3$. Furthermore, we introduce a novel technique to analyze similar-sized objects of constant description complexity in $\mathbb{R}^d$, which may be of independent interest. Next, we focus on the continuous version of this problem, where ranges of the set system are geometric concepts in $\mathbb{R}^d$ arriving in an online manner, but the universe is the entire space, and the objective is to choose a small sample that intersects all the ranges.

algorithm, competitive ratio, online algorithm, (14 more...)

arXiv.org Artificial Intelligence

2410.07059

Country:

Asia > India > Maharashtra > Mumbai (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Add feedback

Reviews: Tight Bounds for Collaborative PAC Learning via Multiplicative Weights

Neural Information Processing SystemsOct-8-2024, 08:48:59 GMT

This paper nearly settles (effectively settles, for a large range of parameters) an obvious question: " what is the overhead (as a function of k, the number of underlying distributions) in the sample complexity of collaborative PAC learning, compared to vanilla PAC learning?" An easy upper bound is a factor of k. Blum et al. established an upper bound of O(log 2 k) on the overhead factor (for k O(d), where d is the VC dimension of the concept class to learn), and a lower bound of Omega(log k) (for the specific case k d). The main contribution of this paper is to provide an O(log k) upper bound (for k O(d), again; the general upper bound is slightly more complicated for general k) on that ratio; they also generalize the lower bound to hold for any k d {O(1)}. The lower bound is aobtained by generalized and "bootstarting" the lower bound construction of Blum et al., with some changes to handle the base case.

collaborative pac learning, multiplicative weight, tight bound, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Add feedback

Reviews: Revisiting Perceptron: Efficient and Label-Optimal Learning of Halfspaces

Neural Information Processing SystemsOct-7-2024, 21:47:43 GMT

It shows that their algorithm learns using a nearly tight number of samples in the random independent noise of bounded rate. Previous work had exponentially worse dependence on the noise rate. In addition, it shows that this algorithm can deal with adversarial noise of sufficiently low rate. The latter result improves polynomially on the sample complexity but requires a stronger condtion on the noise rate. The assumptions in this setting are very strong and as a result are highly unlikely to hold in any realistic problem.

efficient and label-optimal learning, halfspace, revisiting perceptron, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.40)

Add feedback

Reviews: PAC-learning in the presence of adversaries

Neural Information Processing SystemsOct-7-2024, 18:47:24 GMT

This paper develops a PAC framework for learning binary functions in the presence of evasion adversaries. Let X be the domain, and suppose we have an unknown function f: X - {0,1} to be learned, and we also have a hypothesis class H. Moreover, there is a "closeness" relationship defined on pairs in X, so any pair of points in X are either "close" or "far." We also have an unknown distribution P on X. For a hypothesis h, its loss is defined as follows: first we pick a data point x in X according to P. The true label of this point is f(x).

adversarial vc-dimension, adversary, vc-dimension, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.79)

Add feedback

Reviews: Collaborative PAC Learning

Neural Information Processing SystemsOct-7-2024, 14:27:21 GMT

I thank the authors for their feedback and clarification and my positive evaluation of this paper remains unchanged. The authors give theoretical guarantees for collaborative learning with shared information under the PAC regime. They show that the sample complexity for (\epsilon,\delta)-PAC learning k classifiers for a shared problem (or a size k majority-voting ensemble) only needs to grow by a factor of approximately 1 log(k) (or 1 log 2(k)) rather than a factor of k as naive theory might predict. The authors provide pseudocode for two algorithms treated by the theory. These were correct as far as I could see, though I didn't implement either.

classifier, collaborative pac learning, mathcal, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.62)

Add feedback

Reviews: On Fast Leverage Score Sampling and Optimal Learning

Neural Information Processing SystemsOct-7-2024, 10:37:17 GMT

After further thought, however, I have updated my review to a 5. While this works seems to have very high potential, the current draft has several important shortcomings: (1) It is not yet polished enough for broad consumption. Significant revisions would be necessary to increase clarity. Summary: This paper presents a novel algorithm for estimating leverage scores for kernel matrices. The running time of this algorithm improves upon state of the art.

algorithm, leverage score, nyström center, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.41)

Add feedback